argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

fix(operator): allow retries to consider exit code from init container and don't consider node as pending if init failed. Fixes #11354/#10717/#10045 #13858

Open tooptoop4 opened 2 weeks ago

tooptoop4 commented 2 weeks ago

Fixes https://github.com/argoproj/argo-workflows/issues/11354 and https://github.com/argoproj/argo-workflows/issues/10717 and https://github.com/argoproj/argo-workflows/issues/10045

Before this fix, the node would always go into Pending because the main container was in a waiting state (https://github.com/argoproj/argo-workflows/blob/4742e9dd6ab2b797d32cb0953849fdcfe82ea325/workflow/controller/operator.go#L1404), even though the init container had already terminated with a non-zero exit code.
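To make the ordering problem concrete, here is a minimal sketch of the idea, not the actual controller code. The types below are simplified, hypothetical stand-ins for the Kubernetes pod and container status structures, and `assessPhase` and `failedInitExitCode` are illustrative names that do not exist in the repository. The assumption is that a failed init container should be checked before the "main container is waiting" check, so that the exit code is available to retry logic.

```go
package main

import "fmt"

// ContainerState is a simplified, hypothetical stand-in for the
// Kubernetes container state (waiting / running / terminated).
type ContainerState struct {
	Waiting    bool
	Terminated bool
	ExitCode   int32
}

// ContainerStatus is a hypothetical stand-in for a container status entry.
type ContainerStatus struct {
	Name  string
	State ContainerState
}

// PodStatus holds only the fields this sketch needs.
type PodStatus struct {
	InitContainerStatuses []ContainerStatus
	ContainerStatuses     []ContainerStatus
}

// Phase is a hypothetical node phase used only in this sketch.
type Phase string

const (
	Pending Phase = "Pending"
	Running Phase = "Running"
	Failed  Phase = "Failed"
)

// failedInitExitCode returns the exit code of the first init container
// that terminated with a non-zero exit code, if there is one.
func failedInitExitCode(p PodStatus) (int32, bool) {
	for _, c := range p.InitContainerStatuses {
		if c.State.Terminated && c.State.ExitCode != 0 {
			return c.State.ExitCode, true
		}
	}
	return 0, false
}

// assessPhase checks init containers first. Only when none of them has
// failed does a waiting main container mean the node is still Pending.
func assessPhase(p PodStatus) (Phase, int32) {
	if code, failed := failedInitExitCode(p); failed {
		return Failed, code
	}
	for _, c := range p.ContainerStatuses {
		if c.State.Waiting {
			return Pending, 0
		}
	}
	return Running, 0
}

func main() {
	// The init container exited with code 1; the main container never started.
	pod := PodStatus{
		InitContainerStatuses: []ContainerStatus{
			{Name: "init", State: ContainerState{Terminated: true, ExitCode: 1}},
		},
		ContainerStatuses: []ContainerStatus{
			{Name: "main", State: ContainerState{Waiting: true}},
		},
	}
	phase, code := assessPhase(pod)
	fmt.Println(phase, code)
}
```

If the waiting check ran first, the same pod would be reported as Pending and the init container's exit code would never reach the retry logic, which is the behavior described in the linked issues.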

This supersedes https://github.com/argoproj/argo-workflows/pull/13852

cc @terrytangyuan

jswxstw commented 2 weeks ago

> Before this fix it would always go into pending because main container was waiting state even though init container already terminated with non-0 exit

This will only be encountered when using ContainerSet, right?

tooptoop4 commented 2 weeks ago

> > Before this fix it would always go into pending because main container was waiting state even though init container already terminated with non-0 exit
>
> This will only be encountered when using ContainerSet, right?

No, see the logs and comments in the linked issues.