grozan commented 1 year ago

Summary

You use sync waves when you want to order and control a deployment. The creation of a custom resource is commonly picked up by a controller running in the cluster, which triggers the creation of other objects. It should be possible to instruct Argo CD to wait for those child objects to be healthy before considering the wave complete (just as if those objects carried the same argocd.argoproj.io/sync-wave annotation as their parent).

Motivation

I want to deploy operators in a cluster. I want the operator to be deployed in a first wave, and I want Argo to wait for the operator to be fully up and running before going to the next wave (during which the CR watched by this operator will be pushed).

For OpenShift, this first wave typically contains two objects: an OperatorGroup and a Subscription. Creating those objects triggers the creation of a ClusterServiceVersion object, which triggers the real installation. That ClusterServiceVersion object is of course not in the git repo, and Argo CD does not wait for it to report a healthy state (its status.phase field should be Succeeded) before going to the next wave.

Even if the installation of the operator fails, Argo will still go to the next wave. I simulated this with a custom health check that always reports a failure:

spec:
  resourceHealthChecks:
    - group: operators.coreos.com
      kind: ClusterServiceVersion
      check: |
        hs = {}
        hs.status = "Degraded"
        hs.message = "always report a failed health status"
        return hs

and Argo still happily starts wave 2. I would want wave 1 to be considered failed, and the whole sync to stop immediately and report a failure.
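For comparison with the always-failing check above, a health check that follows the actual state of the ClusterServiceVersion might look like the sketch below. It reuses the same resourceHealthChecks format and treats status.phase Succeeded as healthy, as described in the motivation. Mapping a Failed phase to Degraded and every other phase to Progressing is an assumption made for illustration, not something confirmed in this issue.

```yaml
spec:
  resourceHealthChecks:
    - group: operators.coreos.com
      kind: ClusterServiceVersion
      check: |
        -- Default: the CSV exists but has not reported a final phase yet.
        hs = {}
        hs.status = "Progressing"
        hs.message = "waiting for ClusterServiceVersion to report a phase"
        if obj.status ~= nil and obj.status.phase ~= nil then
          hs.message = "phase: " .. obj.status.phase
          if obj.status.phase == "Succeeded" then
            hs.status = "Healthy"
          elseif obj.status.phase == "Failed" then
            -- Assumption: treat a Failed phase as Degraded.
            hs.status = "Degraded"
          end
        end
        return hs
```

On its own, such a check only changes the health that Argo reports for the ClusterServiceVersion. As described above, it does not make wave 1 wait for it, because the object is not part of the wave.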

Proposal

Make it optional, at the object level, to say "consider the child objects too, and wait for them to also be healthy before continuing".

How do you think this should be implemented?

I could imagine introducing an annotation for that, something like argocd.argoproj.io/sync-wave-add-children: true. Or, if timing is a concern (how long does Argo CD wait to see whether another controller creates a child object?), it could be something like argocd.argoproj.io/sync-wave-wait-for-children: 4 to wait up to 4 seconds (the default value would be 0, of course, to mimic the current behaviour).
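As a sketch of how this could look on the Subscription from the motivation: both sync-wave-add-children and sync-wave-wait-for-children are the hypothetical annotations proposed here and do not exist in Argo CD today. The operator name, namespace, channel, and catalog source are placeholders. Annotation values must be strings, hence the quotes.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator            # placeholder name
  namespace: example-operators      # placeholder namespace
  annotations:
    # Existing annotation: put this object in wave 1.
    argocd.argoproj.io/sync-wave: "1"
    # Hypothetical (proposed in this issue): also track objects created
    # by other controllers as a result of this one.
    argocd.argoproj.io/sync-wave-add-children: "true"
    # Hypothetical (proposed in this issue): seconds to wait for such
    # child objects to appear; "0" would mimic the current behaviour.
    argocd.argoproj.io/sync-wave-wait-for-children: "4"
spec:
  channel: stable                   # placeholder channel
  name: example-operator            # placeholder package name
  source: example-catalog           # placeholder catalog source
  sourceNamespace: openshift-marketplace
```

With something like this, wave 1 would only complete once the ClusterServiceVersion created for the Subscription reports healthy, and a Degraded result would fail the sync before wave 2 starts.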

Wouter0100 commented 1 year ago

I would love to see this feature. We implemented our applications in Argo CD, where a -1 wave is used to execute migrations prior to actually upgrading the application itself. We expected Argo CD to stop the sync if any of them turned Degraded, but this is apparently not the case.

Making it configurable like this would be perfect. The name "children" was a bit confusing to me, which is why I did not find this issue right away. Another way to name this would be, for example, "wait for healthy wave" (before continuing).

For the argocd.argoproj.io/sync-wave-wait-for-children annotation, I'd also suggest allowing -1, which would mean "unlimited".

wr-jc commented 12 months ago

I would also like to see this feature. I am currently migrating a number of Helm charts to Argo CD using the app-of-apps pattern. It's been a challenge to apply sync waves to the more complex charts. It would make things a lot easier if I could just apply a sync wave to an app and have that propagate to all the child resources.

ScAmp3R commented 1 month ago

I would also like to see the feature. I found a workaround for that, for everyone who is interested: https://blog.stderr.at/openshift/2023/03/operator-installation-with-argo-cd/

argoproj / argo-cd

sync waves: should be possible to have child objects added to the wave #16133
