argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Stuck applicationset progressive rollout #12202

Open billyshambrook opened 1 year ago

billyshambrook commented 1 year ago


Describe the bug

An ApplicationSet with progressive rollout enabled sometimes gets stuck between rollout steps.

Not sure if this is related, but I have noticed that the ApplicationSet conditions continuously flip between the following two states. It seems the controller does not append these conditions but keeps overwriting its own:

...
  - lastTransitionTime: '2023-01-29T22:35:24Z'
    message: Successfully generated parameters for all Applications
    reason: ParametersGenerated
    status: 'True'
    type: ParametersGenerated
...
...
  - lastTransitionTime: '2023-01-29T22:34:44Z'
    message: ApplicationSet Rollout Rollout started
    reason: ErrorOccurred
    status: 'False'
    type: ParametersGenerated
...

After inspecting the code, this seems to be caused by the controller calling r.setApplicationSetStatusCondition here https://github.com/argoproj/argo-cd/blob/master/applicationset/controllers/applicationset_controller.go#L1154 with the parametersGenerated argument set to false. Within the same reconcile, the controller then calls the function again (maybe here: https://github.com/argoproj/argo-cd/blob/master/applicationset/controllers/applicationset_controller.go#L277) with parametersGenerated set to true, which overwrites the progressive rollout condition.
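The overwrite described above can be illustrated with a minimal sketch. It is not the controller's actual code: `Condition`, `setCondition`, and `reconcileOnce` are hypothetical stand-ins, and the sketch assumes conditions are keyed by their `type`, so a second write with the same type in one reconcile replaces the first.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for an ApplicationSet
// status condition (not the real Argo CD type).
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// setCondition is a hypothetical helper. Assumption: a condition with the
// same Type replaces the existing entry instead of being appended.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// reconcileOnce mimics one reconcile that writes two conditions of the
// same type, using the values quoted in the issue.
func reconcileOnce(conds []Condition) []Condition {
	// First write: the progressive rollout step reports its state.
	conds = setCondition(conds, Condition{
		Type:    "ParametersGenerated",
		Status:  "False",
		Reason:  "ErrorOccurred",
		Message: "ApplicationSet Rollout Rollout started",
	})
	// Second write in the same reconcile: replaces the entry above.
	conds = setCondition(conds, Condition{
		Type:    "ParametersGenerated",
		Status:  "True",
		Reason:  "ParametersGenerated",
		Message: "Successfully generated parameters for all Applications",
	})
	return conds
}

func main() {
	conds := reconcileOnce(nil)
	fmt.Println(len(conds), conds[0].Status, conds[0].Message)
}
```

Under that assumption only the last write survives each reconcile, which would match the flipping seen in the status output.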

To Reproduce

Apply this applicationset.yaml. If the applicationset rollout works, try deleting the applicationset and re-applying it a few times.

Expected behavior

All applications roll out successfully.

Screenshots

Screenshot 2023-01-29 at 2 26 39 PM

Version

argocd: v2.6.0-rc5+e790028
  BuildDate: 2023-01-25T17:57:49Z
  GitCommit: e790028e5cf99d65d6896830fc4ca757c91ce0d5
  GitTreeState: clean
  GoVersion: go1.18.10
  Compiler: gc
  Platform: linux/amd64

thober35 commented 1 year ago

We also observed a similar issue. After debugging the code we found a possible culprit here: https://github.com/argoproj/argo-cd/blob/master/applicationset/controllers/applicationset_controller.go#L1023. The app status is stuck in "Pending" even though the sync was successful. The operationPhaseString is "Succeeded", so the status is never updated. A possible fix would be to check for (operationPhaseString == "Succeeded" && !appOutdated). We don't know if this relates to your issue, as we did not check the ApplicationSetStatus.
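As a rough sketch of the check proposed above: the helper below is hypothetical and is not the controller's real code. Only the names operationPhaseString and appOutdated come from the comment; `canLeavePending` and its `appStatus` parameter are assumptions made for illustration.

```go
package main

import "fmt"

// canLeavePending is a hypothetical helper. It reports whether an
// application recorded as "Pending" may advance, using the predicate
// suggested in the comment: the last operation succeeded and the
// application is not outdated.
func canLeavePending(appStatus, operationPhaseString string, appOutdated bool) bool {
	if appStatus != "Pending" {
		return false
	}
	return operationPhaseString == "Succeeded" && !appOutdated
}

func main() {
	// Synced and up to date: the app may leave "Pending".
	fmt.Println(canLeavePending("Pending", "Succeeded", false))
	// Succeeded earlier but outdated again: the app stays "Pending".
	fmt.Println(canLeavePending("Pending", "Succeeded", true))
}
```

The point of the predicate is that the decision depends only on the application's current observed state, not on when the stored status last changed.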

@crenshaw-dev please add label appset/progressive-rollouts. Thanks.

riuvshyn commented 1 year ago

same happens on 2.7.1

mike-serchenia commented 1 year ago

Same on 2.8

bhutkovskyysos commented 1 year ago

any updates on this issue?

vitaly-dt commented 8 months ago

Hi - does anyone have any insights on this one?

grosenba commented 8 months ago

I can only say that I still have the problem with 2.10.4.

thomaspetit commented 8 months ago

Has anyone found a workaround for this one?

I notice this too:

  - lastTransitionTime: "2024-03-22T13:01:35Z"
    message: Successfully generated parameters for all Applications
    reason: ApplicationSetUpToDate
    status: "False"
    type: ErrorOccurred

Meanwhile these errors pop-up in the appset controller:

time="2024-03-22T13:07:17Z" level=error msg="unable to set application set status: Operation cannot be fulfilled on applicationsets.argoproj.io \"argocd\": the object has been modified; please apply your changes to the latest version and try again" applicationset=argocd/argocd

The latter seems unrelated to the initial issue logged here, but it is interesting that the progressive rollout also has issues when Argo CD itself is managed by the progressive rollout.

Qwiko commented 8 months ago

I also have this issue on 2.10.4

gmauleon commented 2 months ago

I believe @carlosrejano added some retries that might fix that in https://github.com/argoproj/argo-cd/issues/19535.

What worries me is that, from my outsider perspective, an application status in the ApplicationSet should always be able to move forward based on the real application statuses. It should not get stuck in a state based on its stored status.

It seems like some rework of the pseudo state machine is needed, but I have a hard time grasping all the logic 😅

gmauleon commented 2 months ago

All right, after some more digging I believe the logic I was talking about was rewritten in https://github.com/argoproj/argo-cd/pull/17296, released in v2.12. There is also a reconciliation issue in v2.12 whose fix will be cherry-picked and released soon, I believe, via https://github.com/argoproj/argo-cd/pull/19995.

I'm currently running a custom version with the cherry-pick and it seems to work so far. The only caveat is that the applications that were stuck in "Pending" were not moved automatically; I had to patch the status to "Progressing" myself, but after that no more problems appeared.

There still seem to be potential "stuck in pending" problems with consecutive commits during a rolling sync, though, as mentioned in https://github.com/argoproj/argo-cd/issues/19535.