argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Resource of type Deployment stuck in the Progressing state #14266

Open lgob0 opened 1 year ago

lgob0 commented 1 year ago

Describe the bug

Reconciliation of the Application CRD gets stuck in the Progressing state, waiting for a Deployment resource to become ready even though it already is. The health message says:

Waiting for rollout to finish: observed deployment generation less than desired generation

At the same time, the Deployment resource is ready and metadata.generation and status.observedGeneration are equal. From our observations, this issue affects up to 11% of our daily deployments, and in a single occurrence ArgoCD may take up to 5 minutes to recognize that the Deployment resource is ready. We observed the issue only on newly created Application resources, not on updated ones.
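To make the generation comparison concrete, here is a minimal sketch of a Deployment health check that produces the message quoted above. It is a simplified illustration operating on a plain dict, not Argo CD's actual implementation; the function name `deployment_health` and the second "Progressing" message are assumptions made for the example.

```python
from typing import Any, Dict, Tuple


def deployment_health(deployment: Dict[str, Any]) -> Tuple[str, str]:
    """Return (status, message) for a Deployment given as a plain dict.

    Hypothetical helper for illustration only.
    """
    generation = deployment.get("metadata", {}).get("generation", 0)
    status = deployment.get("status", {})
    observed = status.get("observedGeneration", 0)

    # The Deployment controller has not yet processed the latest spec.
    if observed < generation:
        return (
            "Progressing",
            "Waiting for rollout to finish: observed deployment generation "
            "less than desired generation",
        )

    # Illustrative replica check; the message text here is made up.
    desired = deployment.get("spec", {}).get("replicas", 1)
    available = status.get("availableReplicas", 0)
    if available < desired:
        return ("Progressing", "replicas not yet available (illustrative message)")

    return ("Healthy", "")


stale = {
    "metadata": {"generation": 2},
    "spec": {"replicas": 1},
    "status": {"observedGeneration": 1},
}
ready = {
    "metadata": {"generation": 2},
    "spec": {"replicas": 1},
    "status": {"observedGeneration": 2, "availableReplicas": 1},
}
print(deployment_health(stale))
print(deployment_health(ready))
```

Under this sketch, an object whose two generation fields are equal and whose replicas are available evaluates as Healthy, so a fresh evaluation of the Deployment described here would not yield the reported message.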

To Reproduce

There is no simple way to reliably reproduce a single occurrence of this issue. The behavior appears completely random to us. In our case, with ~300 deployments a day, there are always up to a few dozen affected.

Expected behavior

The Application CRD resource becomes ready within a few seconds after every deployed resource is ready, including in the described case.

Version

v2.5.5 from the helm chart version 5.16.14

Example

Resources captured with kubectl get during one incident.

application.yaml.txt deployment.yaml.txt

agaudreault commented 1 year ago

I will use this issue as the main one for my investigation.

I was able to reproduce it and experienced the same problem, mainly with Deployments. However, based on my findings, I believe it is caused by a sequence of events that is not specific to the "Deployment" kind and might affect other resources.

In the Deployment scenario, it seems to happen only on Pod scale-down. To reproduce, I used kubectl to scale down the argocd-repo-server deployment.
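The scale-down step could be scripted roughly as follows. This is a sketch under assumptions: the replica count is not stated in the comment, the `argocd` namespace is taken from the log output, and the helper names `scale_command` and `scale_down` are made up for the example.

```python
import subprocess
from typing import List


def scale_command(deployment: str, replicas: int, namespace: str) -> List[str]:
    """Build the kubectl argv that scales a Deployment to `replicas`."""
    return [
        "kubectl",
        "scale",
        f"deployment/{deployment}",
        f"--replicas={replicas}",
        "--namespace",
        namespace,
    ]


def scale_down(deployment: str, replicas: int, namespace: str) -> None:
    """Run the scale command; requires kubectl and cluster access."""
    subprocess.run(scale_command(deployment, replicas, namespace), check=True)


# Assumed values: scaling to 1 replica is a guess, not taken from the report.
cmd = scale_command("argocd-repo-server", 1, "argocd")
print(" ".join(cmd))
```

Calling `scale_down("argocd-repo-server", 1, "argocd")` would then trigger the Pod scale-down against the current kubectl context.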

The logs below seem to show that

Logs

_time | level | application | kind | namespace | name | msg
-- | -- | -- | -- | -- | -- | --
2023-08-08 12:35:36.506 | info | argocd/argo-cd |   |   |   | No status changes. Skipping patch
2023-08-08 12:35:36.506 | info | argocd/argo-cd |   |   |   | Reconciliation completed
2023-08-08 12:35:36.480 | debug |   | PodDisruptionBudget | argocd | argocd-repo-server | Ignoring change of object because none of the watched resource fields have changed
2023-08-08 12:35:36.477 | info | argocd/argo-cd |   |   |   | Current App status before reconcile is Progressing. Version 239568263
2023-08-08 12:35:36.477 | info | argocd/argo-cd |   |   |   | Refreshing app status (controller refresh requested), level (0)
2023-08-08 12:35:36.477 | info | argocd/argo-cd |   |   |   | No status changes. Skipping patch
2023-08-08 12:35:36.477 | info | argocd/argo-cd |   |   |   | Reconciliation completed
2023-08-08 12:35:36.458 | debug | argocd/argo-cd | Pod | argocd | argocd-repo-server-585f975774-42nzs | Requesting app refresh caused by object update
2023-08-08 12:35:36.448 | info | argocd/argo-cd |   |   |   | Current App status before reconcile is Progressing. Version 239568263
2023-08-08 12:35:36.448 | info | argocd/argo-cd |   |   |   | Refreshing app status (controller refresh requested), level (0)
2023-08-08 12:35:36.448 | info | argocd/argo-cd |   |   |   | Reconciliation completed
2023-08-08 12:35:36.448 | info | argocd/argo-cd |   |   |   | No status changes. Skipping patch
2023-08-08 12:35:36.443 | debug | argocd/argo-cd | Pod | argocd | argocd-repo-server-585f975774-42nzs | Requesting app refresh caused by object update
2023-08-08 12:35:36.420 | info | argocd/argo-cd |   |   |   | Current App status before reconcile is Progressing. Version 239568263
2023-08-08 12:35:36.420 | info | argocd/argo-cd |   |   |   | Refreshing app status (controller refresh requested), level (0)
2023-08-08 12:35:36.420 | debug | argocd/argo-cd | Pod | argocd | argocd-repo-server-585f975774-42nzs | Requesting app refresh caused by object update
2023-08-08 12:35:30.642 | info | argocd/argo-cd |   |   |   | No status changes. Skipping patch
2023-08-08 12:35:30.642 | info | argocd/argo-cd |   |   |   | Reconciliation completed
2023-08-08 12:35:30.611 | debug | argocd/argo-cd | Pod | argocd | argocd-repo-server-585f975774-42nzs | Requesting app refresh caused by object update
2023-08-08 12:35:30.611 | info | argocd/argo-cd |   |   |   | Current App status before reconcile is Progressing. Version 239568263
2023-08-08 12:35:30.611 | info | argocd/argo-cd |   |   |   | Refreshing app status (controller refresh requested), level (0)
2023-08-08 12:35:05.428 | info | argocd/argo-cd |   |   |   | No status changes. Skipping patch
2023-08-08 12:35:05.428 | info | argocd/argo-cd |   |   |   | Reconciliation completed
2023-08-08 12:35:05.425 | info | argocd/argo-cd |   |   |   | Skipping auto-sync: application status is Synced
2023-08-08 12:35:05.049 | debug | argocd/argo-cd |   |   |   | Retrieved live manifests
2023-08-08 12:35:05.046 | info | argocd/argo-cd |   |   |   | getRepoObjs stats
2023-08-08 12:35:05.036 | debug |   | Application | argocd | argo-cd | Ignoring change of object because none of the watched resource fields have changed
2023-08-08 12:35:05.032 | debug |   |   |   |   | Generating Manifest for source [argo-cd] revision 4136532b04353dd6c645108f09a0fc718391a19b
2023-08-08 12:35:05.031 | info | argocd/argo-cd |   |   |   | Reconciliation completed
2023-08-08 12:35:05.031 | info |   |   |   |   | Ignore status for all objects
2023-08-08 12:35:05.031 | info |   |   |   |   | Current App status before reconcile is Healthy. Version 239567632
2023-08-08 12:35:05.031 | info | argocd/argo-cd |   |   |   | Refreshing app status (controller refresh requested), level (1)
2023-08-08 12:35:05.031 | info | argocd/argo-cd |   |   |   | Comparing app state (cluster: https://kubernetes.default.svc, namespace: argocd)
2023-08-08 12:35:05.031 | info | argocd/argo-cd |   |   |   | Update successful. New version is 239568263
2023-08-08 12:35:04.931 | info | argocd/argo-cd |   |   |   | Skipping auto-sync: application status is Synced
2023-08-08 12:35:04.931 | info | argo-cd |   |   |   | Updated health status: Healthy -> Progressing
2023-08-08 12:35:04.683 | debug |   | Deployment | argocd | argocd-repo-server | Ignoring change of object because none of the watched resource fields have changed
2023-08-08 12:35:04.658 | debug |   | ReplicaSet | argocd | argocd-repo-server-585f975774 | Ignoring change of object because none of the watched resource fields have changed
2023-08-08 12:35:04.635 | debug | argocd/argo-cd | ReplicaSet | argocd | argocd-repo-server-585f975774 | Requesting app refresh caused by object update
2023-08-08 12:35:04.629 | debug |   | PodDisruptionBudget | argocd | argocd-repo-server | Ignoring change of object because none of the watched resource fields have changed
2023-08-08 12:35:04.604 | debug | argocd/argo-cd | Pod | argocd | argocd-repo-server-585f975774-42nzs | Requesting app refresh caused by object update
2023-08-08 12:35:04.603 | debug | argocd/argo-cd | Deployment | argocd | argocd-repo-server | Requesting app refresh caused by object update
2023-08-08 12:35:04.577 | debug | argocd/argo-cd | ReplicaSet | argocd | argocd-repo-server-585f975774 | Requesting app refresh caused by object update
2023-08-08 12:35:04.562 | debug | argocd/argo-cd |   |   |   | Retrieved live manifests
2023-08-08 12:35:04.559 | info | argocd/argo-cd |   |   |   | getRepoObjs stats
2023-08-08 12:35:04.544 | info | argocd/argo-cd |   |   |   | Comparing app state (cluster: https://kubernetes.default.svc, namespace: argocd)
2023-08-08 12:35:04.544 | info | argocd/argo-cd |   |   |   | Current App status before reconcile is Healthy. Version 239567632
2023-08-08 12:35:04.544 | info | argocd/argo-cd |   |   |   | Refreshing app status (controller refresh requested), level (1)
2023-08-08 12:35:04.544 | info |   |   |   |   | Ignore status for all objects
2023-08-08 12:35:04.544 | debug |   |   |   |   | Generating Manifest for source [argo-cd] revision 4136532b04353dd6c645108f09a0fc718391a19b
2023-08-08 12:35:04.543 | debug | argocd/argo-cd | Deployment | argocd | argocd-repo-server | Requesting app refresh caused by object update

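
For anyone working through the log rows above, a small sketch like the following can split a pipe-delimited row into its columns and measure the time between two entries. The helper names `parse_row` and `seconds_between` are made up for the example; the two sample rows are copied from the logs, and the arithmetic only covers the captured window, not the full duration of the incident.

```python
from datetime import datetime
from typing import Dict

COLUMNS = ["_time", "level", "application", "kind", "namespace", "name", "msg"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_row(row: str) -> Dict[str, str]:
    """Split one pipe-delimited log row into a column -> value dict."""
    cells = [cell.strip() for cell in row.split("|", len(COLUMNS) - 1)]
    return dict(zip(COLUMNS, cells))


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two _time values."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


# Row where the health status changes, and the newest row in the capture.
went_progressing = parse_row(
    "2023-08-08 12:35:04.931 | info | argo-cd |  |  |  | "
    "Updated health status: Healthy -> Progressing"
)
newest = parse_row(
    "2023-08-08 12:35:36.506 | info | argocd/argo-cd |  |  |  | "
    "No status changes. Skipping patch"
)

elapsed = seconds_between(went_progressing["_time"], newest["_time"])
print(f"{elapsed:.3f} seconds between the two rows")
```
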
nabeelaccount commented 1 year ago

Hi, we are experiencing this issue. Can you share any updates on this please?