Open lgob0 opened 1 year ago
I will use this issue as the main one for my investigation.
I was able to reproduce and I experienced the same problem, mainly with Deployment. Although, based on my findings, I believe that it is caused by a sequence of events that are not specific to the "Deployment" kind and might affect other resources.
In the Deployment scenario, it would seem that it only happens on Pod scale down. To reproduce, I used kubectl to scale down the deployment argocd-repo-server
.
The logs below seem to show that
ctrl.appInformer.GetIndexer().GetByKey(appKey.(string))
to get the Application, but it returns version 239567632. It should return 239568263Hi, we are experiencing this issue. Can you share any updates on this please?
Checklist:
Describe the bug
The Application CRD reconciliation stuck in the progressing state waiting for a Deployment resource to be ready despite it already is. The health message says:
At the same time the Deployment resource is ready and both metadata.generation and status.observedGeneration are equal. From our observations this issue affects up to 11% of our daily deployments and a single occurrence may take ArgoCD up to 5 minutes to realize that the Deployment resource is ready. We observed the issue only on created Application resources, not on the updated ones.
To Reproduce
There is no simple way to easy reproduce a single occurrence of this issue. We find this behavior as completely random. In our case with ~300 deployments a day there is always up to few dozens affected.
Expected behavior
The Application CRD resource is ready up to a few seconds after every deployed resource is ready, including the described case.
Version
v2.5.5 from the helm chart version 5.16.14
Example
Resources caught kubectl get during one incident.
application.yaml.txt deployment.yaml.txt