**Open** · shnigam2 opened this issue 1 week ago
What's your argocd server version? What kind of sync are you talking about? If there are a lot of steps, for example Rollouts with long canaries, a sync is expected to take a while. Does it get stuck?
Hello Andrii,
Our argocd version is: v2.5.1+504da42. Correct, there seem to be quite a few steps, including canary ones, but this seemed to work fine before. Currently it takes 10-15+ minutes, mostly gets stuck, and all applications end up in "Unknown" status. Let us know if you need to check any additional logs, or set up a call. PS: our argocd add-on setup is below:

```
jimmy_daruwala@M-TFX2T4PV62 ~ % k get pods -A | grep argo
argocd   argocd-application-controller-0                     1/1   Running     0   16h
argocd   argocd-application-controller-1                     1/1   Running     0   16h
argocd   argocd-applicationset-controller-67fd897584-jhb7j   1/1   Running     0   16h
argocd   argocd-notifications-controller-d547c8d76-tzw27     1/1   Running     0   16h
argocd   argocd-redis-6cd966fffc-mcg9b                        1/1   Running     0   16h
argocd   argocd-repo-server-55f974b986-bnz9s                  1/1   Running     0   16h
argocd   argocd-repo-server-55f974b986-fb62h                  1/1   Running     0   16h
argocd   argocd-repo-server-55f974b986-qbmx6                  1/1   Running     0   16h
argocd   argocd-repo-server-55f974b986-qfbtf                  1/1   Running     0   16h
argocd   argocd-server-574dd6b597-4xbq9                       1/1   Running     0   16h
argocd   argocd-server-574dd6b597-54rkc                       1/1   Running     0   16h
argocd   argocd-server-574dd6b597-px464                       1/1   Running     0   16h
argocd   argocd-server-574dd6b597-vkbrk                       1/1   Running     0   16h
argocd   container-secret-sync-28859175-vcdw5                 0/1   Completed   0   40m
argocd   container-secret-sync-28859190-j6nrc                 0/1   Completed   0   25m
argocd   container-secret-sync-28859205-bz2jr                 0/1   Completed   0   10m
```
That might be the CLI version. Do you have the output for argocd-server? Is it also old? If so, please try upgrading.
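For reference, a couple of ways to check the server-side version (a sketch, assuming the default `argocd` namespace and the standard `argocd-server` Deployment name shown in the pod list above):

```shell
# Prints both the client and the server version (requires an active argocd login).
argocd version

# Or read the image tag straight off the argocd-server Deployment.
kubectl -n argocd get deployment argocd-server \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```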
Sure Andrii, PS the argocd-server version below:

```
jimmy_daruwala@M-TFX2T4PV62 ~ % argocd version --client
2024/11/13 22:04:52 maxprocs: Leaving GOMAXPROCS=10: CPU quota undefined
argocd: v2.13.0+347f221
  BuildDate: 2024-11-04T15:30:50Z
  GitCommit: 347f221adba5599ef4d5f12ee572b2c17d01db4d
  GitTreeState: clean
  GoVersion: go1.23.2
  Compiler: gc
  Platform: darwin/arm64
```
I see. Can you share the resource YAMLs you are syncing, please? In particular, any Rollouts with long canaries.
Hi Andrii. We are using argocd version 2.5.1. Please let me know which resource YAML files are required.
The YAMLs which define the application's resources.
When trying to reproduce this issue, the only thing I could find in the argocd-server pod logs was this error:
```
level=error msg="finished unary call with code Unknown" error="error getting cached app state: error getting application by query: application refresh deadline exceeded" grpc.code=Unknown grpc.method=ManagedResources grpc.service=application.ApplicationService grpc.start_time="2024-11-20T04:58:28Z" grpc.time_ms=60000.188 span.kind=server system=grpc

2024-11-20T00:00:17-05:00 time="2024-11-20T05:00:17Z" level=info msg="received unary call /application.ApplicationService/ResourceTree" grpc.method=ResourceTree grpc.request.claims="{\"aud\":\"argocd\",\"email\":\"Karthikeyan_Sekar@mckinsey.com\",\"exp\":1732079857,\"iat\":1732078657,\"iss\":\"https://prod-login-con01.intranet.mckinsey.com/auth/idp/k8sIdp\",\"jti\":\"T9qDumUGC2KvnFvEx6LnTw\",\"name\":\" Sekar\",\"nbf\":1732078537,\"nonce\":\"4ba21011-3b96-406d-9d05-2f9773942d5e\",\"preferred_username\":\"x-48-xx-48-xufx-54-xgyrgngx-56-xnkx-50-xvbx-51-xx-53-xx-54-x\",\"sub\":\"00uf6gyrgnG8nK2Vb356\"}" grpc.request.content="applicationName:\"my-app-converge-12126\" appNamespace:\"argocd\" " grpc.service=application.ApplicationService grpc.start_time="2024-11-20T05:00:17Z" span.kind=server system=grpc

2024-11-20T00:00:17-05:00 time="2024-11-20T05:00:17Z" level=info msg="Requested app 'my-app-converge-12126' refresh"
```
Not sure if it's related.
Can you try to enter the browser's developer mode and debug what's stalling the page? One case where I saw this is when an app had like 5k old jobs. I manually cleaned up those and things started to load well again.
How can we check for old (stale or stuck) jobs that need cleaning up?
You can use kubectl and query by the label that marks resources as belonging to the app. Something like
```
kubectl get jobs -l app.kubernetes.io/instance=my-app
```
But I'm not sure what exact label or annotation you have for tracking which resources belong to the app.
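For example, a rough sketch of how the check and cleanup could look, assuming the default `app.kubernetes.io/instance` tracking label (substitute your own app name and namespace; the ones below are taken from this thread):

```shell
# Assumes label-based tracking via app.kubernetes.io/instance; inspect an actual
# Job's metadata first if your install uses annotation-based tracking instead.
APP=my-app-converge-12126          # app name discussed in this thread
NS=ns-converge-12126-prod          # target namespace mentioned in this thread

# How many Jobs does the app own?
kubectl get jobs -n "$NS" -l app.kubernetes.io/instance="$APP" --no-headers | wc -l

# List them oldest-first to spot stale ones.
kubectl get jobs -n "$NS" -l app.kubernetes.io/instance="$APP" \
  --sort-by=.metadata.creationTimestamp

# If thousands of completed Jobs have piled up, delete only the finished ones
# (status.successful is a supported field selector for Jobs).
kubectl delete jobs -n "$NS" -l app.kubernetes.io/instance="$APP" \
  --field-selector status.successful=1
```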
Posting the same comments as above, without the screenshots.
Hi Andrii, the application team unfortunately cannot allow us to post the YAML files here in the open forum. To restate the issues:

Issue 1 - Apps take a long time to sync, especially "my-app-converge-12126".
Issue 2 - When we select the "my-app-converge-12126" application in Argo, the whole Argo UI gets glitchy and after a few seconds it shows a "Page Unresponsive" message. This is very consistent, but for all other apps the Argo UI runs fine and we never see the "Page Unresponsive" error.

Another note: this is happening only on this cluster. Resource usage for all the nodes as well as the pods in question looks normal, which suggests this is not a result of resource overload (for both the argocd and ns-converge-12126-prod namespaces).
As also discussed, we found nothing valuable in Developer mode when reproducing the issue.
I will check on the jobs as you requested above.
I actually tried finding all the jobs in all namespaces in the cluster and only got this:

```
jimmy_daruwala@M-TFX2T4PV62 .aws % kubectl get jobs -A
NAMESPACE                   NAME                              STATUS     COMPLETIONS   DURATION   AGE
argocd                      container-secret-sync-28868790    Complete   1/1           8s         40m
argocd                      container-secret-sync-28868805    Complete   1/1           7s         25m
argocd                      container-secret-sync-28868820    Complete   1/1           8s         10m
argocd                      splunk-sync                       Complete   1/1           7s         96d
namespace-operator-system   snow-registration-28785600        Complete   1/1           8s         57d
namespace-operator-system   snow-registration-28787040        Complete   1/1           7s         56d
namespace-operator-system   snow-registration-28788480        Complete   1/1           9s         55d
namespace-operator-system   snow-registration-28867680        Failed     0/1           19h        19h
openunison                  check-certs-openunison-28864920   Complete   1/1           10s        2d17h
openunison                  check-certs-openunison-28866360   Complete   1/1           11s        41h
openunison                  check-certs-openunison-28867800   Complete   1/1           10s        17h
```
Any next steps, or any other suggestions regarding this? Happy to set up a call between us and the App team as well.
Can you enable debug logs and see how long the various steps take, e.g. search for "Reconciliation completed"?
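For example, one way to raise the application controller's log level and pull the timings (a sketch; it assumes the stock `argocd-cmd-params-cm` ConfigMap, which Helm or Kustomize installs may manage differently):

```shell
# Raise the application controller's log level to debug.
kubectl -n argocd patch configmap argocd-cmd-params-cm \
  --type merge -p '{"data":{"controller.log.level":"debug"}}'

# The controller only reads the setting at startup, so restart it.
kubectl -n argocd rollout restart statefulset argocd-application-controller

# Pull per-app reconciliation timings from one of the controller pods
# (pod name taken from the listing earlier in this thread).
kubectl -n argocd logs argocd-application-controller-0 --since=15m \
  | grep "Reconciliation completed"
```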
Checklist:
- [ ] `argocd version`

**Describe the bug**
When we sync any app, it takes a very long time.

**To Reproduce**
Simply sync any app; even with no changes it takes more than 20 minutes.

**Expected behavior**
In our other environment a sync completes in no more than 2 minutes.

**Screenshots**
Triggering a sync of any app takes up to 1-1.5 hours.

**Version**

```shell
argocd: v2.5.1+504da42
  BuildDate: 2022-11-01T21:14:30Z
  GitCommit: 504da424c2c9bb91d7fb2ebf3ae72162e7a5a5be
  GitTreeState: clean
  GoVersion: go1.18.8
  Compiler: gc
  Platform: linux/amd64
```

**Logs**

```
Paste any relevant application logs here.
```
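For reference, a minimal way to time the reproduction from the CLI (a sketch; the app name is the one discussed in this thread and any app can be substituted):

```shell
# Time a no-op sync of a single app; in this environment it reportedly takes 20+ minutes.
time argocd app sync my-app-converge-12126
```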