Closed: jonaslar closed this issue 3 years ago
Is there a recommended solution for this? I have tried lowering the sync time and even running a refresh manually, but those don't work. I have multiple replicas of a service, and each of them takes exactly 3 minutes to be marked as healthy. That means the previous version of the app doesn't get terminated until the new one is marked as healthy.
To Reproduce: Update Kubernetes manifest files in a repo an ArgoCD Application is monitoring. Issue a sync either manually or wait for auto-sync.
It's not clear there's a bug here. We cannot detect changes in the git repo unless either: (1) the 3 minute polling period was reached or (2) a webhook was configured to notify Argo CD about the change in git. This is expected behavior.
Am I missing something?
I think we need a clear set of reproducible steps to do anything here.
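(Side note for anyone hitting the 3-minute delay: in recent Argo CD versions the git polling interval is controlled by the timeout.reconciliation key in the argocd-cm ConfigMap. A minimal sketch, assuming a stock install in the argocd namespace; the application controller needs a restart to pick the change up:)

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Poll git every 60s instead of the default 180s; restart the
  # argocd-application-controller for this to take effect.
  timeout.reconciliation: 60s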
@jessesuen Thanks for responding. At least in our case this is what we see happening:
In our case we have an application with 2 replicas. What we see is that the first replica is deployed, becomes healthy, and is registered in a load balancer, but termination of the previous version is not triggered until ArgoCD declares it healthy (180 sec). Then ArgoCD moves on to the next replica.
What exactly is Argo looking for before it declares an application healthy?
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
Is there any progress on this? This problem still occurs on my cluster on the latest stable version of ArgoCD.
I've started to manage a secret, and the app is now stuck in Progressing. Any idea why?
+1. When I try to add 2 new managed namespaces (5 in total), the cluster gets stuck refreshing; with only 3, ArgoCD takes 10 minutes to sync the cluster and work again.
Can we re-open this issue? I'm seeing this in ArgoCD 2.3.7 as well (sorry, we're on the RH GitOps Operator, thus a bit behind in terms of Argo version) when the app sync is triggered by a webhook (in our case by Bitbucket Server). The app syncs instantly and successfully and is synced, but app health says "Progressing". The app is stuck in Progressing until either a) I manually do a "Refresh" in the UI or b) I wait for the next automatic/scheduled refresh (i.e. ~3 minutes).
Small update: I cannot reproduce this consistently :-( It happened several times in a row when first establishing the webhook and triggering it a few times. Then I added a webhook secret and the app went healthy instantly after webhooks. Thinking it was related to this change, I removed the secret again. It worked a few times (app instantly healthy), then went into constant "Progressing" again for a few syncs, and right now everything works again. I'm puzzled. Still, when the app is stuck in "Progressing" after the webhook, a simple "Refresh" or waiting for the scheduled refresh fixes the app's status on our side.
Vote for re-open. I've deployed Keycloak via ArgoCD but it's stuck at Progressing; neither manual nor automatic refresh changes it.
Also vote for a re-open.
I am seeing this too. We are using ArgoCD v2.5.3+0c7de21, in Azure AKS, running K8s 1.23.12 on Ubuntu 18.04 nodes.
I've got about 18 apps (deployments, secretproviders, services) across 3 different clusters, all working fine; all are kustomize-style deployments. But when I deploy an app defining our ingress, it just hangs with health Progressing. This deployment existed on the cluster before Argo (like most of the others), and I have tried deploying it directly by hand (I have self-heal turned on) and changing settings to force a sync... nothing seems to get the health to update.
We have a webhook turned on, so every commit to that repo causes a sync pretty quickly... and it works for everything else.
The ONLY thing I can see being different is that there are no patches or anything... my kustomization file just points to the ingress file and that's it, as there are so many differences with URLs etc. between environments.
resources:
- ingress.yaml
More detailed Argo versioning:
argocd: v2.5.5+fc3eaec.dirty
BuildDate: 2022-12-16T18:35:58Z
GitCommit: fc3eaec6f498ddbe49a5fa9d215a219191fba02f
GitTreeState: dirty
GoVersion: go1.19.4
Compiler: gc
Platform: darwin/arm64
argocd-server: v2.5.3+0c7de21
EDIT - This is still happening. New ingress, new namespace, same cluster. Webhook still enabled, hard refresh tried. The app just hangs in 'Progressing' even though the ingress is in place and functional. Every other application works fine; just any that define an ingress seem to hang. What is Argo looking/waiting for here? There is no diff between desired and actual manifests.
I also have the same issue in version v2.5.4+86b2dde. Is it a bug?
I may have (for my scenario) solved this...
I was really only seeing this when creating a set of ingress rules (I imagine this will apply if you have a larger app definition that has ingress in it).
I had some issues a while back with ingress not working properly; my workaround was to create a service with a LoadBalancer and just point the DNS/gateway routing at that external IP. But in one environment (that was set up fresh) the problem didn't exist, which was strange. It was a cleaner install, so maybe that had something to do with it.
Anyway, when I specify a set of ingress rules: as I had only installed one ingress controller, K8s was supposed to recognise that fact, figure out that there's only one ingressClass in play (in my case nginx), and use that... and it seems to work in the new cluster, but the two older clusters didn't like it. They are all running the same version of K8s, which ruled that out.
These old clusters have had a few things installed/uninstalled with Helm etc. over the years, so that might have polluted the water.
What seems to have solved the problem for me in those old clusters is specifying the ingressClassName in the ingress rule YAML file's spec. This tells K8s exactly what I want to do, and the exact ingress file that had been sitting in "Progressing" for days on end, yet working OK, is now sitting as "Healthy":
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-amazing-ingress-rules
  namespace: some-namespace
spec:
  ingressClassName: nginx
  rules:
  - host: "my-website-of-justice.com"
    http:
      paths:
      # hypothetical backend; the original comment was truncated here
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service   # placeholder service name
            port:
              number: 80
That's a long-winded answer, but I hope it helps someone.
EDIT: I've tried this in a few places where we were seeing this issue and it's working, so for my scenario, this is the fix.
In my case the issue got resolved once the nginx ingress RBAC bits were fixed. We had the nginx ingress controller's --election-id changed.
Hope that helps someone.
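(For context, a hedged sketch rather than the poster's exact fix: ingress-nginx uses --election-id for leader election, and its RBAC Role typically pins the lease name with resourceNames. If the election id is changed without updating that rule, leader election fails and the controller never writes status back to the Ingress objects. Roughly:)

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
rules:
# create cannot be restricted by resourceNames
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  resourceNames: ["my-election-id"]   # hypothetical; must match --election-id
  verbs: ["get", "update"]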
I think the 'issue' here is not any one thing; it's that there's a lack of visibility into what is causing Argo to hang in Progressing, or what it is looking for.
Dears, let me write something. My issue is actually fixed, but it wasn't related to ArgoCD; I'll share my experience here and hope it's helpful for you.
When I wrote above that ArgoCD was getting stuck in Progressing status, the issue actually wasn't related to ArgoCD. In my Helm template I was setting up an ingress; the ingress was created and working, but the trick behind it was that my ingress controller operator wasn't updating the status of the ingress and putting the load balancer IP in it. So ArgoCD was waiting for the ingress status to be updated, but the issue wasn't ArgoCD. I eventually ended up configuring the ingress controller operator correctly to update the status, so that from ArgoCD's perspective the resource became ready. That was the reason ArgoCD got stuck in a never-ending loop of Progressing.
When you face this situation, open your app in the dashboard and see which resource Argo is waiting for; probably the controller supervising that resource has some issue that makes ArgoCD behave like this. In newer versions of ArgoCD they patched this bug (feature?) by turning apps stuck in Progressing status to Suspended, but that won't fix the issue; the issue is somewhere else.
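(This matches the built-in behavior: Argo CD's default health check for networking.k8s.io/Ingress stays in Progressing until status.loadBalancer.ingress is populated on the live object. Roughly, it is waiting for something like this, with the address value purely illustrative:)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10   # written back by a correctly configured controller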
I got stuck on this too; vote we should re-open it.
Experiencing this as well. Any recommendations on how to proceed with diagnosis?
From what I have seen, it's usually because you have declared an ingress and not specified a class. K8s understands that if you only have one ingress controller (for most people, that seems to be nginx) it just uses that, but Argo seems to want/need you to specify the class:
spec:
  ingressClassName: nginx
  rules:
  - host: "somesite.com"
However, if you are not seeing this on an ingress, you would need to provide more information. @toanbot, this is most likely your issue; slightly more info in my answer above.
I experience the same problem with the "Progressing" state for Ingress resources. I set ingressClassName to the correct value and unfortunately it didn't help. I suspect that in my environment the problem is caused by the lack of an IP address, i.e. if I run kubectl get ingress my-ingress there is nothing in the ADDRESS column. I guess in my environment a lack of IP address is fine, because my Ingress Controller uses NodePort instead of LoadBalancer to expose itself to the outer world.
Is a lack of IP address really the cause of a never-ending "Progressing" state? Is there any additional configuration to make ArgoCD handle this scenario? My knowledge of ArgoCD and Ingress is not sophisticated enough to answer those questions.
I wouldn't suspect that would be the case if it is set to NodePort. Have you tried using a ClusterIP or a LoadBalancer, even temporarily, to see if that fixes it? What is the reason for using an ingress controller without an external IP?
@markmcgookin I guess @nik123 is right; I have the same exact problem with an Ingress Controller using NodePort. The application stays in Progressing forever (but it works correctly because the Ingress rules are there).
If I switch the Ingress Controller from NodePort to LoadBalancer, the Ingress rule gets an Address and the application becomes Synced.
I think this one should be reopened.
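(One workaround for NodePort-style setups, offered as an assumption on my part rather than something confirmed in this thread: ingress-nginx can publish a fixed address into Ingress statuses via its --publish-status-address flag, so the health check can pass even without a LoadBalancer. A sketch of the controller Deployment fragment, with a placeholder node/VIP address:)

containers:
- name: controller
  image: registry.k8s.io/ingress-nginx/controller:v1.9.4
  args:
  - /nginx-ingress-controller
  # the controller writes this address into each Ingress's status
  - --publish-status-address=203.0.113.10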
I'm also having the same problem with a Keycloak deployment via ArgoCD. When we created a new image (with a custom provider inside) and the image failed to run, the pod was degraded but the statefulset was always stuck in Progressing.
Then we updated our code, built a new image, and updated the new tag in values.yaml, but the application is still in Progressing and doesn't update to use the new image tag.
Can you delete the deployment in ArgoCD or via kubectl and then re-sync?
Can we reopen this issue? If not, what's the workaround? Setting the ingressClassName did not resolve the issue in my case either.
Looks like folks have reported a variety of resources being stuck in Progressing.
Argo CD calculates health on a per-kind basis. And health checks may be customized per Argo CD installation. So each issue described here may be completely different.
I recommend that each person here who's still experiencing an app stuck in Progressing open a new issue with this information: the contents of the live resource manifest, including the status field (so, get it using kubectl) - obviously, redact any sensitive information. Please fill out the whole issue template, especially including the Argo CD version.
That information should be enough for us to reproduce the issue and start working on a fix.
@crenshaw-dev I've opened a new GH issue https://github.com/argoproj/argo-cd/issues/14607 for investigation. Thank you!
I was able to fix this issue on my setup. I used the Helm chart (Traefik) for installation, and I had to enable publishedService; see the following snippet from my Helm values file. Once I enabled that, my ArgoCD apps now show healthy.
providers:
  kubernetesCRD:
    allowCrossNamespace: true
    allowExternalNameServices: true
  kubernetesIngress:
    allowExternalNameServices: true
    publishedService:
      enabled: true
If it helps anyone else, with Traefik I found that I needed to set --providers.kubernetesingress.ingressendpoint.ip to some value for Traefik to update the loadbalancer status on the ingress objects.
See https://github.com/traefik/traefik/issues/6303#issuecomment-584995779
I ran into this with the Tailscale operator. It turns out I dirty-deleted stuff in an app, and the operator's finalizer couldn't resolve it because it had failures around lingering pieces in a lookup call.
For Tailscale specifically, an app in namespace foo with ingress foo was calling the finalizer, but there were multiple secrets created by the operator:
ts-esphome-4hh2t-0 Opaque 9 12m
ts-esphome-bjlmc-0 Opaque 9 12d
ts-esphome-gdctw-0 Opaque 9 50m
causing the finalizer to stay in a broken loop and never letting the ingress finish.
What happened in my case was this, and this is how I fixed it (minikube local cluster): I was working with a service of type NodePort and then switched to type LoadBalancer. From there I was still able to access and use the application, but it was constantly stuck in a progressing loop that never finished. Then I found out about this issue and started googling around, to no avail. Then I told GPT about this, and it told me to use minikube tunnel. The moment I used minikube tunnel I got a healthy status, and now it's all the same as when I used NodePort.
I hope this helps.
FWIW, the following chart values helped me work around the inability to health-check the HAProxy Ingress Controller.
configs:
  cm:
    resource.customizations: |
      networking.k8s.io/Ingress:
        health.lua: |
          hs = {}
          hs.status = "Healthy"
          hs.message = "Probably just fine"
          return hs
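(A less blunt variant, my own sketch using the same resource.customizations mechanism: only report Healthy once the ingress actually has an address, mirroring the built-in check instead of disabling it:)

configs:
  cm:
    resource.customizations: |
      networking.k8s.io/Ingress:
        health.lua: |
          hs = {}
          -- Healthy only once the controller has written an address back
          if obj.status ~= nil and obj.status.loadBalancer ~= nil
              and obj.status.loadBalancer.ingress ~= nil
              and #obj.status.loadBalancer.ingress > 0 then
            hs.status = "Healthy"
            hs.message = "Ingress has an address"
            return hs
          end
          hs.status = "Progressing"
          hs.message = "Waiting for the ingress address"
          return hs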
I tried a bunch of options. What worked for me was what is mentioned at https://github.com/traefik/traefik/issues/6303#issuecomment-584995779, which consists in adding an additional argument:
additionalArguments:
- "--providers.kubernetesingress.ingressendpoint.ip=127.0.0.1"
As a result, k get ingress does report an ADDRESS and ArgoCD is happy.
Describe the bug
ArgoCD Application is stuck in state Progressing and Synced until a refresh is issued, either manually or after the 3-minute sync interval. When a refresh is issued, the Application is immediately in state Healthy and Synced.
To Reproduce: Update Kubernetes manifest files in a repo an ArgoCD Application is monitoring. Issue a sync either manually or wait for auto sync.
Expected behavior
The Application should end up in state Healthy and Synced as soon as possible, and not wait for a refresh after three minutes.