Closed: jonaslar closed this issue 3 years ago
Is there a recommended solution for this? I have tried lowering the sync time and even running a refresh manually, but those don't work. I have multiple replicas of a service, and each of them takes exactly 3 minutes to be marked as healthy. That means the previous version of the app doesn't get terminated until the new one is marked as healthy.
To Reproduce: Update Kubernetes manifest files in a repo an ArgoCD Application is monitoring. Issue a sync either manually or wait for auto-sync.
It's not clear there's a bug here. We cannot detect changes in the git repo unless either: (1) the 3 minute polling period was reached or (2) a webhook was configured to notify Argo CD about the change in git. This is expected behavior.
Am I missing something?
I think we need a clear set of reproducible steps to do anything here.
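(Side note for anyone hitting the 3-minute delay: in recent Argo CD versions the git polling interval is controlled by the timeout.reconciliation key in the argocd-cm ConfigMap. A minimal sketch, assuming a stock install in the argocd namespace; the application controller needs a restart to pick the change up:)

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Poll git every 60s instead of the default 180s; restart the
  # argocd-application-controller for this to take effect.
  timeout.reconciliation: 60s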
@jessesuen Thanks for responding. At least in our case this is what we see happening:
In our case we have an application with 2 replicas. What we see is that the first replica is deployed, becomes healthy, and is registered in a load balancer, but termination of the previous version is not triggered until ArgoCD declares it healthy (180 sec). Then ArgoCD moves on to the next replica.
What exactly is Argo looking for before it declares an application healthy?
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
Is there any progress on this? This problem still occurs on my cluster on the latest stable version of ArgoCD.
I've started to manage a secret, and the app is now stuck in Progressing. Any idea why?
+1. When I try to add 2 new managed namespaces (5 in total), the cluster gets stuck refreshing; with only 3, ArgoCD takes 10 minutes to sync the cluster and work again.
Can we re-open this issue? I'm seeing this in ArgoCD 2.3.7 as well (sorry, we're on the RH GitOps Operator, thus a bit behind in terms of Argo version) when the app sync is triggered by a webhook (in our case by Bitbucket Server). The app syncs instantly and successfully and is synced, but app health says "Progressing". The app is stuck in Progressing until either a) I manually do a "Refresh" in the UI or b) I wait for the next automatic/scheduled refresh (i.e. ~3 minutes).
Small update: I cannot reproduce this consistently :-( It happened several times in a row when first establishing the webhook and triggering it a few times. Then I added a webhook secret and the app went healthy instantly after webhooks. Thinking it was related to this change, I removed the secret again. It worked a few times (app instantly healthy), then went into constant "Progressing" again for a few syncs, and right now everything works again. I'm puzzled. Still, when the app is stuck in "Progressing" after the webhook, a simple "Refresh" or waiting for the scheduled refresh fixes the app's status on our side.
Vote for re-open. I've deployed Keycloak via ArgoCD but it's stuck at Progressing; neither manual nor automatic refresh changes it.
Also vote for a re-open.
I am seeing this too. We are using ArgoCD v2.5.3+0c7de21, in Azure AKS, running K8s 1.23.12 on Ubuntu 18.04 nodes.
I've got about 18 apps (deployments, secretproviders, services) across 3 different clusters, all working fine; all are kustomize-style deployments. But when I deploy an app defining our ingress, it just hangs with health Progressing. This deployment existed on the cluster before Argo (like most of the others), and I have tried deploying it directly by hand (I have self-heal turned on) and changing settings to force a sync... nothing seems to get the health to update.
We have a webhook turned on, so every commit to that repo causes a sync pretty quickly... and it works for everything else.
The ONLY thing I can see being different is that there are no patches or anything... my kustomization file just points to the ingress file and that's it, as there are so many differences with URLs etc. between environments.
resources:
- ingress.yaml
More detailed Argo versioning:
argocd: v2.5.5+fc3eaec.dirty
BuildDate: 2022-12-16T18:35:58Z
GitCommit: fc3eaec6f498ddbe49a5fa9d215a219191fba02f
GitTreeState: dirty
GoVersion: go1.19.4
Compiler: gc
Platform: darwin/arm64
argocd-server: v2.5.3+0c7de21
EDIT - This is still happening. New ingress, new namespace, same cluster. Webhook still enabled, hard refresh tried. The app just hangs in 'Progressing' even though the ingress is in place and functional. Every other application works fine; just any that define an ingress seem to hang. What is Argo looking/waiting for here? There is no diff between desired and actual manifests.
I also have the same issue in version v2.5.4+86b2dde. Is it a bug?
I may have (for my scenario) solved this...
I was really only seeing this when creating a set of ingress rules (I imagine this will apply if you have a larger app definition that has ingress in it).
I had some issues a while back with ingress not working properly; my workaround was to create a service with a LoadBalancer and just point the DNS/gateway routing at that external IP. But in one environment (that was set up fresh) the problem didn't exist, which was strange. It was a cleaner install, so maybe that had something to do with it.
Anyway, when I specify a set of ingress rules: as I had only installed one ingress controller, K8s was supposed to recognise that fact, figure out that there's only one ingressClass in play (in my case nginx), and use that... and it seems to work in the new cluster, but the two older clusters didn't like it. They are all running the same version of K8s, which ruled that out.
These old clusters have had a few things installed/uninstalled with Helm etc. over the years, so that might have polluted the water.
What seems to have solved the problem for me in those old clusters is specifying the ingressClassName in the ingress rule YAML file's spec. This tells K8s exactly what I want to do, and the exact ingress file that had been sitting in "Progressing" for days on end, yet working OK, is now sitting as "Healthy":
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-amazing-ingress-rules
  namespace: some-namespace
spec:
  ingressClassName: nginx
  rules:
  - host: "my-website-of-justice.com"
    http:
      paths:
      # hypothetical backend; the original comment was truncated here
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service   # placeholder service name
            port:
              number: 80
That's a long-winded answer, but I hope it helps someone.
EDIT: I've tried this in a few places where we were seeing this issue and it's working, so for my scenario, this is the fix.
In my case the issue got resolved once the nginx ingress RBAC bits were fixed. We had the nginx ingress controller's --election-id changed.
Hope that helps someone.
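(For context, a hedged sketch rather than the poster's exact fix: ingress-nginx uses --election-id for leader election, and its RBAC Role typically pins the lease name with resourceNames. If the election id is changed without updating that rule, leader election fails and the controller never writes status back to the Ingress objects. Roughly:)

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
rules:
# create cannot be restricted by resourceNames
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  resourceNames: ["my-election-id"]   # hypothetical; must match --election-id
  verbs: ["get", "update"]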
I think the 'issue' here is not any one thing; it's that there's a lack of visibility into what is causing Argo to hang in Progressing, or what it is looking for.
Dears, let me write something. My issue is actually fixed, but it wasn't related to ArgoCD; I'll share my experience here and hope it's helpful for you.
When I wrote above that ArgoCD was getting stuck in Progressing status, the issue actually wasn't related to ArgoCD. In my Helm template I was setting up an ingress; the ingress was created and working, but the trick behind it was that my ingress controller operator wasn't updating the status of the ingress and putting the load balancer IP in it. So ArgoCD was waiting for the ingress status to be updated, but the issue wasn't ArgoCD. I eventually ended up configuring the ingress controller operator correctly to update the status, so that from ArgoCD's perspective the resource became ready. That was the reason ArgoCD got stuck in a never-ending loop of Progressing.
When you face this situation, open your app in the dashboard and see which resource Argo is waiting for; probably the controller supervising that resource has some issue that makes ArgoCD behave like this. In newer versions of ArgoCD they patched this bug (feature?) by turning apps stuck in Progressing status to Suspended, but that won't fix the issue; the issue is somewhere else.
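(This matches the built-in behavior: Argo CD's default health check for networking.k8s.io/Ingress stays in Progressing until status.loadBalancer.ingress is populated on the live object. Roughly, it is waiting for something like this, with the address value purely illustrative:)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
status:
  loadBalancer:
    ingress:
    - ip: 203.0.113.10   # written back by a correctly configured controller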
I got stuck on this too; vote we should re-open it.
Experiencing this as well. Any recommendations on how to proceed with diagnosis?
From what I have seen, it's usually because you have declared an ingress and not specified a class. K8s understands that if you only have one ingress controller (for most people, that seems to be nginx) it just uses that, but Argo seems to want/need you to specify the class:
spec:
  ingressClassName: nginx
  rules:
  - host: "somesite.com"
However, if you are not seeing this on an ingress, you would need to provide more information. @toanbot, this is most likely your issue; slightly more info in my answer above.
I experience the same problem with the "Progressing" state for Ingress resources. I set ingressClassName to the correct value and unfortunately it didn't help. I suspect that in my environment the problem is caused by the lack of an IP address, i.e. if I run kubectl get ingress my-ingress there is nothing in the ADDRESS column. I guess in my environment a lack of IP address is fine, because my Ingress Controller uses NodePort instead of LoadBalancer to expose itself to the outer world.
Is a lack of IP address really the cause of a never-ending "Progressing" state? Is there any additional configuration to make ArgoCD handle this scenario? My knowledge of ArgoCD and Ingress is not sophisticated enough to answer those questions.
I wouldn't suspect that would be the case if it is set to NodePort. Have you tried using a ClusterIP or a LoadBalancer, even temporarily, to see if that fixes it? What is the reason for using an ingress controller without an external IP?
@markmcgookin I guess @nik123 is right; I have the same exact problem with an Ingress Controller using NodePort. The application stays in Progressing forever (but it works correctly because the Ingress rules are there).
If I switch the Ingress Controller from NodePort to LoadBalancer, the Ingress rule gets an Address and the application becomes Synced.
I think this one should be reopened.
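(One workaround for NodePort-style setups, offered as an assumption on my part rather than something confirmed in this thread: ingress-nginx can publish a fixed address into Ingress statuses via its --publish-status-address flag, so the health check can pass even without a LoadBalancer. A sketch of the controller Deployment fragment, with a placeholder node/VIP address:)

containers:
- name: controller
  image: registry.k8s.io/ingress-nginx/controller:v1.9.4
  args:
  - /nginx-ingress-controller
  # the controller writes this address into each Ingress's status
  - --publish-status-address=203.0.113.10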
I'm also having the same problem with a Keycloak deployment via ArgoCD. When we created a new image (with a custom provider inside) and the image failed to run, the pod was degraded but the statefulset was always stuck in Progressing.
Then we updated our code, built a new image, and updated the new tag in values.yaml, but the application is still in Progressing and doesn't update to use the new image tag.
Can you delete the deployment in ArgoCD or via kubectl and then re-sync?
Can we reopen this issue? If not, what's the workaround? Setting the ingressClassName did not resolve the issue in my case either.
Looks like folks have reported a variety of resources being stuck in Progressing.
Argo CD calculates health on a per-kind basis. And health checks may be customized per Argo CD installation. So each issue described here may be completely different.
I recommend that each person here who's still experiencing an app stuck in Progressing open a new issue with this information: the contents of the live resource manifest, including the status field (so, get it using kubectl) - obviously, redact any sensitive information. Please fill out the whole issue template, especially including the Argo CD version.
That information should be enough for us to reproduce the issue and start working on a fix.
@crenshaw-dev I've opened a new GH issue https://github.com/argoproj/argo-cd/issues/14607 for investigation. Thank you!
I was able to fix this issue on my setup. I used the Helm chart (Traefik) for installation, and I had to enable publishedService; see the following snippet from my Helm values file. Once I enabled that, my ArgoCD apps now show healthy.
providers:
  kubernetesCRD:
    allowCrossNamespace: true
    allowExternalNameServices: true
  kubernetesIngress:
    allowExternalNameServices: true
    publishedService:
      enabled: true
If it helps anyone else, with Traefik I found that I needed to set --providers.kubernetesingress.ingressendpoint.ip to some value for Traefik to update the loadbalancer status on the ingress objects.
See https://github.com/traefik/traefik/issues/6303#issuecomment-584995779
I ran into this with the Tailscale operator. It turns out I dirty-deleted stuff in an app, and the operator's finalizer couldn't resolve it because it had failures around lingering pieces in a lookup call.
For Tailscale specifically, an app in namespace foo with ingress foo was calling the finalizer, but there were multiple secrets created by the operator:
ts-esphome-4hh2t-0 Opaque 9 12m
ts-esphome-bjlmc-0 Opaque 9 12d
ts-esphome-gdctw-0 Opaque 9 50m
causing the finalizer to stay in a broken loop and never letting the ingress finish.
What happened in my case was this, and this is how I fixed it (minikube local cluster): I was working with a service of type NodePort and then switched to type LoadBalancer. From there I was still able to access and use the application, but it was constantly stuck in a progressing loop that never finished. Then I found out about this issue and started googling around, to no avail. Then I told GPT about this, and it told me to use minikube tunnel. The moment I used minikube tunnel I got a healthy status, and now it's all the same as when I used NodePort.
I hope this helps.
FWIW, the following chart values helped me work around the inability to health-check the HAProxy Ingress Controller.
configs:
  cm:
    resource.customizations: |
      networking.k8s.io/Ingress:
        health.lua: |
          hs = {}
          hs.status = "Healthy"
          hs.message = "Probably just fine"
          return hs
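(A less blunt variant, my own sketch using the same resource.customizations mechanism: only report Healthy once the ingress actually has an address, mirroring the built-in check instead of disabling it:)

configs:
  cm:
    resource.customizations: |
      networking.k8s.io/Ingress:
        health.lua: |
          hs = {}
          -- Healthy only once the controller has written an address back
          if obj.status ~= nil and obj.status.loadBalancer ~= nil
              and obj.status.loadBalancer.ingress ~= nil
              and #obj.status.loadBalancer.ingress > 0 then
            hs.status = "Healthy"
            hs.message = "Ingress has an address"
            return hs
          end
          hs.status = "Progressing"
          hs.message = "Waiting for the ingress address"
          return hs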
I tried a bunch of options. What worked for me was what is mentioned at https://github.com/traefik/traefik/issues/6303#issuecomment-584995779, which consists in adding an additional argument:
additionalArguments:
- "--providers.kubernetesingress.ingressendpoint.ip=127.0.0.1"
As a result, k get ingress does report an ADDRESS and ArgoCD is happy.
Describe the bug
ArgoCD Application is stuck in state Progressing and Synced until a refresh is issued, either manually or after the 3-minute sync interval. When a refresh is issued, the Application is immediately in state Healthy and Synced.
To Reproduce: Update Kubernetes manifest files in a repo an ArgoCD Application is monitoring. Issue a sync either manually or wait for auto sync.
Expected behavior
The Application should end up in state Healthy and Synced as soon as possible, and not wait for a refresh after three minutes.