qoehliang opened this issue 1 year ago
@qoehliang, perhaps you have encountered the same issue as us. Could you please check all the commit IDs in your history?
On our end, we have 80 applications generated by an ApplicationSet, and sometimes one or more of them use a previous commit ID from about three weeks ago. We are uncertain whether this will be resolved by https://github.com/argoproj/argo-cd/pull/13452.
@YoranSys that PR fixes a bug that only currently exists in the master branch, not on any released version. What version are you running?
Hello @crenshaw-dev, we use v2.7.3+e7891b8.dirty, but we first noticed this problem on v2.6.4.
Do you use a CMP, like OP? It's possible there's a cache issue specific to CMPs.
@crenshaw-dev, I'm currently using an ApplicationSet with the rollingSync option and Helm integration, storing some values in S3 and others in Git (GitHub, master branch). I use a GitHub repository with the master branch as the generator (I have never seen the issue on the generator side). Despite clearing the Redis cache multiple times, the problem continues to occur.
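For concreteness, the setup looks roughly like this (a sketch, not the actual manifest; the name, repoURL, directory layout, and `env` labels are all made up):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: example                      # hypothetical name
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example-org/deploy.git  # assumption
        revision: master
        directories:
          - path: apps/*
  strategy:                          # progressive sync ("rollingSync")
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:          # sync apps labelled env=dev first
            - key: env
              operator: In
              values: [dev]
        - matchExpressions:
            - key: env
              operator: In
              values: [prod]
  template:
    metadata:
      name: '{{path.basename}}'
      labels:
        env: dev                     # hypothetical label used by the steps above
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/deploy.git
        targetRevision: master
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
```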
@YoranSys, thanks for sharing your observations. I haven't seen an application using an old commit; the behaviour we have observed is not the state of an application going back several commits, but rather a ServiceAccount or ClusterRole getting pruned and then only being recreated after a manual "hard refresh" or a restart of the Argo CD pods.
We have 40 or so applications configured under one GitHub repository, but do not use ApplicationSets; we follow an app-of-apps approach. That said, as @crenshaw-dev mentioned, we are using a Config Management Plugin (envsubst) via the ConfigMap plugin approach. Not sure if that is where the culprit lies, as we have noticed that ConfigMap plugins are being deprecated.
The weird thing is, we have other GitHub repositories which are not impacted by this issue. Another GitHub repository with only 10 or so applications deployed to the same EKS cluster has been running perfectly, so the scalability/size of the repository may also be playing a part. We are in the process of disabling automated pruning because of the inconsistency and unreliability of the feature in our clusters.
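For reference, a ConfigMap-style plugin like this is registered in `argocd-cm` roughly as follows (a sketch of the deprecated format; the exact generate command is an assumption):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  configManagementPlugins: |
    - name: envsubst
      generate:
        command: ["sh", "-c"]
        args: ["envsubst < manifests.yaml"]  # assumed command, for illustration
```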
> and Helm integration, where I store some values in S3 and other inside git
@YoranSys is that a CMP? afaik, Argo's built-in Helm support offers no way to communicate with S3?
> via ConfigMap plugin approach. Not sure if that is where the culprit lies, as we have noticed that ConfigMap plugins are being deprecated.
@qoehliang they're deprecated, but they should still work fine. I suspect that some error in the CMP is causing it to return an empty manifest but not a non-zero exit code. So Argo CD is like "word, there are no resources, prune 'em all!"
Are you observing that all resources in the app are pruned, or just some? Because if it's just some, my theory is wrong.
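One easy way for a CMP to swallow an error like this is a shell pipeline: the pipeline's exit status is that of the last command unless `pipefail` is set. A minimal illustration (`false` stands in for a hypothetical failing templating tool):

```shell
#!/usr/bin/env bash
# With a plain pipeline, the exit status is the LAST command's, so a failing
# first stage still exits 0 with empty output -- which Argo CD would read as
# "this app has no resources".
false | cat                       # "false" stands in for a failing generator
echo "without pipefail: exit=$?"  # prints: without pipefail: exit=0

# With pipefail, the failure propagates and Argo CD sees a non-zero exit.
set -o pipefail
false | cat
echo "with pipefail: exit=$?"     # prints: with pipefail: exit=1
```

If a plugin's generate command is a pipeline, adding `set -o pipefail` (and `set -e`) makes Argo CD see the failure instead of an empty manifest.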
We observe the same issue. We do not have auto-prune enabled, but we consistently see some apps having all their resources marked as to-be-pruned.
Hi! We are experiencing a similar issue with Argo CD v2.9.3. We've observed that it randomly prunes several applications (all their resources), changing their states to 'Missing', and then attempts to redeploy each resource after a certain period. As a workaround, we have generally disabled pruning.
Additionally, it's worth noting that we are utilizing the CMP plugin argocd-vault-plugin (custom sidecar image).
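A sidecar CMP like this is registered via a plugin.yaml along these lines (a sketch; the generate command follows the usual argocd-vault-plugin invocation, and the mount path is an assumption):

```yaml
# Sketch of the plugin.yaml mounted into the CMP sidecar
# (typically at /home/argocd/cmp-server/config/plugin.yaml).
apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: argocd-vault-plugin
spec:
  generate:
    command: ["sh", "-c"]
    args: ["argocd-vault-plugin generate ./"]
```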
I see a similar issue here on Argo CD v2.8.4. The pruning here seems to happen only on cluster-scoped resources, basically because Argo CD queries them as being namespaced instead of cluster-scoped. This causes Argo CD to prune the 'namespaced' version and then recreate the resource as a cluster-scoped version. I'm not sure my particular issue is actually an Argo CD bug, but I wanted to add to the thread as it may help others.
Describe the bug
Any time we merge a pull request to the master branch, Argo CD on one of our many EKS clusters decides to prune an object that was not touched in GitHub (it is still present in the GitHub repository).
To Reproduce
Deploy an Application with automated pruning enabled, example below:
Merge a new commit to master branch.
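A minimal Application of that shape looks roughly like this (a sketch: the repoURL, names, and destination are assumptions; the path matches the Cluster Autoscaler path used in this report):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster-autoscaler          # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/k8s-manifests.git  # assumption
    targetRevision: master
    path: helmfile/rendered/non-prod-v1.23-monitor/cluster-autoscaler
    plugin:
      name: envsubst                # the CMP mentioned earlier in the thread
  destination:
    server: https://kubernetes.default.svc
    namespace: kube-system          # assumption
  syncPolicy:
    automated:
      prune: true                   # automated pruning enabled
```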
The issue will happen in 1 or 2 of the 40 EKS clusters we manage. For context, we use Argo CD to deploy a set of core services, as we manage a platform for the company; e.g., we deploy the open-source Cluster Autoscaler to all 40 EKS clusters using Argo CD.
This happens almost any time we push to the master branch of the repository that holds all of the Kubernetes manifest files for our services.
Expected behavior
Argo CD should perform an automated sync any time we merge to master (that is currently happening as expected), but it should not detect any changes and therefore should not prune the resource.
Screenshots
We can see an automated sync succeeds and prunes the ServiceAccount of Cluster Autoscaler, but nothing else.
Clicking into the Revision, we can see the PR whose merge triggered the automated sync:
You can see in the PR that only 4 files changed, and they were for a dev cluster change and for the alertmanager-extras application, which has nothing to do with Cluster Autoscaler.
The Cluster Autoscaler that got pruned was in a non-production environment and uses the path mentioned in the application manifest above: helmfile/rendered/non-prod-v1.23-monitor/cluster-autoscaler. Checking that path, you can see the last change was 23 days ago, not today, which is when the issue happened. Performing another sync does not re-create the object. It feels like the cache has been invalidated. I can then do a hard refresh and the issue goes away until the next merge to the GitHub repository.
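For reference, a hard refresh can also be requested without the UI by annotating the Application; as far as I know, Argo CD picks up the annotation and removes it once the refresh has been processed (a sketch; the application name and namespace are assumptions):

```yaml
# Annotate the Application to request a hard refresh (equivalent to the
# Hard Refresh button). kubectl equivalent, assuming the app lives in "argocd":
#   kubectl -n argocd annotate application cluster-autoscaler \
#     argocd.argoproj.io/refresh=hard --overwrite
metadata:
  annotations:
    argocd.argoproj.io/refresh: hard
```

The CLI equivalent is `argocd app get <app> --hard-refresh`.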
Performing a regular Refresh also doesn't do anything. Performing a Hard Refresh does bring the object back; you can see the ServiceAccount is a few seconds old compared to the Service, which is 1 year old. I read that a Hard Refresh invalidates the cache, so it goes back to my concern that, somehow, after a Git commit to master, Argo CD is losing track of which objects should be deployed for which application. I don't suspect any connectivity problems, because then I would have expected the whole Application to be marked missing, not just one object in an Application.

Version
We are using Argo CD 2.5.5 and are planning to upgrade to 2.6.6 shortly.
Logs
ca.log