ghost opened this issue 2 years ago
My experience has been that the ArgoCD components can come up faster than the Redis container, causing a bunch of problems with the cache structure. Usually killing all ArgoCD components except Redis works for me.
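One way to script that "restart everything except Redis" workaround is a small shell helper. This is a sketch only: the function name is made up, and it assumes the default `argocd` namespace and the stock resource names from the official install manifests.

```shell
# Hypothetical helper: restart every ArgoCD workload except Redis, assuming
# the default "argocd" namespace and stock resource names.
restart_argocd_except_redis() {
  kubectl -n argocd rollout restart \
    deployment/argocd-server \
    deployment/argocd-repo-server
  # The application controller ships as a StatefulSet in recent versions.
  kubectl -n argocd rollout restart statefulset/argocd-application-controller
}
```

A rolling restart avoids deleting pods by hand and lets each component reconnect to the already-running Redis.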
That said, I'm still looking for a way to properly fix this, as I have a similar issue when provisioning ArgoCD in ephemeral environments.
I put an `argocd app get --refresh appname >/dev/null` right before each and every `argocd app` command (like `argocd app delete` or `argocd app wait`). This seems to help.
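That pre-refresh step can be wrapped in a small shell function so it is never forgotten. A sketch, where `argocd_app` is a made-up name and the `argocd` CLI is assumed to be already logged in:

```shell
# Hypothetical wrapper (the name argocd_app is made up): refresh the app's
# cache entry before running the real argocd app subcommand.
argocd_app() {
  app="$1"
  shift
  # Force a refresh so the cache entry exists before the actual command runs.
  argocd app get --refresh "$app" >/dev/null || return 1
  argocd app "$@"
}
```

Usage would look like `argocd_app myapp delete myapp` in place of a bare `argocd app delete myapp` (the app name `myapp` is just an example).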
What puzzles me is that a missing cache entry can cause trouble. To me a cache is something that can vanish anytime, and missing cache entries should only slow things down (because you have to recreate the cached data from slower sources) but never make things fail. But maybe the wording of this error message is just misleading.
Yeah, I agree with you. The cache in this case is more like a critical piece of infrastructure. It's odd that it has been coded this way.
We're seeing the same error throughout the ArgoCD GUI and CLI. A number of basic functions are broken by this error, including previewing change diffs and running `argocd app manifests`.
I faced a similar issue on ArgoCD v2.5.4. I tried changing the Redis instance and restarting all ArgoCD-related services, but it does not help. Is there a fix or permanent solution for this?
Same here using ArgoCD Core, both UI and CLI.
I believe the intent has always been for everything to work even without Redis. But clearly something or some things were not coded according to that intent.
We also occasionally see this during cluster/node upgrades.
The issues below feel related:
It also looks like the log line changed.
In my case, this issue was happening because not all ArgoCD components were running the same image version. Some components had the `:latest` tag instead of a pinned ArgoCD image version, and after fixing this everything came back to normal. Make sure all ArgoCD components run the exact same image version.
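A quick way to check for that mismatch is to list each ArgoCD pod alongside the image it runs. This is a sketch with a made-up helper name, assuming the default `argocd` namespace:

```shell
# Hypothetical helper: print each ArgoCD pod alongside the image(s) it runs,
# assuming the default "argocd" namespace.
argocd_images() {
  kubectl -n argocd get pods -o \
    jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}
```

Every line should show the same pinned tag; any `:latest` entry points at the mismatch described above.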
I have a feeling that changing the image version and bouncing the pods fixed the issue.
It works! Thank you my hero!
Describe the bug
I have a GitLab pipeline running every night that uses argocd to delete and recreate several resources (to reset an automatic test environment).
That pipeline fails about 30-50% of the time because random resources return the `cache: key is missing` error.
To Reproduce
What I do is
Sometimes one of the `app wait` commands fails, sometimes even one of the `app delete-resource` commands at the beginning fails, with `cache: key is missing`. It is completely random.
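As a stopgap until the root cause is fixed, the pipeline could retry commands that fail with this specific error while still failing fast on everything else. A sketch, where `retry_argocd` is a made-up helper name:

```shell
# Hypothetical retry helper: rerun a command a few times when it fails with
# the transient "cache: key is missing" error; fail fast on anything else.
retry_argocd() {
  attempt=0
  while [ "$attempt" -lt 3 ]; do
    attempt=$((attempt + 1))
    if result=$("$@" 2>&1); then
      printf '%s\n' "$result"
      return 0
    fi
    case "$result" in
      *"cache: key is missing"*) sleep 2 ;;            # transient: wait, retry
      *) printf '%s\n' "$result" >&2; return 1 ;;      # real error: give up
    esac
  done
  printf '%s\n' "$result" >&2
  return 1
}
```

The failing pipeline steps would then be prefixed, e.g. `retry_argocd argocd app wait ...` instead of the bare command.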
I even added an `argocd app list -p testapps` right before the `app delete-resource`, and it shows all the services 1-3 as Synced and Healthy, yet it fails when deleting that resource.

Things I ruled out:
Expected behavior
Apps listed as Synced and Healthy should not fail when you try to manually delete or refresh them a few seconds later.
Version
Logs
Server log shows nothing special, just the repeated ArgoCD client error.