Sven1410 opened 11 months ago
I'm experiencing the same issue here!
Adding the link to the Redis issue for reference.
@sanzoghenzo please let me know if you found a workaround or solution :-) I've already spent days on this problem - still no idea how to solve it.
Unfortunately I didn't find a solution; I'm still experimenting inside a k3d environment, so all it took was recreating the cluster...
I hope you'll find a solution! Cheers
I think I found a workaround:
I played a bit with the sentinel CLI and found out that a "sentinel reset *" always solves the problem and CPU consumption drops back to normal. https://lzone.de/cheat-sheet/Redis%20Sentinel https://redis.io/docs/management/sentinel/#sentinel-commands
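For reference, the reset can also be run manually against each replica first (a sketch - the argocd namespace, the argocd-redis-ha-server-N pod names and the sentinel container name are from my setup, adjust to yours):

# reset the sentinel state on each replica
kubectl -n argocd exec argocd-redis-ha-server-0 -c sentinel -- redis-cli -p 26379 sentinel reset argocd
kubectl -n argocd exec argocd-redis-ha-server-1 -c sentinel -- redis-cli -p 26379 sentinel reset argocd
kubectl -n argocd exec argocd-redis-ha-server-2 -c sentinel -- redis-cli -p 26379 sentinel reset argocd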
So far the problem occurs only after a rolling update or pod restarts, so I added a delayed reset to all sentinel containers via Helm chart values:
argo-cd:
  redis-ha:
    sentinel:
      lifecycle:
        postStart:
          exec:
            command: ["/bin/sh", "-c", "sleep 30; redis-cli -p 26379 sentinel reset argocd"]
I'm still testing, but so far it looks good - no 100% CPU consumption anymore :-)
Ran into the same issue over here. Applied the same fix and confirmed it resolves the problem. Still curious what the underlying issue requiring a restart actually is, though.
Today, I encountered high CPU usage with two pods in the ArgoCD Redis HA setup: argocd-redis-ha-server-1 and argocd-redis-ha-server-2, both consuming close to 1 full CPU core each.
argocd-redis-ha-server-0 33m 40Mi
argocd-redis-ha-server-1 947m 41Mi
argocd-redis-ha-server-2 944m 42Mi
After running kubectl rollout restart sts argocd-redis-ha-server, the resource consumption returned to normal levels.
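For anyone wanting to reproduce the fix, something like the following should work (the argocd namespace is an assumption; adjust to your setup):

# restart the redis-ha statefulset and wait until all replicas are back
kubectl -n argocd rollout restart sts argocd-redis-ha-server
kubectl -n argocd rollout status sts argocd-redis-ha-server
# verify that CPU usage dropped back to normal
kubectl -n argocd top pod argocd-redis-ha-server-0 argocd-redis-ha-server-1 argocd-redis-ha-server-2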
One key observation is that the cluster nodes were upgraded/restarted around 30 hours prior, and the elevated CPU usage started after that event. The ArgoCD version at the time of this issue was:
{
"Version": "v2.12.3+6b9cd82",
"BuildDate": "2024-08-27T11:57:48Z",
"GitCommit": "6b9cd828c6e9807398869ad5ac44efd2c28422d6",
"GitTreeState": "clean",
"GoVersion": "go1.22.4",
"Compiler": "gc",
"Platform": "linux/amd64",
"KustomizeVersion": "v5.4.2 2024-05-22T15:19:38Z",
"HelmVersion": "v3.15.2+g1a500d5",
"KubectlVersion": "v0.29.6",
"JsonnetVersion": "v0.20.0"
}
Node information:
System Info:
Kernel Version: 6.1.100+
OS Image: Container-Optimized OS from Google
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.7.19
Kubelet Version: v1.28.13-gke.1119000
Kube-Proxy Version: v1.28.13-gke.1119000
Describe the bug
We recently updated to Argo CD 2.8.6 and use the redis-ha subchart from the Argo CD Helm chart, which ends up with Redis "7.0.9-alpine3.17" (the same problem occurs with 7.0.14 and the latest 7.2.3). The "sentinel" container of the "argocd-redis-ha-server-x" pod consumes 100% CPU (up to the allowed limit of 1000m) after a restart. This happens after nearly every restart - sometimes for 2 of the 3 Redis pods.
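A quick way to confirm that it is really the sentinel container (and not redis itself) burning the CPU - a sketch, assuming the argocd namespace:

# per-container CPU/memory breakdown of one redis-ha pod
kubectl -n argocd top pod argocd-redis-ha-server-0 --containers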
Version
The nodes run: