Describe the bug

I had a network problem reaching the Kubernetes API (it was flapping), so Argo CD applications got timeouts when trying to sync.
This led to constant changes in application status, and I had thousands of events like this:
kubectl get events -n argocd
...
45h Normal ResourceUpdated application/some-app Updated health status: Healthy -> Missing
45h Normal ResourceUpdated application/some-app Updated sync status: OutOfSync -> Unknown
45h Normal ResourceUpdated application/some-app Updated health status: Healthy -> Missing
45h Normal ResourceUpdated application/some-app Updated sync status: Unknown -> OutOfSync
45h Normal ResourceUpdated application/some-app Updated health status: Missing -> Healthy
45h Normal ResourceUpdated application/some-app Updated sync status: Unknown -> OutOfSync
45h Normal ResourceUpdated application/some-app Updated health status: Missing -> Healthy
45h Normal ResourceUpdated application/some-app Updated sync status: OutOfSync -> Unknown
...
I have 200+ apps in my Argo CD, so this adds up quickly: it grew my etcd to 600 MB+ within a couple of days, and it kept growing.
I took an etcd snapshot and checked where the data increase was coming from; because I run a dedicated Kubernetes cluster for Argo CD, it was easy to tell after inspecting etcd that Argo CD's events were the cause.
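For anyone who wants to verify the same thing, here is a minimal sketch of how the growth can be inspected. The etcd endpoint and certificate paths below are assumptions based on a typical kubeadm layout; adjust them to your cluster:

# Assumed etcdctl v3 connection settings for a kubeadm-style control plane.
export ETCDCTL_API=3
FLAGS="--endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key"

# Take a snapshot so the inspection can be done offline.
etcdctl $FLAGS snapshot save /tmp/etcd-backup.db

# Current database size per endpoint.
etcdctl $FLAGS endpoint status --write-out=table

# Count Event objects stored for the argocd namespace
# (Kubernetes keeps them under /registry/events/<namespace>/).
etcdctl $FLAGS get /registry/events/argocd --prefix --keys-only | grep -c '^/registry'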
To Reproduce
1. Cause a network-related issue so the Kubernetes API is flapping.
2. Argo CD will try to sync the apps every 3 minutes (the default).
3. Monitor the etcd database size (see the sketch after this list).
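A minimal way to watch the growth while reproducing (the etcdctl flags are the same assumed kubeadm paths as in the sketch above; the Prometheus metric name applies to etcd v3.4+):

# Number of events in the argocd namespace (keeps growing while the API flaps).
kubectl get events -n argocd --no-headers | wc -l

# etcd database size, queried directly ...
etcdctl $FLAGS endpoint status --write-out=table
# ... or scraped from etcd's Prometheus metric etcd_mvcc_db_total_size_in_bytes.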
Expected behavior
Argo CD should clean up the Event resources it emits, because it can easily generate thousands of them.
Workaround

As a workaround to restore etcd space:
kubectl delete events -n argocd --all -v10 --grace-period 0 --force
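Note that deleting the Event objects alone may not shrink the database file: etcd only returns the space after compaction and defragmentation. A sketch of that cleanup, assuming the same etcdctl connection flags as above (the kube-apiserver already compacts etcd every 5 minutes by default, so the compact step may be a no-op):

# Read the current revision, compact up to it, then defragment to reclaim file space.
rev=$(etcdctl $FLAGS endpoint status --write-out=json | egrep -o '"revision":[0-9]+' | egrep -o '[0-9]+')
etcdctl $FLAGS compact "$rev"
etcdctl $FLAGS defrag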
Screenshots

etcd database size increased over time and decreased when I started cleaning up the events.
Version
Logs