argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

HA Proxy with Default Rolling Update Strategy Not Compatible with 3 Nodes and Anti-affinity. #16627

Open JennyFigueroaMSR opened 10 months ago

JennyFigueroaMSR commented 10 months ago

The Kubernetes default rolling update strategy allows 25% of pods to be unavailable. For the three replicas desired for the HA proxy, 25% is 0.75, which Kubernetes rounds down to 0, so no pod can be deleted during a rolling update.
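The rounding above can be checked with a short sketch. It assumes the documented Kubernetes behaviour that a percentage `maxUnavailable` is rounded down and a percentage `maxSurge` is rounded up; the function name `resolve_rolling_update` is made up for illustration and is not part of any Kubernetes or Argo CD API.

```python
def resolve_rolling_update(replicas, max_unavailable_pct=25, max_surge_pct=25):
    """Turn percentage settings into absolute pod counts.

    Assumption: maxUnavailable rounds down, maxSurge rounds up.
    Integer arithmetic is used to avoid floating-point surprises.
    """
    max_unavailable = replicas * max_unavailable_pct // 100   # floor
    max_surge = -(-replicas * max_surge_pct // 100)           # ceiling
    return max_unavailable, max_surge


# Defaults with 3 replicas: no pod may be unavailable, one extra pod may be created.
print(resolve_rolling_update(3))
```

With the defaults and 3 replicas this gives 0 unavailable pods and 1 surge pod, so the rollout has to start by creating a new pod.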

We have 3 nodes in our cluster. When a rolling update occurs, the cluster tries to create a new pod first, but it can't schedule it because of the anti-affinity rule: every node already runs an HA proxy pod. The cluster would therefore have to delete an existing pod to make room, but it can't do that either, because the default rolling update strategy resolves to zero unavailable pods. As a result, the deployment becomes degraded and cannot continue its update.

The docs suggest at least 3 nodes. If the code is unchanged, 4 nodes would be needed to avoid this.

I propose that either the docs be updated, or the HA proxy rolling update strategy be changed to allow 34% max unavailable pods (34% of 3 rounds down to 1).
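To compare the options, here is a minimal sketch of the situation under simplifying assumptions: a required anti-affinity rule that allows at most one HA proxy pod per node, all current replicas running on distinct nodes, and no other scheduling constraints. The function `rollout_can_progress` is hypothetical and only models the first step of a rollout; it takes the already resolved pod counts, not percentages.

```python
def rollout_can_progress(nodes, replicas, max_unavailable, max_surge):
    """Return True if a rolling update can take its first step.

    Assumptions (simplified model, not the real scheduler):
    - at most one pod of this deployment per node (required anti-affinity)
    - every current replica already occupies its own node
    """
    free_nodes = nodes - replicas
    # A new pod can only be added if surging is allowed and a node is free.
    can_surge = max_surge >= 1 and free_nodes >= 1
    # An old pod can only be removed if some unavailability is allowed.
    can_remove = max_unavailable >= 1
    return can_surge or can_remove


print(rollout_can_progress(nodes=3, replicas=3, max_unavailable=0, max_surge=1))  # reported setup
print(rollout_can_progress(nodes=4, replicas=3, max_unavailable=0, max_surge=1))  # one more node
print(rollout_can_progress(nodes=3, replicas=3, max_unavailable=1, max_surge=1))  # 34% proposal
```

In this model only the first case is stuck, which matches the two ways out described above: add a fourth node, or allow one pod to be unavailable.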

Here's an article that helped explain the issue: https://medium.com/xmglobal/how-can-affinity-rules-break-your-kubernetes-rolling-updates-3055eb2d478c

https://github.com/argoproj/argo-cd/blob/0b35e2f1fe27f395e6106a7466d58911c4f7ec9c/manifests/ha/base/redis-ha/chart/upstream.yaml#L1046

hamps-contrib commented 10 months ago

We noticed this issue in our deployment as well! I had to scale the deployment down to 1 replica and then back up to 3 to get "unstuck".