argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

Recommend Ping Pong strategy as default for Canary with Traffic Routing using ALB. #2864

Status: Open (opened by rajeshetty87 1 year ago)

rajeshetty87 commented 1 year ago

Checklist:

Describe the bug

Canary with Traffic Routing through an AWS ALB Ingress causes availability issues at the end of a rollout. The rollout progresses successfully through all steps, and at the end Argo Rollouts switches the service selector labels. This triggers two actions by the ALB Ingress Controller:

  1. Registers the targets from the Canary target groups in the Stable target groups.
  2. Updates the weights on the ALB listener: Stable from 0% to 100% and Canary from 100% to 0%.

These two actions are not synchronized. When the listener update (Action #2) completes before the target registration (Action #1), the ALB sends traffic to an empty target group, which returns 503 errors and reduces availability. The issue is mostly observed in swim-lanes with high TPS and many targets (>300); low-traffic swim-lanes do not show it.
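To see why the ordering matters, here is a toy, deterministic model of the cutover. It is a hypothetical sketch, not code from Argo Rollouts or the ALB Ingress Controller: the function `simulate_cutover`, the one-second time steps, the pod names, and the assumption that the Stable target group is empty between the selector switch and Action #1 (as the description above implies) are all illustrative. Any request the listener weights toward a target group with no registered targets is counted as a 503.

```python
def simulate_cutover(register_at: int, reweight_at: int, duration: int, tps: int) -> int:
    """Count requests that land on an empty target group during cutover.

    Hypothetical model, not controller code:
      register_at -- second at which Action #1 (register targets in Stable) completes
      reweight_at -- second at which Action #2 (listener weights flip to 100% Stable) completes
      duration    -- number of one-second steps to simulate
      tps         -- requests per second hitting the listener
    """
    # Assumption: right after the selector switch, the Stable target group has
    # no registered targets until Action #1 completes.
    stable_targets: set[str] = set()
    canary_targets = {"pod-a", "pod-b"}  # hypothetical canary pods
    stable_weight = 0  # percent of traffic the listener sends to Stable
    failed = 0
    for second in range(duration):
        if second == register_at:
            stable_targets |= canary_targets  # Action #1: register targets
        if second == reweight_at:
            stable_weight = 100  # Action #2: flip listener weights
        # Traffic weighted to an empty target group cannot be served (ALB 503).
        if not stable_targets:
            failed += tps * stable_weight // 100
    return failed


if __name__ == "__main__":
    # Registration finishes before the weights flip: no errors.
    print(simulate_cutover(register_at=2, reweight_at=5, duration=10, tps=1000))  # → 0
    # Weights flip 3 seconds before registration finishes, at high TPS.
    print(simulate_cutover(register_at=5, reweight_at=2, duration=10, tps=1000))  # → 3000
    # Same race at low TPS.
    print(simulate_cutover(register_at=5, reweight_at=2, duration=10, tps=10))  # → 30
```

In this model, whenever Action #2 wins the race the error count is `tps × (register_at − reweight_at)`, which is consistent with high-TPS swim-lanes being hit hardest. That registration takes longer when there are many targets is a plausible assumption, not something the model demonstrates.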

This is a known issue with Canary with Traffic Routing and has been discussed at length in these threads (#2061, #1283, #1453). The Ping Pong feature was introduced to solve it; details are available in the Argo Rollouts documentation for ALB traffic routing.
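The Ping Pong idea can be modeled in the same spirit. As the Argo Rollouts ALB documentation describes it, one of two services (ping or pong) acts as stable and the other as canary; at the end of the promotion 100% of traffic goes to the canary service, and the two services then swap roles. The sketch below is a simplified, hypothetical model of that role swap: the function `ping_pong_rollout` and its weight list are illustrative and are not the controller's API.

```python
def ping_pong_rollout(current_stable: str, canary_weights: list[int]) -> tuple[str, list[dict[str, int]]]:
    """Hypothetical model of one Ping Pong rollout.

    current_stable -- "ping" or "pong", whichever service currently acts as stable
    canary_weights -- canary traffic percentages for the setWeight steps
    Returns the service acting as stable afterwards and the listener
    weights applied at each step.
    """
    if current_stable not in ("ping", "pong"):
        raise ValueError("current_stable must be 'ping' or 'pong'")
    canary = "pong" if current_stable == "ping" else "ping"
    # Assumption: the canary service's selector was pointed at the new pods while it
    # still had 0% weight, so its target group is populated before it gets traffic.
    history = [{current_stable: 100 - w, canary: w} for w in canary_weights]
    # End of promotion: all traffic goes to the canary service, then the roles swap.
    # No selector changes on the service carrying traffic, so no re-registration race.
    history.append({current_stable: 0, canary: 100})
    return canary, history


if __name__ == "__main__":
    stable, steps = ping_pong_rollout("ping", [20, 50])
    print(stable)  # → pong
    print(steps)  # → [{'ping': 80, 'pong': 20}, {'ping': 50, 'pong': 50}, {'ping': 0, 'pong': 100}]
    # The next rollout starts from the swapped roles.
    print(ping_pong_rollout(stable, [50])[0])  # → ping
```

In this model, the final weight change sends traffic to a target group that already contains the new pods, so there is no window in which the listener points at an empty target group.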

This issue requests making Ping Pong the default strategy over simple Canary when using Canary with Traffic Routing on AWS ALB. That would save time for teams making the switch and steer them toward the correct solution (Ping Pong) rather than Canary with Traffic Routing, which does not provide zero-downtime deployments.

To Reproduce

Steps to reproduce

Screenshots

[Screenshot from 2023-07-03, 2:41:57 PM]

Version

v1.5.1

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 60 days with no activity.