argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

Feature Request: deploy baseline and canary pods for rollout analysis #532

Open jzyeezy opened 4 years ago

jzyeezy commented 4 years ago

My organization has applications that require a "warm-up" period where metric spikes observed during that time may be false positives; e.g. building up an nginx cache. We have seen recommendations to compare canaries against an additional baseline deployment brought up at the same time, rather than against the existing production deployment:

We've played around with the Experiments CRD, which seems closest to satisfying our feature request; however, we're observing that ingress-nginx is not able to route traffic to the Experiment ReplicaSets.

Are there any future plans for Argo Rollouts to combine the features of canary Rollouts and Experiments, so that a rollout can be configured to send a percentage of production traffic to both its deployed baseline and canary replicas?

jessesuen commented 4 years ago

> however we're observing that ingress-nginx is not able to route traffic to the Experiment replica-sets.

This should already be possible. The experiment simply needs to share the same labels as the rollout. See the experiment docs on how to do this:
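As background for why sharing labels should be enough: a Kubernetes Service picks its endpoints with an equality-based label selector, so any pod carrying every key/value pair in the selector is routed to, whichever controller created it. The minimal Python sketch below models only that rule; the label values (`my-app`, the hash strings) are hypothetical placeholders, and `rollouts-pod-template-hash` is the label key the Argo Rollouts controller uses.

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Equality-based selector: every selector key/value pair must appear on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Hypothetical labels: the experiment pod copies the rollout's "app" label
# but carries a different pod-template hash.
service_selector = {"app": "my-app"}
rollout_pod = {"app": "my-app", "rollouts-pod-template-hash": "abc123"}
experiment_pod = {"app": "my-app", "rollouts-pod-template-hash": "def456"}

print(selector_matches(service_selector, rollout_pod))     # True
print(selector_matches(service_selector, experiment_pod))  # True
```

Extra labels on the pod are ignored; only the keys named in the selector matter, which is why a selector without the hash key picks up both pods.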

https://argoproj.github.io/argo-rollouts/features/experiment/#integration-with-rollouts

jzyeezy commented 4 years ago

@jessesuen thanks for getting back to me.

We set up our experiment to share the same labels as the rollout, and also tried configuring specific selectors on the rollout labels. In doing this, we observed that the rollouts-pod-template-hash auto-populated on the experiment pods did not match the rollout's ReplicaSets. Additionally, the Service objects, which Argo modifies, have the rollouts-pod-template-hash of the canary pods added to their selectors, which doesn't match the hash of the experiment pods (*see note below). This mismatch is what makes it impossible for us to have a Service/Ingress that targets both the canary and experiment ReplicaSets.

*To add extra detail: when testing experiments with ingress-nginx, we noticed that Argo would always spin up an extra canary pod in addition to the experiment's control (baseline) and canary replicas. For example, we set up the experiment with 2 replicas each for the baseline and canary, expecting to see 4 pods (2 baseline, 2 canary); instead we observed 5 pods (2 baseline, 3 canary) during a rollout. The extra canary pod is the one ingress-nginx sent traffic to, despite our efforts to use the same rollout labels on the experiment.

mrak commented 3 years ago

The fundamental issue here is that if you want to use experiments with traffic management such as the NGINX ingress controller, it is impossible to have one Service that can send traffic to both the stable and experiment pods.

This is because canaryService and stableService are required parameters for traffic management (at least for NGINX). As soon as you set these options, the argo-rollouts controller modifies the selectors of those Services to target only the rollouts-pod-template-hash of the canary or stable pods. The experiment pods have their own pod-template hash and thus are not added as endpoints of those Services.
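To make the mechanism concrete, here is a small Python model of what happens to the Service selectors. It is a sketch under stated assumptions, not the controller's code: `pin_to_revision` and `endpoints` are hypothetical helpers, and the pod names and hash values are placeholders. It only shows that once the `rollouts-pod-template-hash` key is added to a selector, pods from any other ReplicaSet (including experiment pods) stop matching.

```python
HASH_KEY = "rollouts-pod-template-hash"  # label key the Argo Rollouts controller adds to selectors


def selector_matches(selector, pod_labels):
    """Equality-based selector: every selector pair must appear on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def pin_to_revision(selector, pod_template_hash):
    """Hypothetical model: copy the selector and narrow it to one ReplicaSet's hash."""
    pinned = dict(selector)
    pinned[HASH_KEY] = pod_template_hash
    return pinned


def endpoints(selector, pods):
    """Names of the pods a Service with this selector would route to."""
    return [name for name, labels in pods.items() if selector_matches(selector, labels)]


# Hypothetical pods: all share app=my-app, but each ReplicaSet has its own hash.
pods = {
    "stable-1": {"app": "my-app", HASH_KEY: "stable-hash"},
    "rollout-canary-1": {"app": "my-app", HASH_KEY: "canary-hash"},
    "experiment-baseline-1": {"app": "my-app", HASH_KEY: "exp-baseline-hash"},
    "experiment-canary-1": {"app": "my-app", HASH_KEY: "exp-canary-hash"},
}

user_selector = {"app": "my-app"}
stable_service = pin_to_revision(user_selector, "stable-hash")
canary_service = pin_to_revision(user_selector, "canary-hash")

print(endpoints(user_selector, pods))   # all four pods
print(endpoints(stable_service, pods))  # ['stable-1']
print(endpoints(canary_service, pods))  # ['rollout-canary-1']
```

The unpinned selector is the "own selector/label strategy" case: it reaches the experiment pods, but only because no hash key is present.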

If you are not using traffic management with ingress controllers and rely only on pod replica ratios for traffic balancing, you can leave off the canaryService and stableService and use your own selector/label strategy to ensure experiment pods are also added as endpoints.
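Without traffic management, the split comes from pod counts: if one Service selects every pod and requests are spread roughly evenly across its ready endpoints, each group's share is approximately its fraction of the pods. The Python sketch below computes that approximation; `replica_traffic_shares` is a hypothetical helper, the replica counts are illustrative, and real distribution also depends on connection reuse and load-balancer behavior.

```python
from fractions import Fraction


def replica_traffic_shares(replicas: dict[str, int]) -> dict[str, Fraction]:
    """Approximate each group's traffic share as its fraction of ready pods.

    Assumes a single Service selects every pod and spreads requests evenly
    across endpoints.
    """
    total = sum(replicas.values())
    return {group: Fraction(count, total) for group, count in replicas.items()}


# Illustrative counts: 8 stable pods plus a 1-pod baseline and a 1-pod canary.
shares = replica_traffic_shares({"stable": 8, "baseline": 1, "canary": 1})
for group, share in shares.items():
    print(f"{group}: {float(share):.0%}")
# stable: 80%
# baseline: 10%
# canary: 10%
```

This coarse granularity is the trade-off: a 1% canary needs on the order of 100 pods, which is what the traffic-management integrations avoid.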

If we want support for experiments being included in the fine-grained setWeight of a strategy.canary rollout using ingress controllers it will require some additional thought around how to specify weighting between the current revision and the upcoming revision running an experiment.
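One possible way to express that weighting, sketched in Python purely as a hypothetical scheme (there is no such Argo Rollouts field; `split_weights` is an illustrative name): the canary receives the step's setWeight, a freshly started baseline running the stable version receives an equal share carved out of stable, and stable keeps the remainder.

```python
def split_weights(set_weight: int) -> dict[str, int]:
    """Hypothetical scheme: canary gets set_weight percent, an equally sized
    baseline (stable version, fresh pods) gets the same percent taken from
    stable, and stable keeps the rest."""
    if not 0 <= set_weight <= 50:
        raise ValueError("set_weight must be between 0 and 50 so baseline can match canary")
    return {
        "stable": 100 - 2 * set_weight,
        "baseline": set_weight,
        "canary": set_weight,
    }


for weight in (5, 10, 25):
    print(weight, split_weights(weight))
```

Keeping baseline and canary weights equal means both see comparable load over the same window, which is what makes a side-by-side comparison meaningful; the cost is that stable gives up twice the canary weight.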

jessesuen commented 3 years ago

> If you are not using traffic management with ingress controllers and rely only on pod replica ratios for traffic balancing, you can leave off the canaryService and stableService and use your own selector/label strategy to ensure experiment pods are also added as endpoints.

@mrak explained it better than I could have. Currently, the Experiment steps in rollouts allow you full control over the labels, with an additional convenience syntax to easily use the stable/canary pod hashes.

> If we want support for experiments being included in the fine-grained setWeight of a strategy.canary rollout using ingress controllers it will require some additional thought around how to specify weighting between the current revision and the upcoming revision running an experiment.

Yes, great explanation. The current Experiment integration in Rollout is definitely geared more towards the weighted-replica-count canary (before we had traffic management features/integration with ingress controllers and meshes). To use the Experiment steps in conjunction with advanced traffic splitting capability, we'll need to think about how to express that.

Eslamanwar commented 3 years ago

We face the same issue: we can't use the Experiment CRD for Kayenta-style analysis that compares canaries against an additional baseline deployment brought up at the same time, because we use Istio traffic management rather than a weighted-replica-count canary. Is there any plan to implement this? It is the standard way to implement the baseline approach in canary analysis.
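The reason a simultaneously started baseline matters (and ties back to the warm-up problem in the original request) can be shown with a deliberately simple Python comparison. This is not Kayenta's judge, which applies statistical tests per metric; `canary_passes`, the 10% tolerance, and the latency samples are all hypothetical. Comparing fresh canary pods against warm stable pods flags the warm-up as a regression, while comparing them against equally fresh baseline pods does not.

```python
from statistics import mean


def canary_passes(baseline: list[float], canary: list[float], tolerance: float = 0.10) -> bool:
    """Pass if the canary's mean is at most `tolerance` (relative) above the baseline's.

    Simplified stand-in for Kayenta-style judging; assumes higher is worse
    (e.g. latency).
    """
    return mean(canary) <= mean(baseline) * (1 + tolerance)


# Hypothetical p99 latencies (ms) sampled over the same window.
warm_stable = [100, 102, 98, 101]    # long-running production pods, caches warm
cold_baseline = [140, 135, 138, 142]  # fresh pods, stable version, still warming up
cold_canary = [143, 139, 141, 145]    # fresh pods, new version, still warming up

print(canary_passes(warm_stable, cold_canary))    # False: warm-up looks like a regression
print(canary_passes(cold_baseline, cold_canary))  # True: like-for-like comparison
```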

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity.