fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0

Possibility of multiple Intervals #1618

Open shivamnarula opened 3 months ago

shivamnarula commented 3 months ago

Describe the feature

Support for specifying a separate interval for each entry in stepWeights.

What problem are you trying to solve?

Suppose we specify stepWeights as [10,20,50] and interval as 5m. Shifting traffic from 50% -> 100% in 5 mins seems like a big deal and also we don't want to delay promotion by introducing multiple steps.

Proposed solution

What do you want to happen? Add any considered drawbacks.

A way to specify intervals as a list whose length equals the number of stepWeights given. No drawbacks considered yet.
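A minimal sketch of what this proposal could mean, written in Python purely for illustration. It is not Flagger code and does not reflect any existing Flagger field: the names `build_schedule` and `step_intervals` are hypothetical, and the 15-minute value for the last step is an invented example.

```python
from datetime import timedelta


def build_schedule(step_weights, step_intervals):
    """Pair each step weight with its own interval.

    Hypothetical helper: the proposal requires one interval per step weight,
    so a length mismatch is rejected.
    """
    if len(step_weights) != len(step_intervals):
        raise ValueError(
            f"expected {len(step_weights)} intervals, got {len(step_intervals)}"
        )
    return list(zip(step_weights, step_intervals))


# Example values only: keep 5m for the small steps, wait longer at 50%.
schedule = build_schedule(
    [10, 20, 50],
    [timedelta(minutes=5), timedelta(minutes=5), timedelta(minutes=15)],
)

# Total analysis time is the sum of the per-step intervals.
total = sum((interval for _, interval in schedule), timedelta())
print(total)
```

With these example values the number of steps stays at three, and only the wait at the last weight grows.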

Any alternatives you've considered?

Is there another way to solve this problem that isn't as good a solution?

No

LiZhenCheng9527 commented 3 months ago

Do you want to specify an interval for each stepWeights entry in the test phase, or do you want to specify step weights and intervals for the 50% -> 100% rollout phase?

shivamnarula commented 3 months ago

I wish to specify an interval for each stepWeights entry in the test phase.

stefanprodan commented 3 months ago

> Shifting traffic from 50% -> 100% in 5 mins seems like a big deal and also we don't want to delay promotion by introducing multiple steps.

How does adding steps, as in [10,20,50,70,100], introduce a delay? It is no different from being able to say 50 -> 100 in 10 minutes.
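The arithmetic behind this point can be sketched as follows. This is a simplified model, not Flagger's actual scheduling logic: it assumes the canary advances exactly one step per interval and ignores any promotion or finalization phases. The function name `weight_timeline` is hypothetical.

```python
def weight_timeline(step_weights, interval_minutes):
    """Return (minute, weight) pairs, assuming one step per interval."""
    return [(i * interval_minutes, w) for i, w in enumerate(step_weights)]


timeline = weight_timeline([10, 20, 50, 70, 100], 5)

# Minute at which each weight is first reached.
reached = {weight: minute for minute, weight in timeline}

# Two extra 5m steps after 50% spread the 50 -> 100 shift over 10 minutes.
minutes_50_to_100 = reached[100] - reached[50]
print(timeline, minutes_50_to_100)
```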

LiZhenCheng9527 commented 3 months ago

Can you give practical scenarios for using this feature?

shivamnarula commented 3 months ago

We have a few services with high QPS. Going from 5% to 10% in, let's say, 5 minutes won't hurt the performance of downstream services, whereas going from 50% to 75% or 80% in the same 5 minutes could cause issues if bad code is pushed. Also, we don't want to add lots of stepWeights, which could delay promotion for a good amount of time.

shivamnarula commented 3 months ago

What do you guys think about this?

aufarg commented 4 days ago

Hi, not the OP, but I have a use case for the feature being requested. We want a longer canary duration at low percentages and a shorter one at higher percentages. This lets us test basic functionality with low traffic (fewer errors if something goes wrong) while still allowing us to do some load testing afterwards with higher traffic.

For example, if we have [1,2,4,8,16,32] and tolerate about a 2% error rate, we want the first two steps to last much longer (e.g. 10m) and the later steps to be shorter (e.g. 1m). The reason is that lower traffic means a lower error rate, but also a lower traffic rate, so we might not collect enough traffic for confidence.

Right now we handle this by using [1,2,3,4,5,6,7,8...] with a 1m interval instead.
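To make the traffic-volume reasoning concrete, here is a rough back-of-the-envelope sketch. The total request rate of 100 requests per second is an invented figure, the helper `canary_requests` is hypothetical, and the model assumes traffic is split exactly according to the canary weight.

```python
def canary_requests(total_rps, weight_percent, minutes):
    """Approximate number of requests the canary serves during one step."""
    return total_rps * weight_percent * minutes * 60 // 100


TOTAL_RPS = 100  # hypothetical overall request rate

weights = [1, 2, 4, 8, 16, 32]
durations = [10, 10, 1, 1, 1, 1]  # minutes per step, as in the example above

per_step = {
    w: canary_requests(TOTAL_RPS, w, d) for w, d in zip(weights, durations)
}

# For comparison: the 1% step with only a 1m interval.
one_percent_at_1m = canary_requests(TOTAL_RPS, 1, 1)
print(per_step, one_percent_at_1m)
```

Under these assumptions the 1% step collects ten times as many requests over 10 minutes as it would over 1 minute, which is the motivation for wanting longer intervals at low weights.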