fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0

Progressive rollouts via pod readiness gate #1334

Open dudicoco opened 1 year ago

dudicoco commented 1 year ago

Currently, Flagger supports advanced deployment strategies mainly through a service mesh or an ingress controller. These advanced methods are great, but they also add extra complexity.

I suggest adding a new deployment method based on a pod readiness gate.

How it works:

  1. A pod readiness gate is added to the Deployment's pod template spec, for example:
    readinessGates:
    - conditionType: "flagger.io/progress"
  2. Once a rollout starts and new pods are launched, the rollout stops progressing until Flagger updates the readiness gate.
  3. Flagger performs an analysis.
  4. If the analysis passes, Flagger sets the readiness-gate condition on the new pods to True.
  5. The Deployment progresses according to its rollingUpdate strategy and the next batch of new pods is launched.
  6. Repeat.
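Step 4 above amounts to writing a condition into each new pod's `status.conditions` whose `type` matches the readiness gate. A minimal sketch of the patch body an operator might send is below; `GATE_CONDITION` and `gate_status_patch` are hypothetical names for illustration, not an existing Flagger API.

```python
from datetime import datetime, timezone

# Hypothetical: the condition type proposed in this issue for the readiness gate.
GATE_CONDITION = "flagger.io/progress"


def gate_status_patch(passed: bool) -> dict:
    """Build a patch body for a pod's status subresource that sets the
    readiness-gate condition to "True" (analysis passed) or "False"."""
    return {
        "status": {
            "conditions": [
                {
                    "type": GATE_CONDITION,
                    "status": "True" if passed else "False",
                    # RFC 3339 timestamp in UTC, as used by Kubernetes conditions
                    "lastTransitionTime": datetime.now(timezone.utc).strftime(
                        "%Y-%m-%dT%H:%M:%SZ"
                    ),
                }
            ]
        }
    }
```

With the official Kubernetes Python client, a body like this could be sent to the pod's status subresource, e.g. via `CoreV1Api().patch_namespaced_pod_status(name, namespace, body)`; how an actual Flagger implementation would do it is left open here.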

Advantages:

  1. The native Deployment object can be used; there is no need to create new deployments and shift traffic between them.
  2. No special handling is needed for HPA and ConfigMaps.
  3. It can work with DaemonSets and StatefulSets as well.

I believe this feature would make Flagger much more approachable to a wider audience that does not use service meshes, and would allow super simple onboarding using existing production Deployment/HPA resources, with no migration needed.

aryan9600 commented 1 year ago

Hello, thank you for your suggestion. While this is certainly an interesting idea, I'm not sure it's viable. For example, if I understand correctly, you're suggesting that Flagger should set the readiness gate to false in the new pods of the Deployment, but that would make the new pods unready. From the docs:

For a Pod that uses custom conditions, that Pod is evaluated to be ready only when both the following statements apply:

  • All containers in the Pod are ready.
  • All conditions specified in readinessGates are True.
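The rule quoted above can be modeled in a few lines to show why the proposal stalls traffic: a pod whose gate condition has not been set yet is not Ready, even when all its containers are. This is a simplified sketch; the `Pod` dataclass and `pod_is_ready` function are hypothetical, and it follows the Kubernetes documentation in treating a missing gate condition as "False".

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    # Readiness of each container, e.g. {"app": True}
    container_ready: dict[str, bool]
    # Condition types listed under spec.readinessGates
    readiness_gates: list[str] = field(default_factory=list)
    # status.conditions as {type: "True" | "False"}; a missing entry counts as "False"
    conditions: dict[str, str] = field(default_factory=dict)


def pod_is_ready(pod: Pod) -> bool:
    """Ready only if every container is ready AND every gate condition is "True"."""
    containers_ok = all(pod.container_ready.values())
    gates_ok = all(pod.conditions.get(g) == "True" for g in pod.readiness_gates)
    return containers_ok and gates_ok
```

For example, `pod_is_ready(Pod({"app": True}, ["flagger.io/progress"]))` is `False` until `conditions["flagger.io/progress"]` is set to `"True"`, so during that window the pod receives no traffic through its Service and there is nothing for an analysis to measure.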

If the pod is unready, no traffic will be routed to it through its corresponding Service. I'm happy to discuss more and explore this, though :)

dudicoco commented 1 year ago

Hi @aryan9600,

Thanks for the input. You're right: when a readinessGate condition is not set to True, the pod is not ready and thus does not serve traffic. I did not take that into consideration.

I have an alternative solution: pause the Deployment after each step of the rollout (https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#paused).

Here is the flow:

  1. Once a rollout starts and new pods are launched, Flagger pauses the Deployment.
  2. Flagger performs an analysis.
  3. If the analysis passes, Flagger resumes the Deployment.
  4. The Deployment progresses according to its rollingUpdate strategy and the next batch of new pods is launched.
  5. Repeat.
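The pause/analyze/resume loop above can be simulated to check its control flow. Pausing corresponds to setting `spec.paused: true` on the Deployment (what `kubectl rollout pause deployment/<name>` does). The sketch below is a toy model under stated assumptions: `Deployment`, `step`, `gated_rollout`, and the `batch_size` parameter are hypothetical, and `step` is a simplified stand-in for the real Deployment controller rather than the actual rollingUpdate arithmetic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Deployment:
    replicas: int
    updated: int = 0      # pods already running the new version
    paused: bool = False  # mirrors spec.paused
    batch_size: int = 1   # new pods rolled out per step (simplified stand-in for rollingUpdate)


def step(d: Deployment) -> None:
    """Toy Deployment controller: while not paused, roll out the next batch."""
    if not d.paused:
        d.updated = min(d.replicas, d.updated + d.batch_size)


def gated_rollout(d: Deployment, analysis: Callable[[Deployment], bool]) -> bool:
    """Pause after each rollout step, run the analysis, resume only on success.
    Returns True if the rollout completed, False if it was halted."""
    while d.updated < d.replicas:
        step(d)            # controller launches the next batch of new pods
        d.paused = True    # Flagger pauses the Deployment
        if not analysis(d):
            return False   # stays paused; a real operator would also roll back
        d.paused = False   # analysis passed: resume
    return True
```

For example, with `Deployment(replicas=4)` and an analysis that fails once three pods are updated, `gated_rollout` returns `False` and leaves the Deployment paused with `updated == 3`. Unlike the readiness-gate version, the new pods stay Ready while the Deployment is paused, so they keep receiving traffic to analyze. One thing the toy model glosses over: in a real cluster there is a window between the controller scaling up and Flagger setting `spec.paused`, during which more than one batch may be rolled out.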
dudicoco commented 1 year ago

@aryan9600 ping