fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0

Canary upgrade without Ingress Controller/Service Mesh support #1226

Open antaloala opened 2 years ago

antaloala commented 2 years ago

Describe the feature

In my company we have been evaluating several cloud-native "progressive delivery" technologies, and Flagger came out as the winner. It has many strengths thanks to its clean, well-defined architecture, which ensures it does not collide with GitOps reconciliation when continuous delivery is GitOps-based.

The only thing we are "missing" (compared to other solutions we evaluated, e.g. Argo Rollouts) is support for canary upgrades that do not need an Ingress controller or service mesh to steer "production" traffic between primary and canary pods in a controlled, weighted way.

Argo Rollouts can handle a canary upgrade on such workloads by adjusting the number of primary and canary pods. A single ClusterIP Kubernetes Service selects all primary and canary pods, so traffic is split in a coarse-grained way according to how many pods each side has.
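The pod-count-based split can be approximated with simple arithmetic. This is a rough model, not Flagger or Argo Rollouts code: it assumes kube-proxy spreads new connections roughly uniformly across the ready endpoints of the ClusterIP Service, so the split applies per connection rather than per request, and long-lived connections can skew it. The function name `canary_traffic_share` is hypothetical.

```python
from fractions import Fraction


def canary_traffic_share(primary_pods: int, canary_pods: int) -> Fraction:
    """Approximate fraction of Service traffic that reaches canary pods.

    Assumption: connections are spread roughly evenly over all ready
    endpoints behind one ClusterIP Service selecting both primary and
    canary pods. Real traffic will deviate from this, especially with
    long-lived connections.
    """
    total = primary_pods + canary_pods
    if total == 0:
        raise ValueError("the Service has no ready endpoints")
    return Fraction(canary_pods, total)


# 9 primary pods + 1 canary pod: roughly 10% of connections hit the canary.
print(float(canary_traffic_share(9, 1)))
```

With N primary pods, the smallest non-zero canary share is 1/(N+1), which is why this approach is coarse-grained for workloads with few replicas.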

I wonder whether this use case has been discussed in the community (I did not find any older GitHub issue about it), i.e. some way to handle canary upgrades for workloads that are outside a service mesh and not externally exposed.

Proposed solution

In #1157 it is clarified that, for GitOps compliance when an HPA is used in a Flagger-driven canary upgrade, Flagger should not touch the min/max fields of the primary HPA object, but it could touch those of the canary HPA, since that object is hidden from the GitOps agent(s). A ClusterIP Service selecting both canary and primary pods, combined with the Flagger controller setting the number of replicas in the canary Deployment (possible because the canary objects are shadow objects hidden from the GitOps controllers), could then be used to control how many pods each side (primary and canary) has. In this way the Flagger controller would set, in a coarse-grained way, what percentage of production traffic is handled by primary and canary pods.
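The proposal boils down to solving c / (p + c) = w for the canary replica count c, given a target weight w and a fixed primary replica count p. The sketch below is a minimal illustration under two assumptions: the primary replica count is left untouched, and connections are spread roughly evenly across pods. It is not Flagger's API; the functions `canary_replicas_for_weight` and `replica_schedule` are hypothetical, and their `step_weight`/`max_weight` parameters only loosely mirror the `stepWeight`/`maxWeight` settings of a Flagger canary analysis.

```python
from fractions import Fraction


def canary_replicas_for_weight(primary_replicas: int, weight_percent: int) -> int:
    """Canary replica count whose pod share best approximates weight_percent.

    Solves c / (p + c) = w for c, i.e. c = w * p / (1 - w), with the
    primary replica count p held fixed. The result is rounded to the
    nearest integer (ties to even) and is at least 1, so the canary
    always receives some traffic.
    """
    if not 0 < weight_percent < 100:
        raise ValueError("weight_percent must be between 1 and 99")
    exact = Fraction(weight_percent * primary_replicas, 100 - weight_percent)
    return max(1, round(exact))


def replica_schedule(primary_replicas: int, step_weight: int, max_weight: int):
    """Hypothetical mapping of weight steps to canary replica counts."""
    return [
        (w, canary_replicas_for_weight(primary_replicas, w))
        for w in range(step_weight, max_weight + 1, step_weight)
    ]


primary = 10
for weight, canary in replica_schedule(primary, step_weight=10, max_weight=50):
    share = Fraction(canary, primary + canary)
    print(f"target {weight}% -> {canary} canary pods (~{float(share):.1%})")
```

With 10 primary pods this yields 1, 2, 4, 7 and 10 canary pods for the 10% to 50% steps, giving actual shares of about 9.1%, 16.7%, 28.6%, 41.2% and 50.0%. The rounding error shrinks as the primary replica count grows.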

Could this be a feasible approach for this (possible) new feature in Flagger? Are there other proposals or better solutions?

antaloala commented 2 years ago

Any quick comments or reactions? @stefanprodan, could this be a feature added in the future?