Canary support for Kubernetes without service mesh

fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)

https://docs.flagger.app

Apache License 2.0

4.92k stars, 737 forks

Canary support for Kubernetes without service mesh #508

Open ermakov-oleg opened 4 years ago

ermakov-oleg commented 4 years ago

Do you have any plans to support canary releases on Kubernetes without a service mesh, for example by adding a certain percentage of canary pods to the service?

Clearly you would not be able to distribute requests exactly by percentage, but a rough approximation would be good enough.

stefanprodan commented 4 years ago

Kubernetes CNI implementations do not operate at Layer 7, so you can't route a percentage of the traffic: the CNI does not understand HTTP/gRPC.

mathetake commented 4 years ago

@stefanprodan I have a different point of view on this. argo-rollouts, for example, implements traffic shifting by setting the proportion of pods that belong to a specific service, though I am not sure whether Flagger should support this type of traffic shifting.

Although this cannot be controlled as finely as with Istio or other L7 solutions, it's worth a shot with Flagger.
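To make the coarseness of pod-ratio shifting concrete, here is a minimal Go sketch. `splitReplicas` is a hypothetical helper written for this illustration; it is not part of Flagger or argo-rollouts, and the rounding policy (nearest pod, with at least one pod on each side during a partial rollout) is an assumption.

```go
package main

import "fmt"

// splitReplicas divides a fixed number of pods between primary and canary so
// that the canary share approximates weightPercent.
// Hypothetical helper for illustration only; not a Flagger or argo-rollouts API.
func splitReplicas(total, weightPercent int) (primary, canary int) {
	if total <= 0 {
		return 0, 0
	}
	if weightPercent < 0 {
		weightPercent = 0
	}
	if weightPercent > 100 {
		weightPercent = 100
	}
	// Round to the nearest whole pod.
	canary = (total*weightPercent + 50) / 100
	// Assumption: a non-zero weight should yield at least one canary pod.
	if weightPercent > 0 && canary == 0 {
		canary = 1
	}
	// Assumption: below 100% at least one primary pod stays in service.
	// With a single pod this rule wins, so the canary gets nothing.
	if weightPercent < 100 && canary == total {
		canary = total - 1
	}
	return total - canary, canary
}

func main() {
	const total = 4
	for _, w := range []int{10, 25, 50} {
		p, c := splitReplicas(total, w)
		fmt.Printf("requested %d%% -> primary=%d canary=%d (about %d%% of pods)\n",
			w, p, c, c*100/total)
	}
}
```

With only four pods, a requested 10% already ends up as one pod in four, so the achievable steps depend entirely on the replica count.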

ermakov-oleg commented 4 years ago

Yes, by sacrificing precise control of traffic distribution, we can significantly simplify safe deployments on clusters without a service mesh. This is useful for small k8s installations.

stefanprodan commented 4 years ago

Using pod selectors like argo-rollouts does has many drawbacks: persistent connections, WebSockets and gRPC connections can't be routed to new pods. For front-end apps, this kind of routing will also break the JS clients, since there is no way to enforce session affinity via cookies or headers.

stefanprodan commented 4 years ago

@ermakov-oleg For small k8s installations, could you use an ingress controller? Flagger works with NGINX, Contour and Gloo; no service mesh is needed.

ermakov-oleg commented 4 years ago

@stefanprodan Indeed, there may be a number of inconveniences for apps with persistent connections, but people running them on the Kubernetes CNI already have the same problems and most likely already use some load balancers.

Unfortunately, not all traffic within the cluster goes through ingress, and we don't want to add an additional hop on the path of each request.

stefanprodan commented 4 years ago

@mathetake I think Flagger could manipulate the replicas on the canary/primary deployments and set the apex service selector to a pod label that targets both replica sets. This would invalidate HPA, but for simple deployments that don't need HPA it should work.
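The selector idea can be sketched in Go with plain maps. The label keys and values (`app`, `role`, `podinfo`) are hypothetical and are not the labels Flagger actually applies; `matches` only mimics how an equality-based Service selector picks pods, without using any Kubernetes client library.

```go
package main

import "fmt"

// matches reports whether every key/value pair in selector is present in the
// pod labels, mimicking an equality-based Service selector.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical pod labels: a shared label plus a role-specific one.
	primaryPod := map[string]string{"app": "podinfo", "role": "primary"}
	canaryPod := map[string]string{"app": "podinfo", "role": "canary"}

	// An apex selector on the shared label targets both replica sets.
	apex := map[string]string{"app": "podinfo"}
	// A role-specific selector targets only one of them.
	primaryOnly := map[string]string{"app": "podinfo", "role": "primary"}

	fmt.Println("apex -> primary:", matches(apex, primaryPod))
	fmt.Println("apex -> canary:", matches(apex, canaryPod))
	fmt.Println("primaryOnly -> canary:", matches(primaryOnly, canaryPod))
}
```

Because the apex selector matches pods from both deployments, the share of traffic reaching the canary follows the ratio of ready pods, which is why a fixed replica count (and therefore no HPA) is assumed.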

mathetake commented 4 years ago

> this would invalidate HPA

That's the point I'm worried about but, as you said, it will work with deployments without HPA.

However, this requires a lot of refactoring, since the controller interface is currently implemented per targetType (deployment, service and daemonset), which is where the pod labels are set up. To make this work, I would have to use provider information inside the controller interface.

That being said, I am not sure this is the way to go, though I want to support this feature.

stefanprodan commented 4 years ago

> However, this requires a lot of refactoring since currently the controller interface is implemented per targetType

Not only that, but having a selector that matches both canary and primary pods means mutating the canary deployment, and that would conflict with GitOps operators such as Flux.