fluxcd / flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
https://docs.flagger.app
Apache License 2.0

AWS ALB Ingress with Flagger #659

Open nrutigs opened 4 years ago

nrutigs commented 4 years ago

Are there any plans to develop ALB Ingress support for Flagger, or would it be possible to contribute it?

There appears to be support in ALB annotations for custom weights:

    alb.ingress.kubernetes.io/actions.forward-multiple-tg: >
      {
        "Type": "forward",
        "ForwardConfig": {
          "TargetGroups": [
            { "ServiceName": "service-1", "ServicePort": "80", "Weight": 20 },
            { "ServiceName": "service-2", "ServicePort": "80", "Weight": 20 },
            { "TargetGroupArn": "arn-of-your-non-k8s-target-group", "Weight": 60 }
          ],
          "TargetGroupStickinessConfig": { "Enabled": true, "DurationSeconds": 200 }
        }
      }

As well as support for matching rules (route conditions).
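For illustration, header-based matching could look roughly like this (a sketch only; the action name, header name, and value are placeholders, and the conditions annotation may only be available in newer controller versions):

    # Sketch: only requests carrying "X-Canary: always" match the rule that
    # forwards to the weighted target groups defined in the action above.
    alb.ingress.kubernetes.io/conditions.forward-multiple-tg: >
      [{"field":"http-header","httpHeaderConfig":{"httpHeaderName":"X-Canary","values":["always"]}}]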

Thanks!

stefanprodan commented 4 years ago

Flagger knows about Kubernetes services, so it can set that part in the annotation, but what's TargetGroupArn?

nrutigs commented 4 years ago

To my knowledge it's a target outside of the cluster that you'd like the ALB to split traffic to as well.

nrutigs commented 4 years ago

Hey @stefanprodan, is there any update on this? Is there any other information I can provide to get an answer? It's definitely something I'd look to contribute if it's possible :)

stefanprodan commented 4 years ago

@nrutigs one key feature of Flagger is its built-in metrics, such as success rate and latency. These are implemented with Prometheus queries; does the ALB ingress controller expose a metrics endpoint for Prometheus?

nrutigs commented 4 years ago

So the ingress controller does expose a metrics endpoint: https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/v1.1.4/cmd/main.go#L153 https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/v1.1.4/cmd/main.go#L104-L107

However, I think these are for internal monitoring of the ingress controller rather than the ingresses themselves. For those you likely need to query CloudWatch instead, but it seems like Flagger already has that as a metrics provider it could use?

stefanprodan commented 4 years ago

The built-in metrics are for Prometheus. I guess we can say in the docs that for ALB people should use CloudWatch. We should provide two metric templates, for error rate and latency: https://docs.flagger.app/usage/metrics#amazon-cloudwatch
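Something roughly along these lines could work for the error rate (a sketch only, not tested; the region, period, and `LoadBalancer` dimension value are placeholders that must match the actual ALB):

    # Sketch of a Flagger MetricTemplate using the CloudWatch provider to compute
    # the ALB target 5xx error rate. All names and values below are placeholders.
    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: alb-error-rate
      namespace: flagger
    spec:
      provider:
        type: cloudwatch
        region: us-east-1
      query: |
        [
          {
            "Id": "error_rate",
            "Expression": "100 * errors / requests",
            "Label": "ErrorRate"
          },
          {
            "Id": "errors",
            "MetricStat": {
              "Metric": {
                "Namespace": "AWS/ApplicationELB",
                "MetricName": "HTTPCode_Target_5XX_Count",
                "Dimensions": [
                  { "Name": "LoadBalancer", "Value": "app/my-alb/1234567890abcdef" }
                ]
              },
              "Period": 60,
              "Stat": "Sum"
            },
            "ReturnData": false
          },
          {
            "Id": "requests",
            "MetricStat": {
              "Metric": {
                "Namespace": "AWS/ApplicationELB",
                "MetricName": "RequestCount",
                "Dimensions": [
                  { "Name": "LoadBalancer", "Value": "app/my-alb/1234567890abcdef" }
                ]
              },
              "Period": 60,
              "Stat": "Sum"
            },
            "ReturnData": false
          }
        ]

A latency template would be analogous, using the `TargetResponseTime` metric with a percentile statistic.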

nrutigs commented 4 years ago

Wicked! Are there any other blockers to using ALB that you can see? Otherwise it's definitely something I'll try to start contributing.

stefanprodan commented 4 years ago

My main concern is around maintenance, because you can't have an e2e test suite for ALB+CloudWatch on Kubernetes Kind like we have for the other ingress controllers. @nrutigs if you are willing to work on this, I'll be happy to review a PR.

nrutigs commented 4 years ago

Hmm, that definitely is an issue. Maybe there's a solution using eksctl and some bash scripting, but obviously it might be harder to fit that into your CI.

Thanks for the responses for now! I'll probably annoy you in the #flagger Slack channel in the near future if I can get the right approvals to do this at work.

stefanprodan commented 4 years ago

@nrutigs running eksctl in CI could do it, but it's a lot of work: clusters must be created on the fly and removed after a test run. The e2e test framework must ensure ALB+CloudWatch are ready, which could mean waiting 30 minutes or more for the cluster to be created and for the ALB instance to become ready.

My impression is that ALB metrics are not sent to CloudWatch in real time, and Flagger needs the metrics data to be "fresh"; Prometheus, for example, has a 5-second delay. If CloudWatch is several minutes behind, then the analysis will fail, since Flagger will not be able to determine whether the canary is conformant or not. One workaround would be to increase the analysis interval to 10 minutes or more, but I'm not sure if this would work; you'll need to try it out.
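Stretching the interval would look roughly like this in the Canary analysis (a sketch only; the template name and thresholds are placeholders):

    # Sketch: a Canary analysis with a long interval to allow for CloudWatch's
    # delay. "alb-error-rate" refers to a CloudWatch-backed MetricTemplate.
    analysis:
      interval: 10m      # long enough for delayed CloudWatch datapoints
      threshold: 5       # failed checks before rollback
      maxWeight: 50
      stepWeight: 10
      metrics:
        - name: error-rate
          templateRef:
            name: alb-error-rate
            namespace: flagger
          thresholdRange:
            max: 1
          interval: 10m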

Another important aspect of testing on AWS is around who will support the cost of spinning up EKS clusters on each commit.

akuzni2 commented 2 years ago

Could a possible workaround be ALB -> NGINX ingress (with Prometheus exporters) -> Flagger-monitored canary deployment? Flagger would then monitor the Prometheus metrics from the NGINX ingress rather than the ALB. Then it's just up to the user to define the ALB -> NGINX ingress routing.

stefanprodan commented 2 years ago

@akuzni2 Flagger already supports NGINX ingress; if an ALB sits in front of it, then it's irrelevant to the canary analysis and routing. Docs here: https://docs.flagger.app/tutorials/nginx-progressive-delivery

rafaelgaspar commented 1 year ago

A use case we would like this for is to have canary or blue/green deployments of ingress-nginx itself, so we can auto-update it and do the rollout in a more controlled, metrics-driven manner.

mhr3 commented 1 year ago

Here are the ALB ingress controller's docs for setting up weighted traffic splitting, which could be used by Flagger to support it officially:

https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.6/guide/use_cases/blue_green/
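The guide boils down to an annotation roughly like this (a sketch with placeholder service names and weights; note the v2 controller uses camelCase keys, unlike the older PascalCase example earlier in this thread):

    # Sketch: a forward action splitting traffic between two Kubernetes services
    # by weight; an Ingress rule then references the "blue-green" action by name.
    alb.ingress.kubernetes.io/actions.blue-green: >
      {
        "type": "forward",
        "forwardConfig": {
          "targetGroups": [
            { "serviceName": "my-app-primary", "servicePort": "80", "weight": 100 },
            { "serviceName": "my-app-canary", "servicePort": "80", "weight": 0 }
          ]
        }
      }

Flagger would only need to adjust the weights in this JSON during the analysis, much like it does for other ingress controllers.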

cemenson commented 1 year ago

Any updates on this? There must be a bunch of people who would love this support (myself included!)

Chili-Man commented 10 months ago

FYI, Argo Rollouts (the Argo project's equivalent of Flagger) has support for AWS ALB, if anyone is comparing the two: https://argoproj.github.io/rollouts/

timothystone commented 2 months ago

> Another important aspect of testing on AWS is around who will support the cost of spinning up EKS clusters on each commit.

What about LocalStack? Or AWS test credits?

Maybe I'm stating the obvious here, but there is a lot of hard and soft investment in Flux/Flagger that could be put toward developing this support. Since I use GitLab personally and professionally, Flux/Flagger is my default choice, so I'm interested in this issue.