Open nrutigs opened 4 years ago
Flagger knows about Kubernetes services, so it can set that part in the annotation, but what's TargetGroupArn?
To my knowledge it's something outside of the cluster that you'd like the ALB to split traffic to as well.
Hey @stefanprodan, is there any update on this? Is there any other information I can provide to get an answer? It's definitely something I'd look to contribute if it's possible :)
@nrutigs one key feature in Flagger is the builtin metrics, such as success rate and latency. These are implemented with Prometheus queries. Does the ALB ingress expose a metrics endpoint for Prometheus?
So the ingress does expose a metrics endpoint - https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/v1.1.4/cmd/main.go#L153 https://github.com/kubernetes-sigs/aws-alb-ingress-controller/blob/v1.1.4/cmd/main.go#L104-L107
However, I think these are for internal monitoring of the ingress controller and not the ingresses themselves. For that you likely need to access CloudWatch instead, but it seems like Flagger already has that as a provider that it could use?
The builtin metrics are for Prometheus. I guess we can say in the docs that for ALB people should use CloudWatch. We should provide two metric templates for error rate and latency: https://docs.flagger.app/usage/metrics#amazon-cloudwatch
Wicked! Are there any other blockers behind using ALB you can see? Otherwise it's definitely something I'll try to start working on contributing.
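For anyone wondering what such an error-rate template might query, here is a rough sketch (not an official Flagger template) that builds a CloudWatch `GetMetricData` query list for an ALB. The `AWS/ApplicationELB` namespace and the `HTTPCode_Target_5XX_Count` and `RequestCount` metrics are standard ALB metrics; the function name and the load balancer value are made up for illustration, and I'm assuming the resulting JSON can go into the `query` field of a `MetricTemplate` as in the linked docs.

```python
import json


def alb_error_rate_query(load_balancer: str, period_seconds: int = 60) -> str:
    """Build a GetMetricData query list for the ALB 5xx error rate, in percent.

    load_balancer is the CloudWatch "LoadBalancer" dimension value
    (hypothetical example: "app/my-alb/0123456789abcdef").
    """

    def metric(query_id: str, metric_name: str) -> dict:
        # Raw metric, summed per period; only used as input to the expression.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "LoadBalancer", "Value": load_balancer}
                    ],
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    queries = [
        # The only entry that returns data: 5xx responses as a share of requests.
        {"Id": "e1", "Expression": "100 * m1 / m2", "Label": "ErrorRate"},
        metric("m1", "HTTPCode_Target_5XX_Count"),
        metric("m2", "RequestCount"),
    ]
    return json.dumps(queries, indent=2)


if __name__ == "__main__":
    print(alb_error_rate_query("app/my-alb/0123456789abcdef"))
```

A latency template would presumably look the same with `TargetResponseTime` and an average or percentile statistic, but I haven't tried either against a real ALB.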
My main concern is around maintenance because you can't have an e2e test suite for ALB+CloudWatch on Kubernetes Kind, like we have for any other ingress controller. @nrutigs if you are willing to work on this I'll be happy to review a PR.
Hmm that definitely is an issue. Maybe there's a solution using eksctl and some bash scripting but obviously it might be harder to fit that into your CI.
Thanks for the responses for now! I'll probably annoy you in #flagger slack in the near future if I can get through the right approvals to do this at work.
@nrutigs running eksctl in CI could do it, but it's a lot of work: clusters must be created on the fly and removed after a test run. The e2e test framework must ensure ALB+CloudWatch are ready, which could mean waiting 30m or more for the cluster to be created and for the ALB instance to become ready.
My impression is that ALB metrics are not sent to CloudWatch in real time. Flagger needs the metrics data to be "fresh"; for example, Prometheus has a 5sec delay. If CloudWatch is several minutes behind, then the analysis will fail, since Flagger will not be able to determine if the canary is conformant or not. One workaround would be to increase the analysis interval to 10 minutes or more, but I'm not sure if this could work; you'll need to try it out.
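To make the freshness argument concrete, here is a small sketch of the reasoning. It is not how Flagger actually checks metrics; the function name and the 3 minute CloudWatch lag are assumptions for illustration only. The idea is that an analysis run only has something to evaluate if the newest datapoint falls inside the window it looks back over.

```python
from datetime import datetime, timedelta, timezone


def has_fresh_data(newest_datapoint: datetime, now: datetime, interval: timedelta) -> bool:
    """True if the newest datapoint falls inside the window [now - interval, now]."""
    return now - newest_datapoint <= interval


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
prometheus_lag = timedelta(seconds=5)   # delay mentioned above
cloudwatch_lag = timedelta(minutes=3)   # assumed, not measured

# 1m interval, Prometheus: data is inside the window.
print(has_fresh_data(now - prometheus_lag, now, timedelta(minutes=1)))   # True
# 1m interval, CloudWatch: the newest datapoint is already too old.
print(has_fresh_data(now - cloudwatch_lag, now, timedelta(minutes=1)))   # False
# 10m interval, CloudWatch: the longer window covers the lag.
print(has_fresh_data(now - cloudwatch_lag, now, timedelta(minutes=10)))  # True
```

Under these assumptions the longer interval covers the lag, at the cost of a much slower rollout.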
Another important aspect of testing on AWS is around who will support the cost of spinning up EKS clusters on each commit.
Could a possible workaround be ALB -> Nginx ingress (with Prometheus exports) -> Flagger-monitored Canary deployment? Flagger would then monitor the Prometheus metrics from Nginx Ingress rather than the ALB. Then it's just up to the user to define ALB -> Nginx Ingress routing.
@akuzni2 Flagger already supports NGINX ingress, if an ALB sits in front of it, then it’s irrelevant to the canary analysis and routing. Docs here: https://docs.flagger.app/tutorials/nginx-progressive-delivery
A use-case that we would like this for is to actually have Canary or Blue/Green deployments of the ingress-nginx itself, so we can auto-update and also do the rollout in a more controlled manner with the metrics.
Here's ALB ingress controller's docs for setting up weighted traffic splitting, which could be used by flagger to support it officially:
https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.6/guide/use_cases/blue_green/
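As a rough idea of what Flagger would need to write on each step, here is a sketch that builds the JSON value for the controller's `alb.ingress.kubernetes.io/actions.<action-name>` annotation with a weighted `forwardConfig`. The function name and the `podinfo-*` service names are hypothetical, and this is only my reading of the linked docs, not an existing Flagger code path.

```python
import json


def alb_forward_action(primary_svc: str, canary_svc: str, port: int, canary_weight: int) -> str:
    """Build the value of an ALB 'forward' action that splits traffic by weight.

    canary_weight is a percentage (0-100); the primary gets the remainder.
    """
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")
    action = {
        "type": "forward",
        "forwardConfig": {
            "targetGroups": [
                {"serviceName": primary_svc, "servicePort": port, "weight": 100 - canary_weight},
                {"serviceName": canary_svc, "servicePort": port, "weight": canary_weight},
            ]
        },
    }
    # Compact JSON, since the value ends up inside an Ingress annotation.
    return json.dumps(action, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical services; 10% of traffic goes to the canary.
    print(alb_forward_action("podinfo-primary", "podinfo-canary", 80, 10))
```

As far as I can tell from the controller docs, a target group entry can use `targetGroupARN` instead of `serviceName`/`servicePort` for a target group that lives outside the cluster, which Flagger would have no reason to manage.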
Any updates on this? There must be a bunch of people who would love this support (myself included!)
FYI, Argo Rollouts (similar to Flagger) has support for AWS ALB, if anyone is comparing the two: https://argoproj.github.io/rollouts/
> Another important aspect of testing on AWS is around who will support the cost of spinning up EKS clusters on each commit.
LocalStack? What about AWS test credit?
Maybe I'm stating the obvious here, but there is a lot of hard and soft investment in Flux/Flagger that, it seems, could be used for developing this support. Given my GitLab use, personally and professionally, Flux/Flagger is my default preference, so I'm interested in this issue.
Are there any plans to develop ALB Ingress support for Flagger, or is it possible to contribute it?
There appears to be support in ALB annotations for custom weights
As well as support for matching rules
Thanks!