Open ohadschn opened 2 years ago
I like the idea, however I am wondering whether it is possible to achieve with Application Gateway.
ClusterIPs are non-routable outside of the cluster by definition. Maybe a route table attached to the AppGw subnet would help, but right now I'm not exactly sure how it should be configured so that it routes to multiple nodes.
I guess they could always have something like an App Gateway DaemonSet that simply proxies calls to the service's ClusterIP...
I'd like to suggest a slight modification to this feature request - supporting the "LoadBalancer" service type rather than the "ClusterIP" type. This would be helpful in the particular scenario where a) the service has the Azure internal load balancer annotation and b) the AKS instance is deployed using the Azure-CNI plugin. This setup causes the Kubernetes services to allocate an IP address on the VNET.
With this configuration, traffic would flow from the app gateway to the AKS internal load balancer IP (on the VNET).
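For illustration, a service of the kind described above might look roughly like the sketch below. The service name, selector label and ports are hypothetical placeholders; the only part taken from the scenario itself is the Azure internal load balancer annotation on a "LoadBalancer" service.

```yaml
# Minimal sketch of an internal LoadBalancer service.
# "my-app", the selector and the port numbers are assumed placeholders.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # Asks Azure for an internal load balancer, so the service IP comes from the VNET
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

The app gateway backend pool would then hold this single service IP, which only changes when the service itself is recreated.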
Another advantage of doing this is that it would alleviate the pains from websockets being disconnected when the app gateway gets reconfigured. If the app gateway is only reconfigured whenever service IPs change, it would be significantly less disruptive to existing websocket connections. That would be better than using pod IPs for the backend pools which would change way too often - especially when a deployment is using HPAs, is scaled up and down and has new IPs added and removed frequently.
I've already hacked together a prototype of this feature here: https://github.com/Azure/application-gateway-kubernetes-ingress/compare/master...wesleyae:application-gateway-kubernetes-ingress:feature/add-support-for-load-balancer-backend
Is this a feature that the AGIC could support in the future?
@mscatyao @akshaysngupta?
Can we please consider the suggestion from @wesleyae?
This would improve the stability of the ingress controller by a lot. The routing would rely on native k8s Endpoints rather than asynchronous updates back to Application Gateway.
We are currently experiencing timeouts when deploying, even after implementing all the suggested measures to reduce downtime (https://azure.github.io/application-gateway-kubernetes-ingress/how-tos/minimize-downtime-during-deployments/)
Is your feature request related to a problem? Please describe.
The AGIC design choice of routing directly to pod IPs (rather than the service ClusterIP) can cause a discrepancy between a pod's k8s state and its app gateway state, specifically following pod evictions and container recycling (due to deployment or any other reason such as resource starvation). This can result in 502 responses where AGIC routes traffic to pods deemed unready (or even nonexistent) by k8s. This issue is known and a few workarounds are suggested in the following document: https://github.com/Azure/application-gateway-kubernetes-ingress/blob/master/docs/how-tos/minimize-downtime-during-deployments.md
However, the workarounds suggested in the above article do not resolve the issue completely (as the article itself states). In some cases they don't mitigate the issue at all - OOM for example will kill a process with SIGKILL (not SIGTERM), meaning preStop will never be executed.
The motivation for this design choice has been outlined here: https://github.com/Azure/application-gateway-kubernetes-ingress/issues/524#issuecomment-530058783
For some scenarios, however, the drawbacks of this approach described above outweigh the benefits. For example, we don't use cookies and our network usage isn't so intensive that one more hop would make a noticeable difference. For such scenarios, a simple routing mode that delegates the load balancing to k8s via the ClusterIP could be very beneficial and elegantly solve the above issue. In essence, this is the workaround some customers implemented themselves, except such an implementation would require a LoadBalancer service, an external IP per service, and manually updating the appgw routing rules: https://github.com/Azure/application-gateway-kubernetes-ingress/issues/1124.
Related issues:
Describe the solution you'd like
appgw.ingress.kubernetes.io/appgw-routing-mode: cluster-ip
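As a rough sketch of how this could look on an Ingress: the routing-mode annotation below is the one proposed in this request and is not an existing AGIC option, and the resource name, path and backend service are hypothetical placeholders.

```yaml
# Sketch only: shows where the proposed annotation would sit on an Ingress.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    # Proposed in this request; AGIC does not support it today
    appgw.ingress.kubernetes.io/appgw-routing-mode: cluster-ip
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app   # assumed ClusterIP service
                port:
                  number: 80
```

With such a mode, the idea would be for the app gateway backend to target the service IP and let k8s pick a ready pod, instead of tracking individual pod IPs.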