Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

kube-proxy IPVS support #1846

Open kshke opened 3 years ago

kshke commented 3 years ago

What happened:

In our current deployment, the iptables-based kube-proxy logic is causing load imbalance across pods, which leads to pod evictions. This forces us to run a larger number of pods than our planned resource usage accounts for.

More importantly, the application will not scale under this networking restriction, which will have an SLA/commercial impact.

What you expected to happen: The ability to switch kube-proxy routing to IPVS mode with the least-connection scheduling algorithm on Azure Kubernetes Service (AKS). (See the upstream configuration sketch after this issue template.)

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
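For context, what is being requested here is already expressed in upstream kube-proxy's configuration API. A minimal sketch of the relevant settings (these are standard kubeproxy.config.k8s.io fields, not anything AKS-specific; on AKS, kube-proxy is managed by the platform, which is exactly why this issue exists):

```yaml
# Upstream KubeProxyConfiguration, typically stored in the kube-proxy
# ConfigMap in kube-system on self-managed clusters.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"        # switch from the default "iptables" mode
ipvs:
  scheduler: "lc"   # "lc" = least connection; the default is "rr" (round robin)
```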

ghost commented 3 years ago

Hi kshke, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:
1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
2) Please abide by the AKS repo Guidelines and Code of Conduct.
3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost commented 3 years ago

Triage required from @Azure/aks-pm

ghost commented 3 years ago

Action required from @Azure/aks-pm

palma21 commented 3 years ago

No plans at the moment for IPVS support. Can you expand a bit more on the issues you're having which you believe are caused by iptables?

sylr commented 3 years ago

I really think AKS should invest in kube-proxy's IPVS mode, because iptables is not a lasting model.

Every paper out there shows that iptables mode does not scale once the number of services grows above 1k.

https://www.projectcalico.org/comparing-kube-proxy-modes-iptables-or-ipvs/

palma21 commented 3 years ago

For us it's not a question of investment; this is something that we're following closely, evaluating over time, and investing in upstream. It's a question of maturity and enterprise readiness. In short, iptables is a much older and very battle-hardened piece of technology.

We have tens of thousands of clusters with more than 1k services running fine today, and that article does not seem to disagree: it correctly points out that above 1k services IPVS will have performance gains over iptables. Keeping in mind the absolute values of those gains, and using that article as a reference, at the moment we're siding a bit more with taking the performance tradeoff for a more proven technology.

It is something we keep evaluating, though, as the need for more intelligent load balancing also evolves; hence my interest in understanding scenarios like the OP's.

shankycheil commented 3 years ago

We have a product that is a collection of multiple pods, each hosting a different set of services. One of these pods hosts an aggregation/orchestration service, and another hosts a transactional service whose service times depend on the amount of data being retrieved.

The behavior we've observed is that when the downstream service takes longer to respond due to data volume, the orchestration pod continues to queue requests on downstream pods irrespective of their current load/response volumes. In such a scenario we believe we would benefit from the least-connection routing algorithm offered by kube-proxy in IPVS mode.

As an interim measure, we exposed the downstream services on an ingress (NGINX) controller and changed the algorithm to least-connection (see the sketch below this paragraph). Subsequent tests showed gains in the number of messages processed, and overall lower pod evictions and resource utilization. Since all our pods reside within the same AKS environment, we wish to use NodePort routing (which is again dependent on internal load balancers and iptables) as opposed to creating ingress endpoints for all downstream services.
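For reference, in plain NGINX terms the knob changed here corresponds to the `least_conn` directive on an upstream block. An illustrative sketch only, with hypothetical names and addresses; in practice the ingress controller templates the upstream from Service endpoints rather than hardcoding pod IPs:

```nginx
# Hypothetical upstream for the transactional service. least_conn sends
# each new request to the backend with the fewest active connections,
# instead of the default round-robin.
upstream transactional_backend {
    least_conn;
    server 10.244.1.12:8080;  # pod endpoints (illustrative addresses)
    server 10.244.2.17:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://transactional_backend;
    }
}
```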

Having to expose all downstream services through ingress also adds security overhead for each service endpoint/downstream pod.

As a product team, it is important for us to be able to test our product/solution under various deployment/implementation parameters so we can recommend the optimal operating environment/BOM to our customers. Kubernetes itself has had this capability since 1.8; AKS, however, does not give us that flexibility.

slawekww commented 3 years ago

@palma21 Would you be so kind as to share whether IPVS, or Cilium or any other network provider, is being considered on the AKS roadmap? Just so we can get rid of the iptables option in the future.

palma21 commented 3 years ago

Yes, we're exploring both options, so in that sense we are considering them, but our larger-scale tests haven't really been fruitful/successful enough to allow us solid conclusions or to draft any plans yet. We expect to have a clearer picture here by the end of the year.

AurelioBelletti commented 3 years ago

What are the expectations for supporting IPVS? We are finding that load is not evenly distributed to backend pods using iptables and would like to use IPVS instead. iptables is ideal for firewall-rule scenarios; IPVS is better for load balancing.

slawekww commented 3 years ago

The problem could be Calico on big clusters with a huge number of pods, since an iptables rule is created for each pod. Other competitors' offerings allow configuring the network provider (Calico, Canal, etc.).

The desired functionality differs per workload, and customers should have the option to choose a provider depending on their needs. Calico/iptables could be the bottleneck.

mohitsaxena2005 commented 3 years ago

> No plans at the moment for IPVS support. Can you expand a bit more on the issues you're having which you believe are caused by iptables?

We also have a scenario where we need to keep long-lived connections open to clients, possibly for more than 10 hours, and new requests should go to the least-connected pod. Is there an alternative approach we could use if IPVS is not supported, or could we enable it in AKS?

Farzad-Jalali commented 2 years ago

I have a similar requirement for a load balancer that redirects traffic to the pod with the fewest open connections. How should I get this done in Azure AKS?

IlyaKiselevKolibri commented 2 years ago

We have a similar problem of the pod load imbalance. Do you have any updates, please? @Azure/aks-pm

krishnadce commented 2 years ago

We also have a similar requirement and are planning to use IPVS. What's the roadmap, please? @Azure/aks-pm

caleb15 commented 2 years ago

Note that Amazon is already working on this feature, so if you want feature parity with EKS you should start implementing this soon.

We also ran into a situation where load balancing wasn't working properly with iptables: one of our pods was receiving much less traffic than the others. @shankycheil, can you expand on your workaround? As I understand it, NGINX in least-connection mode sends each request to the upstream server with the fewest connections, but if you're routing requests from NGINX to a Service, then you have only one upstream server: the Service itself. If you hardcoded the pod addresses as servers in the upstream, you would run into problems when the pods roll over due to deployments or scaling up or down.
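One standard way around the single-VIP upstream problem (not necessarily what shankycheil did; offered here as a sketch) is a headless Service: with `clusterIP: None`, cluster DNS returns the individual pod IPs instead of one virtual IP, so a proxy that re-resolves DNS can track pod churn without hardcoding addresses. A minimal example with hypothetical names:

```yaml
# Hypothetical headless Service. clusterIP: None skips the kube-proxy
# virtual IP entirely; DNS lookups for transactional-svc return one
# record per ready pod, updated as pods come and go.
apiVersion: v1
kind: Service
metadata:
  name: transactional-svc
spec:
  clusterIP: None
  selector:
    app: transactional
  ports:
    - name: http
      port: 8080
      targetPort: 8080
```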

shankycheil commented 2 years ago

> @shankycheil, can you expand on your workaround? As I understand it, NGINX in least-connection mode sends each request to the upstream server with the fewest connections, but if you're routing requests from NGINX to a Service, then you have only one upstream server: the Service itself.

@caleb15 - the issue we faced was not from the NLB to the service but in service-to-service communication: called-service pods running on nodes other than the calling pods' nodes saw lower traffic. Exposing all called services through the load balancer was one solution; the other was switching from the built-in Azure NLB to NGINX for service routing and enabling least-connection as the routing option on NGINX.

ghost commented 2 years ago

This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. kshke, feel free to comment again in the next 7 days to reopen, or open a new issue after that time if you still have a question/issue or suggestion.

BartVB commented 2 years ago

> We expect to have a clearer picture here by the end of the year.

Is there any news on this?

denniszielke commented 2 years ago

Will this also support setting the --ipvs-scheduler property?

palma21 commented 2 years ago

Yes

SY185098 commented 2 years ago

@phealy can you please provide a timeline for this to be available!

palma21 commented 1 year ago

https://learn.microsoft.com/en-us/azure/aks/configure-kube-proxy
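The linked page documents the kube-proxy configuration customization, in preview at the time. A hedged sketch of its shape, assuming the schema and CLI flag shown on that page (it requires the aks-preview CLI extension and the corresponding preview feature flag; verify against the current doc, since preview surfaces change):

```shell
# Hypothetical kube-proxy.json following the schema in the linked doc:
# IPVS mode with the least-connection scheduler and example timeouts.
cat > kube-proxy.json <<'EOF'
{
  "enabled": true,
  "mode": "IPVS",
  "ipvsConfig": {
    "scheduler": "LeastConnection",
    "TCPTimeoutSeconds": 900,
    "TCPFINTimeoutSeconds": 120,
    "UDPTimeoutSeconds": 300
  }
}
EOF

# Apply to an existing cluster (resource group and cluster name are placeholders).
az aks update -g myResourceGroup -n myCluster --kube-proxy-config kube-proxy.json
```

This also answers the earlier question about `--ipvs-scheduler`: the scheduler is selected through the `ipvsConfig.scheduler` field rather than a raw kube-proxy flag.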

mblaschke-daimlertruck commented 1 year ago

Do you have a timeline for getting this GA?

denniszielke commented 1 year ago

Soon.

ankur-rafay commented 1 year ago

Any update on when this will be GA?

Sritejahipaas commented 1 year ago

Any update on when this will be GA?

shksin commented 3 months ago

Any update on when this will be GA?

byronestebancndt commented 1 month ago

@chasewilson any possible date for GA?

AlftioH commented 1 month ago

This feature is already in Public Preview. For more details, refer to https://learn.microsoft.com/en-us/azure/aks/configure-kube-proxy