aws / amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
Apache License 2.0

AWS-VPC CNI is wasting a lot of IPs, what to do ? #2017

Closed msa21 closed 2 years ago

msa21 commented 2 years ago

As we all know, the AWS VPC CNI wastes a lot of IP addresses during pod allocation and deallocation, which is why our production cluster is full of IP exhaustion alerts. We cannot add more subnets right now because the existing subnets are not being used efficiently. Is there any way to tackle this issue?

Note: custom networking and prefix assignment currently seem to have a lot of bugs, judging by the open issues section, so I can't use them as a solution. If there is an automated script that shuffles nodes/pods to free the dangling IPs, please let me know.

I am guessing many of us must be facing the same issue.


achevuru commented 2 years ago

@msa21 Can you elaborate on what you mean by wasting IP addresses during allocation and deallocation? VPC CNI only uses IP addresses based on the pod density on the instance/node and on how you have configured your WARM/MIN IP and ENI targets. Please refer to the README for how these environment variables influence how many IPs or prefixes VPC CNI keeps in its cache; you can adjust them based on your needs. If you're referring to VPC CNI leaking IP addresses, there are no known issues with respect to that in VPC CNI. If you're observing that behavior, please share more info and logs.
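To make the interaction of these targets concrete, here is a rough model of how many secondary IPs a node holds for a given number of running pods. It is a simplification of the behavior described in the VPC CNI docs, not ipamd's code, and it assumes that: setting WARM_IP_TARGET or MINIMUM_IP_TARGET takes precedence over WARM_ENI_TARGET; otherwise ENIs are attached whole until at least WARM_ENI_TARGET ENIs' worth of IPs are free; and `ips_per_eni` counts only secondary IPs (for example 9 on an instance whose ENIs carry 10 IPv4 addresses each). The function name and parameters are hypothetical.

```python
import math
from typing import Optional


def desired_ip_count(
    used: int,
    ips_per_eni: int,
    max_enis: int,
    warm_eni_target: int = 1,
    warm_ip_target: Optional[int] = None,
    minimum_ip_target: Optional[int] = None,
) -> int:
    """Approximate number of secondary IPs a node holds (simplified model)."""
    capacity = max_enis * ips_per_eni  # the instance type caps the total
    if warm_ip_target is not None or minimum_ip_target is not None:
        # IP targets set: keep `warm_ip_target` free IPs, but never fewer
        # than `minimum_ip_target` IPs in total.
        target = max(used + (warm_ip_target or 0), minimum_ip_target or 0)
    else:
        # ENI target only: attach whole ENIs until at least
        # warm_eni_target * ips_per_eni IPs are free (primary ENI always present).
        needed = used + warm_eni_target * ips_per_eni
        enis = max(1, math.ceil(needed / ips_per_eni))
        target = enis * ips_per_eni
    return min(target, capacity)


print(desired_ip_count(5, 9, 3))                    # default WARM_ENI_TARGET=1
print(desired_ip_count(5, 9, 3, warm_ip_target=1))  # WARM_IP_TARGET=1
```

With 5 pods on a node with 3 ENIs of 9 secondary IPs each, this model gives 18 IPs held under the default WARM_ENI_TARGET=1 and 6 under WARM_IP_TARGET=1, which is the kind of gap the original report describes as waste.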

Also, there are no known functionality issues with either Custom Networking or Prefix Delegation mode. Can you point to the known issues you're referring to?

msa21 commented 2 years ago

Hi @achevuru @jayanthvn

I could explain the IP wastage caused by the Amazon VPC CNI myself, but I found this concise write-up that explains the exact issue very well. Please take a look:

https://medium.com/compass-true-north/experiences-for-ip-addresses-shortage-on-eks-clusters-a740f56ac2f5

I don't want to waste your time, but this is a burning issue for my EKS production cluster, and we cannot add more CIDRs to my VPC because the current subnets are not being consumed properly.

achevuru commented 2 years ago

@msa21 Thanks for sharing the blog post. I'd point back to my previous response: the number of IPs VPC CNI holds in its cache (warm pool) is entirely controlled by the user, i.e., you should be able to pick WARM/MIN IP and ENI target values that suit your use case. The VPC CNI manifests ship with certain defaults for these environment variables; they might not be appropriate for every use case, and you should adjust them to suit your needs. They are exposed as configurable variables for exactly that reason, and as you can see in the blog post above, the authors settled on a WARM_IP_TARGET configuration, which reduced the number of IPs held in the warm pool.

Please refer to the docs below for more details on how VPC CNI uses these environment variables and how you can configure them to limit the number of IPs held in cache. (For example, you can configure it to hold only one additional IP per node.)

https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/eni-and-ip-target.md
https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/prefix-and-ip-target.md
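One way to apply the "only one additional IP per node" setting is to change the environment of the `aws-node` container in the `aws-node` DaemonSet in `kube-system`. The sketch below only builds a strategic-merge patch body (the function name is hypothetical); you could then pass the printed JSON to `kubectl patch daemonset aws-node -n kube-system --patch '<json>'`, since `kubectl patch` uses strategic merge by default and merges `containers` and `env` entries by name. If you run VPC CNI as an EKS managed add-on, check whether your add-on configuration would override a manual patch.

```python
import json
from typing import Dict


def build_aws_node_env_patch(env: Dict[str, int]) -> dict:
    """Strategic-merge patch that sets env vars on the aws-node container.

    Kubernetes env values must be strings, so integer values are converted.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "aws-node",
                            "env": [
                                {"name": name, "value": str(value)}
                                for name, value in sorted(env.items())
                            ],
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Hold one free IP per node beyond what running pods use.
    print(json.dumps(build_aws_node_env_patch({"WARM_IP_TARGET": 1})))
```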

You will need to strike a balance between how many EC2 API calls you want the CNI pods to make and how many IPs you want to hold in cache. Naturally, the CNI makes fewer EC2 calls when it has IPs available in the cache.
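This trade-off can be illustrated with a toy simulation. Everything here is hypothetical: the burst sizes are made up, and the model (one on-demand EC2 call when a burst of new pods exceeds the free IPs, plus one refill call whenever the pool drops below WARM_IP_TARGET) is a deliberate simplification of ipamd's reconcile loop, not its actual logic.

```python
from typing import Iterable, Tuple


def simulate_bursts(bursts: Iterable[int], warm_ip_target: int) -> Tuple[int, int]:
    """Toy model of the warm-pool trade-off.

    Returns (pods_that_waited_for_ec2, ec2_calls).
    """
    free = warm_ip_target  # pool starts full
    waited = 0
    calls = 0
    for burst in bursts:
        served = min(burst, free)  # pods that get an IP straight from the cache
        shortfall = burst - served
        free -= served
        if shortfall:
            waited += shortfall
            calls += 1  # one on-demand allocation covering the shortfall
        if free < warm_ip_target:
            calls += 1  # refill the warm pool back to the target
            free = warm_ip_target
    return waited, calls


for target in (1, 5, 10):
    waited, calls = simulate_bursts([3, 8, 2, 6], target)
    print(f"WARM_IP_TARGET={target}: {waited} pods waited, {calls} EC2 calls")
```

With these bursts, raising the target from 1 to 10 reduces the pods that wait from 15 to 0 and the EC2 calls from 8 to 4, at the cost of keeping up to 10 idle IPs per node.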

If you're experiencing IPv4 address exhaustion in general, you can explore IPv6, which EKS (and VPC CNI) now supports.
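For a sense of scale: in IPv6 mode, VPC CNI assigns each node a /80 prefix, while IPv4 prefix delegation hands out /28 prefixes. Python's standard `ipaddress` module can count the addresses. The specific ranges below are illustrative only (10.0.0.0/8 private space and the 2001:db8::/32 documentation prefix), and the "usable" figure assumes AWS's 5 reserved addresses per subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every VPC subnet

ipv4_subnet = ipaddress.ip_network("10.0.32.0/24")       # example IPv4 node subnet
ipv4_prefix = ipaddress.ip_network("10.0.32.16/28")      # one /28 (IPv4 prefix delegation)
ipv6_prefix = ipaddress.ip_network("2001:db8:0:1::/80")  # one per-node /80 (IPv6 mode)

print(f"{ipv4_subnet}: {ipv4_subnet.num_addresses - AWS_RESERVED_PER_SUBNET} usable addresses")
print(f"{ipv4_prefix}: {ipv4_prefix.num_addresses} addresses per IPv4 prefix")
print(f"{ipv6_prefix}: {ipv6_prefix.num_addresses:,} addresses per node")
```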

Garima-Negi commented 2 years ago

Hi, please redirect me to a better issue/PR if there is one for this question:

Does amazon-vpc-cni support VPC trunking on c6i and m6i instance types?

Last I checked, the vpc-resource-controller was due to get a few changes regarding this: https://github.com/aws/amazon-vpc-resource-controller-k8s/issues/91#issuecomment-1030366732

I'm wondering if those changes have been made for c6i/m6i and are part of amazon-vpc-cni.

jayanthvn commented 2 years ago

@Garima-Negi Yes, amazon-vpc-cni supports c6i and m6i: https://github.com/aws/amazon-vpc-cni-k8s/blob/v1.11.2/pkg/awsutils/vpc_ip_resource_limit.go#L137

Which CNI version are you using?

VPC Resource Controller v1.1.3 also supports c6i and m6i: https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/v1.1.3/pkg/aws/vpc/limits.go
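The limits files linked above record, per instance type, the number of ENIs and IPv4 addresses per ENI. In secondary-IP mode those two numbers also bound pod density through the max-pods formula used for EKS: ENIs × (IPv4 addresses per ENI − 1) + 2. The table values below are examples that we believe match these .large sizes; the table name is hypothetical, and the values should be checked against the linked files or the EC2 documentation before you rely on them.

```python
from typing import Dict, Tuple

# Example (max ENIs, IPv4 addresses per ENI) values; verify before relying on them.
EXAMPLE_LIMITS: Dict[str, Tuple[int, int]] = {
    "m5.large": (3, 10),
    "c6i.large": (3, 10),
    "m6i.large": (3, 10),
}


def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Max pods in secondary-IP mode.

    Each ENI's primary address is not handed to pods (hence the -1); the +2
    accounts for host-network pods such as aws-node and kube-proxy.
    """
    return enis * (ipv4_per_eni - 1) + 2


for instance_type, (enis, ips) in EXAMPLE_LIMITS.items():
    print(f"{instance_type}: max pods = {max_pods(enis, ips)}")
```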

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

github-actions[bot] commented 2 years ago

Issue closed due to inactivity.