Improve VPC CNI memory by reducing number of things it is caching

aws / amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS

Apache License 2.0

2.27k stars 738 forks source link

Improve VPC CNI memory by reducing number of things it is caching #2887

Open GnatorX opened 6 months ago

GnatorX commented 6 months ago

What would you like to be added: Narrow down what VPC CNI is caching to reduce memory utilization in large clusters. Currently we see pretty high memory utilization and seems to scale with nodes.

Why is this needed: The current behavior is pretty problematic when cluster size gets large (5000+ nodes) causing us to increase memory request for the CNI even though it isn't necessary for the CNI to use all that memory.

I believe simply adding a new ByObject Filter on Node object is enough to reduce cache utilization. Since https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/k8sapi/k8sutils.go#L183 only gets the node object the CNI is running on.

orsenthil commented 6 months ago

Thanks for the report and the Pull Request. Have you done any measurements with and without this change? Could you share the differences?

GnatorX commented 6 months ago

Not yet. Will update once I have tested this

GnatorX commented 5 months ago

@orsenthil I am wondering if it make sense to even cache nodes. K8s caches which usesList + watches on startup are extremely expensive calls. The CNI only cares about the node it is running on and calls with node name is index from k8s side which is relatively fast. Rather than filtering, why not just use non-cached calls get that information?

The availability difference isn't that high, watches vs a call.

GnatorX commented 5 months ago

I took a pprof of the issue. Screenshot 2024-04-29 at 10 30 58 AM

It seems like the issue is with the stream watcher is consuming memory during cluster size increase. It seems to require quite a bit of memory in order to process all nodes and store it in the memory. Even though the memory consumption isn't very high, its still unnecessary to store all node information in cache.

I need to re-test this with my change however I do believe the real solution is to avoid performing list watch against all nodes and only watch for node events specific to the CNI.

orsenthil commented 5 months ago

K8s caches which usesList + watches on startup are extremely expensive calls

Even though the memory consumption isn't very high, its still unnecessary to store all node information in cache.

I do believe the real solution is to avoid performing list watch against all nodes and only watch for node events specific to the CNI.

It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the various operations in the large clusters before we decide to not use the cache.

With your changes, if you see any different in both memory and performance, please share an update here.

GnatorX commented 5 months ago

It is pretty standard for k8s client calls to use the cached client. It will be good to measure difference in the memory usage and the performance of the various operations in the large clusters before we decide to not use the cache.

Agreed.

When I tested my changes, it didn't yield significant difference in memory utilization. I believe, as shown in the pprof, the memory usage is because of the stream watcher attempting unmarshal incoming data. I think rather than using a informer cache and raw watch against the node itself may be more efficient(?).

I can close to issue for now since I likely don't have time to look into writing a direct watcher instead and I think the memory spike isn't large enough to be a concern.

orsenthil commented 3 months ago

I can close to issue for now since I likely don't have time to look into writing a direct watcher instead and I think the memory spike isn't large enough to be a concern.

This sounds reasonable, if we have better proof with improvements, we can bring this change in.

github-actions[bot] commented 3 months ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one.

GnatorX commented 5 days ago

@orsenthil can we reopen this. We have more data now

GnatorX commented 5 days ago

When we added the filter https://github.com/aws/amazon-vpc-cni-k8s/pull/2888 we were able to drop our memory utilization on a 3000 nodes cluster.