aws / amazon-vpc-resource-controller-k8s

Controller for managing Trunk & Branch Network Interfaces on EKS Cluster using Security Group For Pod feature and IPv4 Addresses for Windows Node.
Apache License 2.0

Improving VPC RC's behavior for large accounts #411

Open GnatorX opened 5 months ago

GnatorX commented 5 months ago

What would you like to be enhanced: Improve VPC RC's behavior when handling large accounts, specifically its DescribeNetworkInterfaces calls.

Why is the change needed and what use case will it solve:

Currently, VPC RC makes 2 DescribeNetworkInterfaces calls that can time out in accounts with a large number of network interfaces in a VPC and/or subnet.

Clean ENI

Currently Clean ENI runs every 30 minutes, but each run attempts to get all ENIs within a VPC and work through all the returned ENIs. Since only the vpc-id filter is "indexed"(?) on AWS' side, this can still return a huge amount of data and run slowly. I believe in our account it takes ~1.6 minutes to return.

I suggest we support running Clean ENI continuously: VPC RC would call DescribeNetworkInterfaces in pages and iterate through the pages every N minutes or seconds rather than doing everything in a single batch. This would give more consistent behavior without spamming AWS' APIs. One issue here is leaked ENIs could live for longer than previous (30 minutes) for larger accounts.
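The paginated sweep suggested above could look roughly like the following sketch. It is not VPC RC's code: `PageFetcher`, `SweepPaged`, and `fakeFetcher` are hypothetical names, and the fetcher stands in for a paginated DescribeNetworkInterfaces call so the example runs without AWS credentials.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// ENI is a simplified stand-in for an EC2 network interface.
type ENI struct {
	ID string
}

// PageFetcher returns one page of ENIs plus the token for the next page.
// An empty next token means the listing is complete. (Hypothetical type.)
type PageFetcher func(ctx context.Context, token string) (enis []ENI, next string, err error)

// SweepPaged walks every page once, handing each page to handle and waiting
// interval between page requests so the API calls are spread out over time.
// It returns the number of pages processed.
func SweepPaged(ctx context.Context, fetch PageFetcher, interval time.Duration, handle func([]ENI)) (int, error) {
	pages := 0
	token := ""
	for {
		enis, next, err := fetch(ctx, token)
		if err != nil {
			return pages, err
		}
		handle(enis)
		pages++
		if next == "" {
			return pages, nil
		}
		token = next
		select {
		case <-ctx.Done():
			return pages, ctx.Err()
		case <-time.After(interval):
		}
	}
}

// fakeFetcher serves a fixed list in pages of pageSize, using the start
// index as the page token. It only exists to make the sketch runnable.
func fakeFetcher(all []ENI, pageSize int) PageFetcher {
	return func(_ context.Context, token string) ([]ENI, string, error) {
		start := 0
		if token != "" {
			if _, err := fmt.Sscanf(token, "%d", &start); err != nil {
				return nil, "", err
			}
		}
		end := start + pageSize
		if end >= len(all) {
			return all[start:], "", nil
		}
		return all[start:end], fmt.Sprintf("%d", end), nil
	}
}

// demoSweep runs one full sweep over five fake ENIs in pages of two.
func demoSweep() (pages int, seen int, err error) {
	all := []ENI{{ID: "eni-1"}, {ID: "eni-2"}, {ID: "eni-3"}, {ID: "eni-4"}, {ID: "eni-5"}}
	pages, err = SweepPaged(context.Background(), fakeFetcher(all, 2), 10*time.Millisecond, func(p []ENI) {
		seen += len(p)
	})
	return pages, seen, err
}

func main() {
	pages, seen, err := demoSweep()
	fmt.Println(pages, seen, err)
}
```

With a longer interval, a full pass over a large VPC takes correspondingly longer, which is the trade-off mentioned above: leaked ENIs may survive longer in exchange for a steadier API call rate.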

GetBranchNetworkInterface

This call is only made when rebuilding the cache, that is, when nodes with a trunk ENI are already running in the cluster and VPC RC is restarted. The call is narrowed to just the subnet ID, but with a sufficiently big account it is still slow (we have around 40k ENIs in 1 AZ). Given that this path is only taken during VPC RC start up and the tag filter isn't indexed, I suggest not calling it per node in the cluster during a VPC RC restart. Rather, call it once on VPC RC start up and get all network interfaces that are branch ENIs (checking only for the presence of the trunk ENI tag, not its value) in a paginated manner, one page every N seconds, so that the cache builds asynchronously in one pass.

This should reduce the number of calls being made, since we are no longer calling per trunk ENI (per node), and the calls are spread out every N seconds (with N depending on what you find empirically makes sense).
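As a rough illustration of building the cache in one pass, the sketch below indexes branch ENIs by the value of their trunk ENI tag as pages arrive, so no per-node call is needed. `BranchCache` and `AddPage` are hypothetical names, and the tag key is an assumption about what the controller uses; pages would come from a paginated listing rather than the in-memory slices used here.

```go
package main

import "fmt"

// trunkTagKey is the tag assumed to mark a branch ENI; its value is the ID of
// the trunk ENI the branch belongs to. The exact key is an assumption here.
const trunkTagKey = "vpcresources.k8s.aws/trunk-eni-id"

// ENI is a simplified stand-in for an EC2 network interface.
type ENI struct {
	ID   string
	Tags map[string]string
}

// BranchCache maps a trunk ENI ID to the IDs of its branch ENIs.
// (Hypothetical type, not the controller's cache.)
type BranchCache struct {
	byTrunk map[string][]string
}

func NewBranchCache() *BranchCache {
	return &BranchCache{byTrunk: make(map[string][]string)}
}

// AddPage folds one page of listing results into the cache. ENIs without the
// trunk tag are not branch ENIs and are skipped; only the tag's presence is
// used as the filter, and its value is used as the grouping key.
func (c *BranchCache) AddPage(page []ENI) {
	for _, eni := range page {
		trunkID, ok := eni.Tags[trunkTagKey]
		if !ok {
			continue
		}
		c.byTrunk[trunkID] = append(c.byTrunk[trunkID], eni.ID)
	}
}

// Branches returns the branch ENI IDs recorded for a trunk ENI.
func (c *BranchCache) Branches(trunkID string) []string {
	return c.byTrunk[trunkID]
}

// demoCache feeds two fake pages into the cache.
func demoCache() *BranchCache {
	cache := NewBranchCache()
	cache.AddPage([]ENI{
		{ID: "eni-b1", Tags: map[string]string{trunkTagKey: "eni-trunk-a"}},
		{ID: "eni-plain"}, // no trunk tag, so it is ignored
	})
	cache.AddPage([]ENI{
		{ID: "eni-b2", Tags: map[string]string{trunkTagKey: "eni-trunk-a"}},
		{ID: "eni-b3", Tags: map[string]string{trunkTagKey: "eni-trunk-b"}},
	})
	return cache
}

func main() {
	cache := demoCache()
	fmt.Println(cache.Branches("eni-trunk-a"), cache.Branches("eni-trunk-b"))
}
```

Because each page is folded in independently, the pages can be requested N seconds apart without any per-node bookkeeping.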

Big account flag

Lastly, some of these changes may not make sense for all accounts. I suggest we introduce a flag that indicates whether an account is a large account and should behave differently. I am not 100% convinced about this yet; we need more data on how these calls perform when paginated and on whether we should differentiate between large and normal-sized accounts.

Similar things are done in VPC CNI: https://github.com/aws/amazon-vpc-cni-k8s/blob/cd7eb5902f5c7a0ebc008bb478843dd14440b8bd/pkg/awsutils/awsutils.go#L1811

sushrk commented 5 months ago

> One issue here is leaked ENIs could live for longer than previous (30 minutes) for larger accounts

Leaked ENIs are around for ~1h today, because the first time we encounter a leaked ENI it is added to a cache and not immediately deleted. It makes sense to spread this operation over a few seconds or minutes, since leaked ENIs can be deleted asynchronously.
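The two-pass behavior described here can be sketched as follows. This is an illustration of the idea, not the controller's actual code; `LeakTracker` and `Sweep` are hypothetical names. An ENI is only returned for deletion if it already looked leaked on the previous sweep, which is why a 30 minute sweep interval leads to a lifetime of roughly an hour.

```go
package main

import "fmt"

// LeakTracker remembers ENIs that looked leaked on the previous sweep.
// (Hypothetical type.)
type LeakTracker struct {
	candidates map[string]struct{}
}

func NewLeakTracker() *LeakTracker {
	return &LeakTracker{candidates: make(map[string]struct{})}
}

// Sweep takes the ENI IDs that look leaked in the current pass and returns
// the ones that were already candidates in the previous pass, which are
// therefore considered safe to delete. First-time sightings become
// candidates for the next pass; candidates that no longer look leaked are
// forgotten.
func (t *LeakTracker) Sweep(leaked []string) []string {
	next := make(map[string]struct{}, len(leaked))
	var toDelete []string
	for _, id := range leaked {
		if _, seen := t.candidates[id]; seen {
			toDelete = append(toDelete, id)
			continue
		}
		next[id] = struct{}{}
	}
	t.candidates = next
	return toDelete
}

// demoLeaks runs three consecutive sweeps and returns what each would delete.
func demoLeaks() [][]string {
	t := NewLeakTracker()
	return [][]string{
		t.Sweep([]string{"eni-a", "eni-b"}), // first sighting: nothing deleted
		t.Sweep([]string{"eni-a", "eni-c"}), // eni-a seen twice; eni-b recovered
		t.Sweep([]string{"eni-c"}),          // eni-c seen twice
	}
}

func main() {
	for i, ids := range demoLeaks() {
		fmt.Printf("sweep %d deletes %v\n", i+1, ids)
	}
}
```

Since deletion only depends on the previous pass, the passes themselves can be spread over paginated calls without changing the logic.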

We are looking into adding pagination and reducing the DescribeNetworkInterfaces EC2 API call volume.

GnatorX commented 4 months ago

https://github.com/aws/amazon-vpc-resource-controller-k8s/issues/188 seems related