aws / amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
Apache License 2.0
2.28k stars 743 forks source link

Add support to reconcile allocated Pod IPs. #3113

Open hbhasker opened 1 week ago

hbhasker commented 1 week ago

What type of PR is this?

improvement

Which issue does this PR fix?:

Fixes #3109

What does this PR do / Why do we need it?: The CNI today only reconciles its datastore with existing pods at startup but never again. Sometimes its possible that IPAMD goes out of sync with the kubelet's view of the pods running on the node if it fails or is temporarily unreachable by the CNI plugin handling the DelNetwork call from the contrainer runtime.

In such cases the CNI continues to consider the pods IP allocated and will not free it as it will never see a DelNetwork again. This results in CNI failing to assign IP's to new pods.

This change adds a reconcile loop which periodically (once a minute) reconciles its allocated IPs with existence of pod's veth devices. If the veth device is not found then it free's up the corresponding allocation making the IP available for reuse.

Fixes #3109

Testing done on this change:

Added unit tests.

Will this PR introduce any new dependencies?:

No

Will this break upgrades or downgrades? Has updating a running cluster been tested?: No

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

No

- Reconcile allocated Pod IPs against existing host veth devices periodically to clean up stale IP allocations [Fixes #3109].

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

orsenthil commented 1 week ago

The CNI today only reconciles its datastore with existing pods at startup but never again.

This is done periodically. That's how the datastore keeps track of the ips used, available and does a ENI call if the additional ENI is needed.

hbhasker commented 1 week ago

The CNI today only reconciles its datastore with existing pods at startup but never again.

This is done periodically. That's how the datastore keeps track of the ips used, available and does a ENI call if the additional ENI is needed.

I guess I meant reconcile the allocated IPs against running pods. It doesn't check in steady state that a pod is still around when its IP is allocated. The assumption is that if the pod was deleted the IPAMD would have been called by the CNI plugin. But I think there are some cases where this assumption breaks down, like if the CNI->ipamd grpc call fails for any reason. There are probably other edge cases but I couldn't fully reason out what could cause this to happen.

hbhasker commented 1 week ago

ping for a review!