Azure / azure-container-networking

Azure Container Networking Solutions for Linux and Windows Containers
MIT License

[NPM] [Linux] if the kernel leaks references of 16+ ipsets, NPM remains in CrashLoop #2997

Open huntergregory opened 1 week ago

huntergregory commented 1 week ago

NOTE: v1.5.37 will include the fix.

Symptoms

NPM Pod in CrashLoop.

Previous logs of the crashed NPM Pod look like:

E0904 18:22:41.928737       1 ipsetmanager_linux.go:131] failed to destroy all ipsets (prometheus metrics may be off now). destroyFailureCount 824637563208. err: Operation [RunCommandWithFile] failed with error code [999], full cmd [], full error after 5 tries, failed to run command [ipset restore] with error: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component
]
E0904 18:22:41.928766       1 dataplane.go:108] Failed to reset dataplane: Operation [BootupDataplane] failed with error code [999], full cmd [], full error failed to reset ipsets dataplane: error while resetting ipsetmanager: failed to run ipset restore while destroying all for resetting IPSets: Operation [RunCommandWithFile] failed with error code [999], full cmd [], full error after 5 tries, failed to run command [ipset restore] with error: error running command [ipset restore] with err [exit status 1] and stdErr [ipset v7.5: Error in line 1: Set cannot be destroyed: it is in use by a kernel component

Mitigation

Cause

There is a known issue in ipset where it leaks references. A leaked reference occurs when no iptables rule references the ipset and the ipset is not a member of any list ipset, yet the ipset still reports that it has references.
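For anyone triaging a node, here is a rough sketch (not NPM code) of how the reference count could be read from the text printed by `ipset list`, which includes a `References:` header line per set. The function name `referencedSets` and the sample set names are made up for illustration. Note that a nonzero count alone does not prove a leak; the set is only leaked if no iptables rule and no list ipset actually refers to it.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// referencedSets parses the text printed by `ipset list` and returns, for each
// set with a reference count greater than zero, that count keyed by set name.
// Hypothetical helper for illustration; it is not part of NPM.
func referencedSets(listOutput string) map[string]int {
	refs := make(map[string]int)
	current := ""
	scanner := bufio.NewScanner(strings.NewReader(listOutput))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "Name: "):
			// Start of a new set block.
			current = strings.TrimPrefix(line, "Name: ")
		case strings.HasPrefix(line, "References: ") && current != "":
			n, err := strconv.Atoi(strings.TrimPrefix(line, "References: "))
			if err == nil && n > 0 {
				refs[current] = n
			}
		}
	}
	return refs
}

func main() {
	// Sample output with made-up set names.
	sample := `Name: azure-npm-example-a
Type: hash:net
References: 1
Number of entries: 0
Members:

Name: azure-npm-example-b
Type: hash:net
References: 0
Number of entries: 0
Members:
`
	for name, n := range referencedSets(sample) {
		fmt.Printf("%s has %d reference(s)\n", name, n)
	}
}
```

Each set returned here would still need to be checked against the iptables rules and the members of any list ipsets before calling it leaked.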

Azure NPM resets its ipset state (deleting all IPSets) when the NPM Pod boots up. During this reset, NPM detects which IPSets have leaked references, skips them, and deletes all other NPM IPSets. However, due to a bug in the NPM code, NPM only skips the first 11 NPM IPSets with leaked references. Because of the retry logic, NPM enters a CrashLoop if there are 16 or more leaked IPSets.
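To make the intended behavior concrete, here is a minimal sketch of building the input for `ipset restore` so that every leaked set is skipped, with no cap on how many can be skipped. This is an illustration only, not the NPM implementation or the actual fix. `buildDestroyFile` and the set names are hypothetical, and the use of the long-form `destroy` command in the restore input is an assumption on my part.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDestroyFile returns text intended as input to `ipset restore`: one
// destroy line per set in allSets, omitting every set marked in leaked.
// Hypothetical helper for illustration; it is not part of NPM.
func buildDestroyFile(allSets []string, leaked map[string]bool) string {
	var b strings.Builder
	for _, name := range allSets {
		if leaked[name] {
			// The kernel would refuse to destroy this set, so leave it alone.
			continue
		}
		b.WriteString("destroy " + name + "\n")
	}
	return b.String()
}

func main() {
	// Made-up set names; in practice the list would come from the node.
	all := []string{"azure-npm-example-a", "azure-npm-example-b", "azure-npm-example-c"}
	leaked := map[string]bool{"azure-npm-example-b": true}
	fmt.Print(buildDestroyFile(all, leaked))
}
```

The point of the sketch is that the skip decision is made per set from the full leaked list, so the number of leaked sets does not affect whether the remaining destroys can succeed.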

Code

https://github.com/Azure/azure-container-networking/blob/ff46b571440c1a25b2d0dcec5e2a4dcf7dcb8374/npm/pkg/dataplane/ipsets/ipsetmanager_linux.go#L231

https://github.com/Azure/azure-container-networking/blob/ff46b571440c1a25b2d0dcec5e2a4dcf7dcb8374/npm/pkg/dataplane/ipsets/ipsetmanager_linux.go#L842-L855

https://github.com/Azure/azure-container-networking/blob/ff46b571440c1a25b2d0dcec5e2a4dcf7dcb8374/npm/pkg/dataplane/ipsets/ipsetmanager_linux.go#L24

huntergregory commented 1 week ago

The fix must be backported, then included in a future release (TBD).