Azure / AKS

Azure Kubernetes Service

[BUG] azure-vnet-telemetry leaks sockets #4207

Open geertn opened 1 month ago

geertn commented 1 month ago

Describe the bug

The azure-vnet-telemetry process appears to be leaking sockets, and errors are present in its logs. Every error line corresponds to one leaked socket. This happens on several nodes in the AKS cluster, but not all of them. The issue has been present for weeks and caused a production outage because of too many open sockets.
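For anyone who wants to check whether a node is affected, here is a minimal sketch that counts open socket file descriptors for the process by reading /proc on the node. It is not part of the original report. The function names `count_socket_fds` and `find_pids_by_name` are made up for illustration, and it assumes a Linux node, enough privileges (typically root) to read /proc/<pid>/fd, and that the process shows up in /proc/<pid>/comm, where the kernel truncates names to 15 characters.

```python
import os


def count_socket_fds(fd_dir):
    """Count entries in fd_dir whose symlink target looks like 'socket:[inode]'."""
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed in the meantime, or is not a symlink.
            continue
        if target.startswith("socket:["):
            count += 1
    return count


def find_pids_by_name(name, proc_root="/proc"):
    """Return PIDs whose comm value matches name (compared on the first 15 chars)."""
    if not os.path.isdir(proc_root):
        return []
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            # The process exited, or we lack permission to read it.
            continue
        # The kernel stores at most 15 characters of the process name in comm.
        if comm == name[:15]:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    for pid in find_pids_by_name("azure-vnet-telemetry"):
        try:
            n = count_socket_fds(os.path.join("/proc", str(pid), "fd"))
        except OSError as exc:
            print(f"pid {pid}: cannot read fd directory ({exc})")
            continue
        print(f"pid {pid}: {n} open sockets")
```

Running this a few times, some minutes apart, on an affected node should show the count growing if sockets are leaking; a stable count suggests that node is not affected.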

To Reproduce

Expected behavior

No errors and no leaking sockets.

Environment (please complete the following information):

k get nodes -o wide
NAME                               STATUS   ROLES   AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-systemz1-13451524-vmss000000   Ready    agent   4h1m    v1.27.9   10.106.16.4     <none>        Ubuntu 22.04.4 LTS   5.15.0-1059-azure   containerd://1.7.14-1

Additional context

logfile.txt

Open support issue 2403170050000665

tamilmani1989 commented 3 weeks ago

Thanks for letting us know. We are tracking the issue at https://github.com/Azure/azure-container-networking/issues/2703 and will be working on a fix. I will keep this thread updated.

geertn commented 2 weeks ago

I see the fix is merged, thank you for that. Will this be included in the upcoming node image release?