Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Windows Pods response time is more than 7 minutes #4044

Open destroyogi opened 8 months ago

destroyogi commented 8 months ago

Describe the bug

In a cluster with 2 Windows nodes and 10 pods, everything operates smoothly for about 30-40 hours. After this period, several Windows pods begin experiencing connectivity issues with external and non-cluster resources, with response times reaching 7-8 minutes. Simple actions like a curl request to Google take over 7 minutes to complete.

To Reproduce

  1. Create a cluster with Windows nodes.
  2. Deploy pods that use both internal resources and external internet resources.
  3. After roughly 30-40 hours, observe that the pods fail to connect.
  4. Run curl from the Windows pods: it takes around 7 minutes, and sites that normally respond quickly time out.
  5. Remote into the Windows nodes and run curl there: it works normally.
  6. Restarting the Windows pods temporarily resolves the issue for about 36 hours.
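
To compare steps 4 and 5 with numbers instead of impressions, the same probe can be run in an affected pod and on its node. The following is only a minimal sketch: it assumes Python 3 is available in both places (the issue does not say so), and the helper names `time_call`, `resolve`, `fetch`, and `probe` are hypothetical. It times the DNS lookup separately from the full HTTP request, which may help show which stage accounts for the delay.

```python
import socket
import time
import urllib.request


def time_call(fn, *args):
    """Run fn(*args); return (elapsed_seconds, result_or_exception)."""
    start = time.perf_counter()
    try:
        result = fn(*args)
    except Exception as exc:  # keep the timing even when the call fails
        result = exc
    return time.perf_counter() - start, result


def resolve(host):
    """DNS lookup only: return the first resolved address for host."""
    return socket.getaddrinfo(host, 443)[0][4][0]


def fetch(url, timeout):
    """Full HTTP request: return the response status code."""
    # timeout applies to each blocking socket operation, not to the whole request
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe(host, timeout=600, resolver=resolve, fetcher=fetch):
    """Time DNS resolution and an HTTPS GET for host.

    The default timeout of 600 s is an assumption, chosen so that a
    7-8 minute response is measured rather than cut short.
    """
    dns_s, addr = time_call(resolver, host)
    http_s, status = time_call(fetcher, f"https://{host}/", timeout)
    return {"host": host, "dns_s": dns_s, "addr": addr,
            "http_s": http_s, "status": status}
```

Calling `print(probe("www.google.com"))` in the pod and again on the node gives two records to compare. A large `dns_s` in the pod only would point toward name resolution, while a small `dns_s` with a large `http_s` would point toward the connection or data path.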

Expected behavior

Resources within the network and external internet resources should be consistently accessible without significant response delays.

Environment (please complete the following information):

Additional context

The exact cause is unknown, and troubleshooting is challenging. Logs can be examined in c:\k, but it's unclear what to look for. This issue appears similar to https://github.com/Azure/AKS/issues/1014.

AbelHu commented 6 months ago

@destroyogi Thanks for the feedback. Could you file a support ticket so AKS CSS can help collect repro steps with detailed logs and network captures for advanced investigation?

ShiqianTao commented 6 months ago

Hi @destroyogi. Does the issue still persist if you use curl.exe instead of curl?