Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

UDP packets are partially duplicated after upgrade to 1.19.7 #2204

Closed: cubed-it closed this issue 3 years ago

cubed-it commented 3 years ago

What happened: UDP packets sent by a Pod on VM 1 are received twice by another Pod on the same VM (VM 1).

What you expected to happen: UDP packets should not be delivered twice.

How to reproduce it (as minimally and precisely as possible): Unfortunately, I can't give minimal reproduction steps. In our case, duplicate log entries appeared after upgrading from AKS 1.18.8 to 1.19.7. We capture logs from our ASP.NET services with Graylog using a UDP input. Only entries whose sender and receiver run on the same VM appear twice. Switching the transport from UDP to HTTP solves the problem, so something appears to be wrong with UDP routing in 1.19.7, or with this specific cluster after the upgrade.
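
A minimal reproducer sketch, assuming Python 3 is available in both Pods, could take Graylog out of the picture. The sender emits datagrams that each carry a 4-byte sequence number, and the receiver counts how often each number arrives, so a duplicated delivery shows up directly rather than as a repeated log line. The helper names (`send_sequenced`, `receive_counts`, `summarize`) and the payload format are hypothetical and are not part of AKS, Graylog, or the original report.

```python
import socket
import struct
from collections import Counter

# Hypothetical payload format: one 4-byte big-endian sequence number per datagram.
SEQ = struct.Struct("!I")


def send_sequenced(host: str, port: int, count: int) -> None:
    """Send `count` UDP datagrams numbered 0..count-1 to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(SEQ.pack(seq), (host, port))


def receive_counts(sock: socket.socket, idle_timeout: float = 1.0) -> Counter:
    """Count arrivals per sequence number until no datagram arrives for idle_timeout seconds."""
    sock.settimeout(idle_timeout)
    counts: Counter = Counter()
    while True:
        try:
            data, _addr = sock.recvfrom(2048)
        except socket.timeout:
            return counts
        if len(data) >= SEQ.size:
            counts[SEQ.unpack_from(data)[0]] += 1


def summarize(counts: Counter, expected: int) -> dict:
    """Report sequence numbers delivered more than once and ones never delivered."""
    return {
        "duplicated": sorted(seq for seq, n in counts.items() if n > 1),
        "missing": sorted(set(range(expected)).difference(counts)),
    }
```

To exercise the same-VM path, one would bind a UDP socket to `0.0.0.0` and a chosen port in the receiving Pod, pass it to `receive_counts`, and call `send_sequenced` with that Pod's IP from another Pod scheduled on the same node. A non-empty `duplicated` list would confirm the behavior independently of Graylog.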

Anything else we need to know?: Are there any useful things I can check on the VMs?

Environment:

ghost commented 3 years ago

Hi cubed-it, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such: 1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster. 2) Please abide by the AKS repo Guidelines and Code of Conduct. 3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics? 4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS. 5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue. 6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost commented 3 years ago

Triage required from @Azure/aks-pm

miwithro commented 3 years ago

@paulgmiller can you get someone to look into this?

ghost commented 3 years ago

Action required from @Azure/aks-pm

ghost commented 3 years ago

Issue needing attention of @Azure/aks-leads

miwithro commented 3 years ago

@AbelHu can you address this one?

AbelHu commented 3 years ago

@cubed-it could you file a support ticket so we can investigate it? Then:

  1. SSH to VM1 and run C:\k\debug\startpacketcapture.cmd
  2. Reproduce the issue.
  3. SSH to VM1, run C:\k\debug\stoppacketcapture.cmd, and then run powershell C:\k\debug\collect-windows-logs.ps1
  4. Collect the trace and logs and share them with CSS.

cubed-it commented 3 years ago

@AbelHu I'm sorry, but the cluster has since been upgraded through several versions, and we have implemented a workaround (sending logs over HTTP instead of UDP). Unfortunately, I am therefore unable to contribute further to this issue, and from my side it can be closed.
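
A minimal sketch of the UDP-to-HTTP workaround follows. It assumes a Graylog GELF HTTP input, which typically listens on port 12201 and accepts JSON POSTs at `/gelf`. Because HTTP runs over TCP, a duplicated IP packet is discarded by the TCP stack instead of being delivered to the application twice. The function names and the example URL are hypothetical, and the reporter's ASP.NET services would use their own logging library rather than this code.

```python
import json
import socket
import urllib.request


def build_gelf(short_message: str, level: int = 6, **extra: object) -> dict:
    """Build a GELF 1.1 message; level uses syslog severities (6 = informational)."""
    message = {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": short_message,
        "level": level,
    }
    for key, value in extra.items():
        # GELF additional fields must be prefixed with an underscore.
        message[f"_{key}"] = value
    return message


def send_gelf_http(url: str, message: dict, timeout: float = 5.0) -> int:
    """POST one GELF message to a Graylog GELF HTTP input and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage would look like `send_gelf_http("http://graylog.example:12201/gelf", build_gelf("Order created", service="orders"))`, where the host name is a placeholder. Each record costs one HTTP request. The trade-off against UDP is extra per-message overhead in exchange for delivery that the transport does not duplicate.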