eclipse-threadx / netxduo

Eclipse ThreadX - NetXDuo is an advanced, industrial-grade TCP/IP network stack designed specifically for deeply embedded real-time and IoT applications
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/netx-duo/index.md
MIT License

CDC ECM packet pool drop #196

Closed ZerAtaii closed 10 months ago

ZerAtaii commented 11 months ago

What target device are you using? LPC55S69
Which version of Azure RTOS? 6.1
What toolchain and environment? arm-none-eabi-* + WSL

Hi, I am developing an embedded application on an MCU that runs Azure RTOS with the netxduo/usbx layers to manage TCP/IP sockets and packets. This MCU communicates with a laptop through an Ethernet-over-USB protocol (USB CDC-ECM). The laptop runs Ubuntu and can open TCP ports with the netcat command and send a range of commands.

Everything runs fine most of the time, but we sometimes hit a random problem where all the open TCP ports freeze: it is no longer possible to open sockets on these ports, or even to send commands on ports that are already open.

I suspected a packet pool leak in NetX/USBX, so I added a debug log trace to _nx_packet_allocate() and _nx_packet_release() to find out which thread takes a packet from which packet pool and how many packets are left.
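For readers who want to reproduce this kind of trace, here is a minimal sketch of the idea. It is not the code used in this report: `demo_pool_t`, `demo_packet_allocate()` and `demo_packet_release()` are hypothetical stand-ins so the example compiles without NetX Duo. In a real build the same log line would presumably be emitted around the NetX allocate and release calls, with the caller obtained from the RTOS instead of being passed in as a string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a packet pool (not the NetX Duo type). */
typedef struct {
    const char *name;     /* pool name shown in the trace */
    unsigned total;       /* packets created in the pool */
    unsigned available;   /* packets currently free */
} demo_pool_t;

/* Last trace line, kept in a buffer so it can be inspected or forwarded. */
static char demo_last_trace[128];

static void demo_trace(const char *op, const char *thread, const demo_pool_t *pool)
{
    snprintf(demo_last_trace, sizeof demo_last_trace,
             "[%s] thread=%s pool=%s free=%u/%u",
             op, thread, pool->name, pool->available, pool->total);
    puts(demo_last_trace);
}

/* Takes one packet; returns 0 on success, -1 if the pool is empty. */
int demo_packet_allocate(demo_pool_t *pool, const char *thread)
{
    if (pool->available == 0) {
        demo_trace("ALLOC-FAIL", thread, pool);
        return -1;
    }
    pool->available--;
    demo_trace("ALLOC", thread, pool);
    return 0;
}

/* Returns one packet; returns -1 if the pool is already full. */
int demo_packet_release(demo_pool_t *pool, const char *thread)
{
    if (pool->available == pool->total) {
        demo_trace("RELEASE-BAD", thread, pool);
        return -1;
    }
    pool->available++;
    demo_trace("RELEASE", thread, pool);
    return 0;
}
```

Logging the free count on every call makes a monotonic decrease easy to spot in a terminal capture, which is the pattern described in this report.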

Thanks to this, I can see that, when the problem happens, the "ux_slave_class_cdc_ecm_bulkout_thread" thread requests one packet per second without ever releasing it, as shown in the attached screenshot. As soon as the packet pool drops to 0, my application is stuck (no TCP port available, everything seems frozen).
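One way to confirm which thread is holding the packets is to record the allocating thread for each packet and count what is still held. The sketch below assumes a small fixed pool addressed by slot index. The `demo_ledger_t` type and its functions are hypothetical and are not part of NetX Duo or USBX; a real implementation would need a side table keyed on the packet pointer.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_POOL_SIZE 8   /* hypothetical pool size for the sketch */

/* One entry per packet slot: the name of the allocating thread, or NULL if free. */
typedef struct {
    const char *owner[DEMO_POOL_SIZE];
} demo_ledger_t;

/* Record that `thread` took the packet in `slot`. */
void demo_note_alloc(demo_ledger_t *ledger, size_t slot, const char *thread)
{
    if (slot < DEMO_POOL_SIZE)
        ledger->owner[slot] = thread;
}

/* Record that the packet in `slot` went back to the pool (by any thread). */
void demo_note_release(demo_ledger_t *ledger, size_t slot)
{
    if (slot < DEMO_POOL_SIZE)
        ledger->owner[slot] = NULL;
}

/* Count the packets allocated by `thread` that have not been released yet. */
size_t demo_count_held(const demo_ledger_t *ledger, const char *thread)
{
    size_t held = 0;
    for (size_t i = 0; i < DEMO_POOL_SIZE; i++) {
        if (ledger->owner[i] != NULL && strcmp(ledger->owner[i], thread) == 0)
            held++;
    }
    return held;
}
```

Tracking the allocator instead of the releasing thread matters here, because a packet received by one thread is often released by another layer. A per-thread count that keeps growing points at the leaking path.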

After several minutes (5/10/15?), all packets are released at once, but I can no longer communicate with the MCU. It's as if the USB link had been removed. I have to reboot my MCU to make it work again.

Such behavior is not acceptable, as it will not be possible to unplug/replug in the final product.

Do you have any idea of the cause and how to solve this bug?

Thanks in advance, Best regards, Antoine

host_ip_service packet_drop teraterm3.log

ZerAtaii commented 11 months ago

It also happens under light use (very few exchanges), and not necessarily under intensive use (curl downloads for 12 hours in a row, for example). It seems purely random. We tried increasing the number of packets, but the problem stayed the same and simply appeared a little later (the time needed to empty the larger packet pool). We are already at the RAM limit...
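The effect of a larger pool can be estimated with simple arithmetic: with a constant leak, the time until the pool is empty is the number of free packets divided by the leak rate. The helper below is a hypothetical illustration, and the pool sizes in the note after it are made-up numbers, not measurements from this setup.

```c
#include <limits.h>

/* Seconds until the pool is empty, given a constant leak rate.
   Returns UINT_MAX when nothing leaks (the pool never drains). */
unsigned demo_seconds_to_empty(unsigned free_packets, unsigned leaked_per_second)
{
    if (leaked_per_second == 0)
        return UINT_MAX;
    return free_packets / leaked_per_second;
}
```

At the observed rate of one packet per second, doubling an assumed pool from 20 to 40 free packets would only move the freeze from about 20 seconds to about 40 seconds after the leak starts, which matches the "same problem, a little later" behavior.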

ZerAtaii commented 10 months ago

We understand that the packet pool is created in the application code. However, when everything works (a curl transfer over Wi-Fi, for example), the whole packet release path stays inside the NetX layer. At least, that is what the stack frame suggests... MicrosoftTeams-image (7)

Some packets are indeed released by the application layer, but not in certain very specific cases of heavy network exchange.

TiejunMS commented 10 months ago

Duplicate of https://github.com/azure-rtos/usbx/issues/120