eclipse-threadx / usbx

Eclipse ThreadX - USBX is a high-performance USB host, device, and on-the-go (OTG) embedded stack, that is fully integrated with Eclipse ThreadX RTOS
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/usbx/index.md
MIT License

CDC ECM packet pool drop #120

Open ZerAtaii opened 11 months ago

ZerAtaii commented 11 months ago

What target device are you using? LPC55S69
Which version of Azure RTOS? 6.1
What toolchain and environment? arm-none-eabi-* + WSL

Hi, I am developing an embedded application on an MCU that runs Azure RTOS, using the NetX Duo/USBX layers to manage TCP/IP sockets and packets. The MCU communicates with a laptop through an Ethernet-over-USB protocol (USB CDC-ECM). The laptop runs Ubuntu and can open TCP ports with the netcat command and send a wide range of commands.

Everything runs fine most of the time, but we sometimes hit a random problem where all the open TCP ports freeze: it is no longer possible to open sockets on these ports, or even to send commands on ports that are already open.

I suspected a packet pool leak in NetX/USBX, so I added a debug trace to _nx_packet_allocate() and _nx_packet_release() to find out which thread takes a packet from which packet pool and how many packets are left.

Thanks to this, I can see that, when the problem occurs, the "ux_slave_class_cdc_ecm_bulkout_thread" thread requests one packet per second without ever releasing it, as you can see in the attached screenshot. As soon as the packet pool drops to 0, my application is stuck (no TCP port available, everything seems frozen).
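For readers who want to reproduce this kind of trace, the allocate/release logging described above could look roughly like the sketch below. It is a host-side simulation only: `traced_pool`, `traced_allocate`, `traced_release` and `traced_held_by` are hypothetical names, not NetX Duo types or functions, and the thread name is passed in by hand rather than read from the RTOS.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a packet pool; NOT the NetX Duo NX_PACKET_POOL. */
#define POOL_SIZE 4

typedef struct {
    const char *name;              /* pool name shown in the trace */
    int         available;         /* packets currently free */
    const char *owner[POOL_SIZE];  /* thread holding each packet, NULL if free */
} traced_pool;

/* Take one packet and log who took it. Returns the slot index, or -1 if empty. */
static int traced_allocate(traced_pool *pool, const char *thread_name)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool->owner[i] == NULL) {
            pool->owner[i] = thread_name;
            pool->available--;
            printf("[alloc]   %s pool=%s left=%d\n",
                   thread_name, pool->name, pool->available);
            return i;
        }
    }
    printf("[alloc]   %s pool=%s EMPTY\n", thread_name, pool->name);
    return -1;
}

/* Give a packet back and log which thread released it. */
static void traced_release(traced_pool *pool, int slot, const char *thread_name)
{
    if (slot < 0 || slot >= POOL_SIZE || pool->owner[slot] == NULL)
        return;                    /* ignore invalid or double release */
    pool->owner[slot] = NULL;
    pool->available++;
    printf("[release] %s pool=%s left=%d\n",
           thread_name, pool->name, pool->available);
}

/* Count packets still attributed to one thread; a leak suspect if it only grows. */
static int traced_held_by(const traced_pool *pool, const char *thread_name)
{
    int held = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool->owner[i] != NULL && strcmp(pool->owner[i], thread_name) == 0)
            held++;
    }
    return held;
}
```

A pool would be set up as `traced_pool pool = { "rx_pool", POOL_SIZE, { NULL } };`. Recording the allocating thread per packet makes it possible to see which thread's packets are never returned, which is the pattern reported here for the bulk OUT thread.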

After several minutes (5/10/15?), all packets are released at once, but I can no longer communicate with the MCU. It's as if the USB link had been removed. I have to reboot my MCU to make it work again.

Such behavior is not acceptable, as it will not be possible to unplug/replug in the final product.

Do you have any idea of the cause and how to solve this bug?

Thanks in advance, Best regards, Antoine

host_ip_service packet_drop teraterm3.log

xiaocq2001 commented 11 months ago

ux_slave_class_cdc_ecm_bulkout_thread keeps polling for CDC-ECM Ethernet packets on the USB bulk OUT endpoint. When packets are received, they are passed to NX for handling (and released in NX). From your description, it seems there is real Ethernet input coming from USB, but too many packets arrive for NX to process and release in time. You could consider increasing the pool size to buffer more packets for processing.

ZerAtaii commented 11 months ago

Thanks for the answer. It also happens under light use (very few exchanges), and not necessarily under intensive use (curl downloads for 12 hours in a row, for example). It seems purely random. We tried increasing the number of packets, but the problem remained; it just appeared a little later (the time needed to empty the packet pool). We are already at the RAM limit...

xiaocq2001 commented 11 months ago

On the USB side, packets are received and passed to the upper layer (maybe the application), which takes ownership and is responsible for freeing them. So I think the application may need to optimize its Ethernet packet handling, while we check whether something can be done on the USB side.

BTW, it seems CDC-ECM is only recognized by Linux. Could you share how you got CDC-ECM recognized on Windows for WSL, so that it's easier for us to reproduce the issue?

ZerAtaii commented 11 months ago

Thanks! What do you have in mind when you say "application"? NetX, or our application even higher up? Attached is a tutorial for using our application with WSL. It will probably help you.

CM connect-3-5.pdf

xiaocq2001 commented 11 months ago

Thanks for sharing.

When I say "application", I mean your application or anything even higher up. The packets allocated and filled in USBX are passed to the upper layers together with their ownership, so the upper layers should process and free them in time.

xiaocq2001 commented 11 months ago

A possible improvement for Ethernet packet handling in USBX concerns ux_device_class_cdc_ecm_bulkout_thread.c: currently, if no free packet is available, host bulk OUT transfers are NAKed. Blocking host bulk OUT transfers this way may cause the host to reset the device (just a guess, based on your observation of a deactivate followed by an activate; this is host-specific behavior). You could try allocating the NX packet after the USB bulk OUT transfer instead: if no packet is free, the frame is dropped by discarding the data. That way the host does not reset the device, but network packets are dropped until a free packet is available.

Note that this logic change does not help with packet handling and releasing; the upper layers still need to be checked to find the real issue (why packets are not handled and released).
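To make the suggestion concrete, here is a minimal host-side sketch of "receive first, allocate afterwards, drop if the pool is empty". It is not the USBX source: `sim_pool`, `sim_packet`, `sim_allocate_no_wait`, `sim_release` and `bulkout_handle_frame` are hypothetical names, and the 1514-byte frame limit and the two-packet pool are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define FRAME_MAX 1514   /* assumed maximum Ethernet frame size (no FCS) */
#define POOL_SIZE 2      /* deliberately tiny pool to show the drop path */

/* Hypothetical packet and pool types; NOT NX_PACKET / NX_PACKET_POOL. */
typedef struct {
    unsigned char data[FRAME_MAX];
    size_t        length;
    int           in_use;
} sim_packet;

typedef struct {
    sim_packet    packets[POOL_SIZE];
    unsigned long dropped;          /* frames discarded for lack of a packet */
} sim_pool;

/* Non-blocking allocate: returns NULL immediately when the pool is empty. */
static sim_packet *sim_allocate_no_wait(sim_pool *pool)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool->packets[i].in_use) {
            pool->packets[i].in_use = 1;
            return &pool->packets[i];
        }
    }
    return NULL;
}

/* Called by the upper layer once it has finished with the packet. */
static void sim_release(sim_packet *packet)
{
    packet->in_use = 0;
}

/* Handle one frame that the bulk OUT transfer has ALREADY completed into a
 * scratch buffer. The transfer is never held up waiting for a packet:
 * if none is free (or the length is invalid), the frame is counted and dropped.
 * Returns the packet to hand to the network stack, or NULL if dropped. */
static sim_packet *bulkout_handle_frame(sim_pool *pool,
                                        const unsigned char *frame,
                                        size_t length)
{
    if (length == 0 || length > FRAME_MAX) {
        pool->dropped++;
        return NULL;
    }
    sim_packet *packet = sim_allocate_no_wait(pool);
    if (packet == NULL) {
        pool->dropped++;
        return NULL;
    }
    memcpy(packet->data, frame, length);
    packet->length = length;
    return packet;
}
```

The trade-off is the one described above: the host never sees a stalled endpoint, at the cost of an extra copy and silently lost frames, which TCP would have to recover through retransmission. The `dropped` counter is there so that exhaustion stays visible instead of hiding the underlying leak.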

ZerAtaii commented 11 months ago

We clearly understand that packet pool creation takes place in the application. However, when all goes well (for a curl over Wi-Fi, for example), the entire packet release mechanism stays within the NetX layer. At least, that's what the stack frame suggests...

MicrosoftTeams-image (7)

Some packets are indeed released by the application layer, but not in very specific cases of heavy network traffic.

TiejunMS commented 11 months ago

For TCP packets, some are indeed queued by the TCP control block.
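One reading of this comment is that packets parked on a socket's queue stay out of the pool until something drains that queue. The toy model below illustrates only that accounting; `sim_socket`, `stack_queue_segment` and `app_receive_and_release` are hypothetical names and do not reflect how the NetX Duo TCP control block is actually laid out.

```c
#include <stddef.h>

#define QUEUE_MAX 8   /* assumed upper bound on queued segments */

/* Hypothetical model of one socket sharing a packet pool. */
typedef struct {
    int pool_available;   /* free packets left in the pool */
    int queued;           /* packets parked on the socket queue */
} sim_socket;

/* The stack accepts a segment: one packet leaves the pool and is queued.
 * Returns 0 on success, -1 if the pool is empty or the queue is full. */
static int stack_queue_segment(sim_socket *sock)
{
    if (sock->pool_available == 0 || sock->queued == QUEUE_MAX)
        return -1;
    sock->pool_available--;
    sock->queued++;
    return 0;
}

/* The application reads one segment and releases its packet to the pool.
 * Returns 0 on success, -1 if nothing is queued. */
static int app_receive_and_release(sim_socket *sock)
{
    if (sock->queued == 0)
        return -1;
    sock->queued--;
    sock->pool_available++;
    return 0;
}
```

In this model the pool only recovers when `app_receive_and_release` runs, so a socket that is never read can hold packets indefinitely even though no code path has strictly leaked them.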