eclipse-threadx / usbx

Eclipse ThreadX - USBX is a high-performance USB host, device, and on-the-go (OTG) embedded stack, that is fully integrated with Eclipse ThreadX RTOS
https://github.com/eclipse-threadx/rtos-docs/blob/main/rtos-docs/usbx/index.md
MIT License
157 stars 92 forks source link

STM32 Controller does not abort receive request after timeout #165

Open JonTBeckett opened 5 months ago

JonTBeckett commented 5 months ago

Describe the bug I see that #160 is bringing the STM32 controllers into this repo. It appears the issue is in that portion primarily. If this portion isn't maintained by USBX, feel free to direct me to the appropriate maintainers.

Given

  1. A timeout value has been set for read operations via ux_device_class_cdc_acm_ioctl()
  2. At least one call to ux_device_class_cdc_acm_read() has timed out

When the host sends data to the device while the device is NOT pending in a call to ux_device_class_cdc_acm_read(), Then the data sent by the host is never delivered to the device application even on subsequent calls to ux_device_class_cdc_acm_read()

It appears that the STM32 controller is not aborting the transfer request in the USB peripheral HW, causing an unexpected interrupt. Eventually HAL_PCD_DataOutStageCallback() in the ux_dcd_stm32_callback.c file is called in the interrupt handler. This function sets the size of the incoming data in the appropriate UX_SLAVE_TRANSFER structure and then the appropriate semaphore is “put”. The issue comes from the fact that on the subsequent call to ux_device_class_cdc_acm_read(), the size in the UX_SLAVE_TRANSFER structure is set to 0, clearing any record of data already in the buffer.

Using TraceX, I confirmed that the interrupt happens before the call to ux_device_class_cdc_acm_read() by observing that there were no threads pending on the semaphore when the semaphore is “put”.

Locally, I was able to mitigate this issue in two ways:

  1. Setting the transfer request size to 0 after completing the read instead of before (still setting to 0 before transmit requests).
    1. This obviously doesn’t directly solve the issue as there is no guarantee that the buffer provided to the USB peripheral HW when the receive request was setup is still valid when the interrupt occurs.
  2. Checking the return value of the semaphore “get” operation in _ux_dcd_stm32_transfer_request() (found in ux_dcd_stm32_transfer_request.c) and calling _ux_dcd_stm32_transfer_abort() on appropriate errors (namely when TX_NO_INSTANCE is returned.
    1. My quick patch implementation of this still has a race condition between the interrupt and the aborting of the transfer request though so it is just a proof of concept at this point.

To Reproduce I don't have a demo project to share but I believe that any project that uses the CDC ACM Device class on STM32 will demonstrate the issue. The following steps provide a good starting point for forcing this issue to occur.

  1. DEVICE: Set the timeout value for reads to a low value (100ms is what I used) via ux_device_class_cdc_acm_ioctl().
  2. DEVICE: Perform at least one read via ux_device_class_cdc_acm_read() that times out BEFORE sending any data from the host to the device.
  3. DEVICE: Wait a relatively long time before performing another read (I used 15 seconds for this time period to force the issue)
  4. HOST: Send any data to the device (ensuring it happens after the first timeout and during the length of time described in step 3)
  5. Note that the device application never receives the data even on subsequent calls to ux_device_class_cdc_acm_read()

Expected behavior The USB peripheral HW transfer request should be aborted so that the interrupt does not occur outside of reads.

Impact This lowers my team's confidence in USBX. It is not a show stopper as the odds of it happening during normal operations are fairly low as long as the processing time between calls to ux_device_class_cdc_acm_read() is small. We will be doing further stress testing to see how often we hit it in general operating conditions and may have to move away from USBX if it occurs frequently enough.

JonTBeckett commented 1 month ago

I see that the related PR has been closed without merging because the files will be delivered in a dedicated repo. Is there any info on which repo and where that will be?

cosmikwolf commented 1 month ago

I am also hitting this problem, implementing a simple HID device. HID signals will always stop sending at some point, with tx_semaphore_get() returning TX_NO_INSTANCE

my setup is:

@JonTBeckett I would love to see your patch, I have not yet figured out how to implement a fix in my scenario. I could not find a related PR to this issue when I searched.

cosmikwolf commented 1 month ago

Also @JonTBeckett I think that this could may be maintained by ST, and they have this repo which seems to have the latest version of the _ux_dcd_stm32 drivers: https://github.com/STMicroelectronics/stm32_mw_usbx.git

JonTBeckett commented 1 month ago

@cosmikwolf Thanks for finding the ST repo! I'll open this same bug under there when I get a chance.

As for the patch, I'll have to dig it up. It's been a while since I tinked with and the patch is buried in a stash in one of my local repos. Hopefully I can find it again and attach it for you!