espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
12.98k stars 7.12k forks source link

crash in tud_hid_keyboard_report() due to race condition (IDFGH-8198) #9691

Open simpkins opened 1 year ago

simpkins commented 1 year ago

Answers checklist.

General issue report

The example code in examples/peripherals/usb/device/tusb_hid/main/tusb_hid_example_main.c appears to have a race condition that can cause it to crash if the USB bus is reset by the host, or if the device is unplugged.

The code checks tud_mounted(), and as long as the device is mounted it calls tud_hid_keyboard_report() to send a report. However, this is all being done from the main task, and this looks like it can race with USB state changes made on the TinyUSB task. e.g., if DCD_EVENT_UNPLUGGED is processed on the TinyUSB task between when the main task checks tud_mounted() and the tud_hid_keyboard_report() call, the main task will attempt to send a HID report even though the device is no longer mounted.

The dcd_esp32sx.c code crashes with a divide by 0 error if tud_hid_keyboard_report() is called while the device is not mounted.

In general, what is the recommended way for properly handling concurrency in order to send HID reports? It looks like it potentially isn't safe to send HID reports from anywhere other than the TinyUSB task itself. The FreeRTOS implementation of osal_queue_receive() appears to make it impossible to do other work on the TinyUSB task though, since it blocks waiting for an event rather than returning control when there are no USB events to process.

Example crash in tud_hid_keyboard_report() when the device is not mounted:

Guru Meditation Error: Core  0 panic'ed (IntegerDivideByZero). Exception was unhandled.

Core  0 register dump:
PC      : 0x4200ead7  PS      : 0x00060230  A0      : 0x8200cbc4  A1      : 0x3fcf39d0
0x4200ead7: dcd_edpt_xfer at /COMPONENT_TINYUSB_DIR/tinyusb/src/portable/espressif/esp32sx/dcd_esp32sx.c:356

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x00000009
A6      : 0x00000000  A7      : 0x0000cdcd  A8      : 0x00000000  A9      : 0x00000000
A10     : 0x3fc9766c  A11     : 0x00000000  A12     : 0x00000000  A13     : 0x3fc97c8c
A14     : 0x00000001  A15     : 0x0000cdcd  SAR     : 0x00000018  EXCCAUSE: 0x00000006
EXCVADDR: 0x00000000  LBEG    : 0x40056fc5  LEND    : 0x40056fe7  LCOUNT  : 0xffffffff

Backtrace: 0x4200ead4:0x3fcf39d0 0x4200cbc1:0x3fcf3a00 0x4200bbcf:0x3fcf3a20 0x4200bc11:0x3fcf3a40 0x42009b1d:0x3fcf3a70 0x42009c1d:0x3fcf3aa0 0x42009c2f:0x3fcf3ac0 0x4200a19d:0x3fcf3ae0 0x4200a339:0x3fcf3b20 0x4200a4a9:0x3fcf3b60 0x42009b6e:0x3fcf3b80 0x420097f4:0x3fcf3bb0 0x420098a1:0x3fcf3be0 0x420099aa:0x3fcf3c10 0x42009502:0x3fcf3c30 0x42027cab:0x3fcf3eb0 0x4037e819:0x3fcf3ee0
0x4200ead4: dcd_edpt_xfer at /COMPONENT_TINYUSB_DIR/tinyusb/src/portable/espressif/esp32sx/dcd_esp32sx.c:356

0x4200cbc1: usbd_edpt_xfer at /COMPONENT_TINYUSB_DIR/tinyusb/src/device/usbd.c:1299 (discriminator 2)

0x4200bbcf: tud_hid_n_report at /COMPONENT_TINYUSB_DIR/tinyusb/src/class/hid/hid_device.c:106

0x4200bc11: tud_hid_n_keyboard_report at /COMPONENT_TINYUSB_DIR/tinyusb/src/class/hid/hid_device.c:134

0x42009b1d: app_send_hid_demo() at /COMPONENT_TINYUSB_DIR/tinyusb/src/class/hid/hid_device.h:147

at commit v5.1-dev-437-g5c1044d84d

tore-espressif commented 1 year ago

@simpkins Thanks for letting us know it didn't work and sorry for the inconvenience caused by it not working correctly.

Unfortunately the tud_mounted() check in app_main function is there only to depict when we want the reporting to start, it is not meant to guard other API calls.

So the bug is in the DCD layer, as you pointed out. DCD should check if the device is mounted, before trying to send the report and mutex/spinlock should be added to guarantee thread safety

We are in process of rewriting the DCD layer at the moment

simpkins commented 1 year ago

I'm not convinced that this bug is purely at the DCD layer. This potentially seems like an upstream issue with the TinyUSB HID implementation on FreeRTOS.

There is other shared state in components/tinyusb/tinyusb/src/class/hid/hid_device.c that is accessed without any locking. If tud_hid_n_report() is called from the main task it can try to access the _hidd_itf interface state while the TinyUSB task is in the middle of clearing the state in a hidd_reset() call. If a TUSB_REQ_SET_CONFIGURATION event is being processed from the TinyUSB task while the main task is trying to send a HID report, it looks like the usbd device state can be reset after the HID code claims the endpoint and before it attempts to send the report.

In the upstream TinyUSB hid device examples, most of them only send HID reports from the TinyUSB task. The hid_composite_freertos example is the only one that doesn't, and it seems like it has this same race condition.

This problem seems somewhat specific to FreeRTOS since tud_task() blocks forever on FreeRTOS, making it impossible to do other work on this task. If the usbd_defer_func() function were exposed in a non-private header this seems like it might be one option, as this would make it possible to schedule a HID report to be sent from the tud_task() by sending an event to the TinyUSB task requesting it to run arbitrary code.

ztefanjo commented 3 months ago

We also see this crash (division by zero in dcd_edpt_xfer) if the device is removed while sending data to it. Any update on this bug @tore-espressif ?