embassy-rs / embassy

Modern embedded framework, using Rust and async.
https://embassy.dev
Apache License 2.0
5.28k stars 729 forks source link

embassy-usb does not poll the bus during control transfers #2751

Closed simpkins closed 5 months ago

simpkins commented 5 months ago

I'm playing around with writing a embassy-usb-driver implementation for esp32 chips. I'm doing it slightly differently than the embassy-stm32 crate (even though esp32 also uses a similar synopsys USB core).

I was (am?) hoping to have almost everything driven by Bus::poll(). There is only a single USB interrupt, and I was hoping avoid modifying any device state in the interrupt handler, and simply having it wake the bus poller to that device state changes are only done in the main executor thread. However, this doesn't appear to work, and it looks like this is because UsbDevice::run_until_suspend() stops polling the bus while it is handling a ControlPipe event. In my case, this means the control handler gets stuck and never makes progress if Bus::poll() needs to be called to drive the control transfer forwards.

It seems undesirable to stop polling the bus for periods of time? It would seem to me like we would still want to process any bus events that occur even while a control endpoint transfer is in progress.

The embassy-stm32 appears to work around the fact that Bus::poll() doesn't get called by having the interrupt handler check each endpoint for events, and individually wake each endpoint instead of just waking the bus. I was ideally hoping to avoid this, since this means we need to modify the device state while in the interrupt handler, and would need more careful use of critical sections to synchronize the interrupt handler's work with work done on the embassy executor.

The main challenge with continually polling the bus appears to be the fact that UsbDevice::handle_control() needs a mutable reference to the UsbDevice for the duration of it's operation, so it cannot simultaneously poll the bus while this is ongoing.

simpkins commented 5 months ago

In the short term, I think I can work around this in the driver by checking for interrupts to service both in Bus::poll() and in any method invoked on Endpoint 0. This should work, but is slightly more awkward and makes the driver a bit more aware of how the higher level code works than I would like.

I do have a proposal for how this issue could be addressed, and I think it would fix some other issues that currently require drivers make more use of unsafe code than necessary:

It would be nice to add some new class to store shared state needed by the driver. Both the Bus and ControlPipe need to access some shared state (e.g., the underlying USB peripheral, and any other data the driver needs), but currently embassy-usb provides no place to to store this. This forces drivers to store it as a global object with static lifetime, and use unsafe blocks to access it. e.g., in the embassy-stm32 driver they have the global T::state() object, and use unsafe blocks to mutably access its members like state.ep0_setup_data.

It would be nice to actually add some sort of State class to the API, so that drivers had a legitimate place to store this data and could let Rust actually verify safeness of accesses. The Bus and ControlPipe object can then both have a references to this state object (I would give them some thing like &RefCell<State::Inner> so that they can both reference the state and use RefCell::borrow_mut() when they need to access it). Higher level user code could still store the State object statically on the stack if they desire (particularly since it would be immovable for the lifetime of the Bus and ControlPipe), but it's then up to them to decide if they want to do so.

Having this State object also then provides a good place for the embassy-driver-usb to do interrupt servicing. It could have a State::run() method that you call and which just runs forever processing interrupts, waking up ControlPipe and Bus as necessary. This would let a lot of work be done in a normal embassy task, rather than inside the interrupt handler, since the interrupt handler requires manual use of critical sections rather than letting us use normal Rust mechanisms to ensure safety.

Dirbaio commented 5 months ago

I'm playing around with writing a embassy-usb-driver implementation for esp32 chips. I'm doing it slightly differently than the embassy-stm32 crate (even though esp32 also uses a similar synopsys USB core).

it uses the exact same core, in fact there's talks of extracting the stm32 driver so it can be reused for esp32. you might be reinventing the wheel :)

I was (am?) hoping to have almost everything driven by Bus::poll(). There is only a single USB interrupt, and I was hoping avoid modifying any device state in the interrupt handler, and simply having it wake the bus poller to that device state changes are only done in the main executor thread (...)

iirc there were issues with this approach. There's a single FIFO for all RX'd packets, which causes problems because you can't control the order you read things from the hardware. For example, if there's 2 tasks each reading on EP0 and EP1, and the EP0 task gets polled, it can't simply go and read the packet from the hardware because it might get the EP1 packet and it has nowhere to put it.

Therefore no matter what you do you still need some synchronization between tasks. Let's say we make poll() be always called and make it handle the packets for all endpoints. Then reading a packet from EP1 would be 2 context switches (irq -> usb task doing poll() -> user task doing read() on ep1), vs now it's just 1 (irq -> user task doing read() on ep1) which is faster.

It would be nice to add some new class to store shared state needed by the driver. Both the Bus and ControlPipe need to access some shared state (e.g., the underlying USB peripheral, and any other data the driver needs), but currently embassy-usb provides no place to to store this. This forces drivers to store it as a global object with static lifetime, and use unsafe blocks to access it. e.g., in the embassy-stm32 driver they have the global T::state() object, and use unsafe blocks to mutably access its members like state.ep0_setup_data.

embassy-stm32 uses globals because it decided to. Drivers can decide to do what you suggest instead, with no collaboration needed from embassy-usb. Define a State struct, take a &State on new(), share it to all the driver parts (bus, controlpipe, endpoints...).

simpkins commented 5 months ago

it uses the exact same core, in fact there's talks of https://github.com/embassy-rs/embassy/issues/2494 so it can be reused for esp32. you might be reinventing the wheel :)

Yeah, I realize that I'm probably doing some redundant work. I was doing this in part since I had already written a custom async-ish C++ driver for esp32 in the past, and was interested in porting that code to Rust, partly as a way to learn a bit more about embassy as I go.

I also have been attempting to do this using the esp-hal, and it's API for accessing the USB registers is pretty different than the STM32 APIs. That said, it turns out that the USB register blocks in the esp-pacs crate has a number of bugs, and trying to port my driver code has at least helped expose several bugs in the SVD files. I've submitted some PRs to the esp-pacs crate, and they did also point me to the issue you linked above. Here was the discussion in esp-pacs: https://github.com/esp-rs/esp-pacs/issues/212

I wasn't really that familiar at all with the esp-pacs stuff before I started this (or the STM ral code). After having played around with it some now I do think it indeed might be a reasonable approach on esp32 to have the user pass in the esp-hal USB0 peripheral for borrow-checking purposes, but then just ignore it and access the USB registers directly through whatever other API we want.

iirc there were issues with this approach. There's a single FIFO for all RX'd packets, which causes problems because you can't control the order you read things from the hardware. For example, if there's 2 tasks each reading on EP0 and EP1, and the EP0 task gets polled, it can't simply go and read the packet from the hardware because it might get the EP1 packet and it has nowhere to put it.

This is pretty much precisely why I want just a single main "State" task to service all interrupts: when an interrupt occurs you don't really know what endpoint it is for, and you have to do some processing (and state access) first just to decide if it is for a specific endpoint. My intention is to have the USB interrupt handler just wake 1 main interrupt servicing task. This tasks can service all interrupts when it wakes, and then wake individual endpoint tasks (or the bus event handler task) as necessary.

Therefore no matter what you do you still need some synchronization between tasks. Let's say we make poll() be always called and make it handle the packets for all endpoints. Then reading a packet from EP1 would be 2 context switches (irq -> usb task doing poll() -> user task doing read() on ep1), vs now it's just 1 (irq -> user task doing read() on ep1) which is faster

Yeah, I don't have things working enough yet to have done any benchmarking. If the task switching overhead is high you may be right that this isn't a worthwhile trade-off.

The main thing I was hoping to avoid here is to avoid needing any manual critical sections around data access, and just letting the Rust compiler do it's normal thing to validate correctness. Even between separate tasks I believe I shouldn't need any mutexes or critical sections. The only thing I'm using is just a RefCell::borrow_mut() to guard access to the central state object, and even then this mainly seems like just a runtime soundness check that in theory potentially could be optimized away if desired.

Having to service the RX FIFO inside the interrupt handler just seems to make synchronization somewhat trickier to get right, and we can't really rely on the rust compiler to validate correct access for us then.

embassy-stm32 uses globals because it decided to. Drivers can decide to do what you suggest instead, with no collaboration needed from embassy-usb. Define a State struct, take a &State on new(), share it to all the driver parts (bus, controlpipe, endpoints...).

Ah, that's a good point. Yes, that should work to have callers declare the State object for me and pass it in to Driver::new(). Thanks!

Dirbaio commented 5 months ago

The main thing I was hoping to avoid here is to avoid needing any manual critical sections around data access, and just letting the Rust compiler do it's normal thing to validate correctness. Even between separate tasks I believe I shouldn't need any mutexes or critical sections. The only thing I'm using is just a RefCell::borrow_mut() to guard access to the central state object

that won't work if usb task are at different priorities. Note embassy_stm32::usb::Endpoint is Send, it won't be if it contains a &State that contains a RefCell.

simpkins commented 5 months ago

that won't work if usb task are at different priorities. Note embassy_stm32::usb::Endpoint is Send, it won't be if it contains a &State that contains a RefCell.

Yeah, I had only been planning to support using the USB objects within a single Executor, and require users do their own message passing between tasks if they want to interact with the USB device from other Executors. (That is basically how my C++ implementation works: all USB work is done a single FreeRTOS task that generally does non-blocking work around an xTaskNotifyWait() loop, and if users want to interact with the USB device from other tasks they need to send messages to it.)

I suppose it would be possible to have a cfg option to control whether endpoints are Send or not, and to use a critical section only if this option is enabled. On esp32s2/s3 I believe critical sections use a spinlock to protect multicore access. It seems like it would be nice if we could have a spinlock just for the USB system rather than blocking work on the other core that uses a critical section but isn't related to USB.

Anyway, I'm continuing to poke around getting my code to work just since I am pretty close to having it working, but I may take a stab at getting the embassy-stm32/usb/otg.rs code to work and seeing if I can extract that out so the code can be shared rather than having 2 separate implementations. At the moment I don't have an STM32 device to test with to make sure I haven't broken the stm32 implementation, but I'll try to get one.

simpkins commented 5 months ago

Also, while debugging my implementation I did notice on thing that seems like it might be a minor bug in the embassy-stm32 implementation?

If the bus is reset while a call to EndpointOut::read() is in progress for endpoint 0, does the EndpointOut::read() call ever get woken up anywhere so it can fail? I don't have an STM32 to device to test on at the moment, but looking at the code I didn't see anything that wakes the ep_in_wakers on reset, and even if it was woken up, the code just checks doepctl.usbaep, which is always enabled for endpoint 0, even after a reset. If EndpointOut::read() never returns it seems like the code will never call Bus::poll() again to receive the reset event.

It's quite possible that this code path isn't hit in most normal situations if hosts don't normally reset the bus in the middle of a transaction, and for bus powered devices it doesn't matter if someone unplugs the power in the middle of a transaction since the device will lose power anyway.

I happened to run into this in my testing since when first plugged in to a Linux host Linux will request the device descriptor once, then immediately reset the bus and request the descriptor again. After that first device descriptor response, I tended to see the OUT status acknowledgement packet together in the same interrupt handler execution with the reset. I was processing the reset event first (and flushing the RX FIFO and dropping the transaction acknowledgement), which meant that I needed to fix my EndpointOut::read() implementation to get woken up and fail correctly in this case if the bus was reset while a read was pending. As best as I can tell from looking at the code, the embassy-stm32 read() implementation doesn't appear to get woken up in this case, but it looks like it happens to work since it processes the OUT status from the RX FIFO first before processing the reset.

Dominaezzz commented 5 months ago

Yeah, I realize that I'm probably doing some redundant work.

I haven't made much progress on this stuff anyway. I very quickly got overwhelmed and shelved the project. Too many questions to answer, too much to learn at once and too much behind NDA, so I'm quite happy to see someone else is making this a reality.

Ah, that's a good point. Yes, that should work to have callers declare the State object for me and pass it in to Driver::new(). Thanks!

I did this as well when I was initially trying to wrap usb-device. Let me know if hit any more Rusty issues like this, I may have already solved them :).

Looks like this issue can be closed now though.

Dirbaio commented 5 months ago

i'm going to close this as "working as intended". poll() is intended for waiting for bus status changes, not for general driver background work. Drivers should either do work in interrupts (ideally) or spawn a background task (extra work for the user, more overhead).