esp-rs / esp-hal

no_std Hardware Abstraction Layers for ESP32 microcontrollers
https://docs.esp-rs.org/esp-hal/
Apache License 2.0

DMA Queuing API #1785

Closed MabezDev closed 1 month ago

MabezDev commented 3 months ago

The discussion we had about the queuing API outgrew the original issue, so I'm moving it here to continue the discussion. See the full history in https://github.com/esp-rs/esp-hal/issues/1512


Just some thoughts here and there about the topic. Apologies for the wall of text, I've been thinking about this for 2 days.

I think the current ergonomic (but technically unsound) API can be built on top of a move-based API; it just means we need to live with some memcpys (and thanks to #1255 they should run at top speed). The idea is that the user provides a 'static buffer (probably in a c'tor?), the driver uses it for its actual transfers, and then memcpys the results into the stack-allocated buffer that the user passes to write, read, transfer, etc. For particularly large transfers, it doesn't have to be one big memcpy, which would make the total latency "time to do SPI transfer" + "time to memcpy it all"; it could be a series of memcpys that follows the progress of the transfer. So if the transfer is to read 10,000 bytes, a waker can be woken every 1000 bytes (in reality it'll probably be once per DMA descriptor) so that Future::poll can memcpy the next available 1000 bytes into the stack-borrowed buffer. The total latency then becomes "time to do SPI transfer" + "time to copy last chunk", which is IMO acceptable for ergonomics and soundness. Latency-sensitive applications can take matters into their own hands (using the move-based API directly) if they want to skip the memcpys.
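To make the chunked-copy idea concrete, here is a minimal host-side sketch. `ChunkedReader` and `poll_copy` are hypothetical names, not esp-hal APIs; the sketch assumes the driver-owned buffer is at least as large as the caller's buffer and that the driver can report how many bytes the DMA has finished writing.

```rust
/// Hypothetical helper (not an esp-hal type): copies data out of a
/// driver-owned DMA buffer in step with the progress of the transfer.
pub struct ChunkedReader {
    /// Bytes already copied into the caller's buffer.
    copied: usize,
}

impl ChunkedReader {
    pub fn new() -> Self {
        Self { copied: 0 }
    }

    /// Intended to be called from `Future::poll` after each wake-up.
    ///
    /// * `dma_buf`  - the driver's buffer (assumed >= `dst.len()`).
    /// * `received` - how many bytes the hardware has completed so far.
    /// * `dst`      - the caller's (possibly stack-allocated) buffer.
    ///
    /// Copies only the bytes that arrived since the previous call and
    /// returns `true` once `dst` has been completely filled.
    pub fn poll_copy(&mut self, dma_buf: &[u8], received: usize, dst: &mut [u8]) -> bool {
        let end = received.min(dst.len()).min(dma_buf.len());
        if end > self.copied {
            dst[self.copied..end].copy_from_slice(&dma_buf[self.copied..end]);
            self.copied = end;
        }
        self.copied == dst.len()
    }
}
```

With a wake-up per 1000 bytes, only the final 1000-byte copy happens after the transfer itself has finished; the earlier copies overlap with the transfer.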

I would really love to be able to queue multiple (DMA or not) transactions like esp-idf allows you to (but slightly better). The SPI peripheral on the ESP32S3 (and all of the others, I think) supports multiple CS pins (CS0, CS1, CS2, etc). It would be really fantastic if I could have a shared SPI bus where each device on the bus could queue up transactions for the Spi driver, which it would gradually execute, using the interrupt to schedule the next transfer and then waking the application's Wakers so each transaction's result can be consumed in its own time. The important bit here is that the transactions keep getting executed in the background even if the application doesn't Future::poll often, which can happen if the executor is too busy: maybe there's a CPU-bound task, it's just running in debug mode, a task has a big tree of futures, or there are simply too many tasks to poll. ESP-IDF allows you to do this queueing for a single device on the bus, but queueing for a different device forces you to drain the queue beforehand... which is not ideal. I did the wrapper for this in esp-idf-hal, which is where I discovered the limitations. The SPI driver docs currently say to use embedded-hal-bus for bus sharing, which works, but it means each device has to wait for the other devices to finish transacting before even getting to schedule its transfer. My immediate use case for this is only one device, which is a great starting point haha, but having multiple would be great! (Once I maximise single-device throughput I'll be setting up multiple MAX3421Es on the same SPI bus to have multiple USB ports, so I'll have a use case for that too.)
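The "results are produced in the background and consumed whenever the task gets polled" part could look roughly like the sketch below. Everything here (`Slot`, `complete`, `TransferFuture`, `CountingWaker`) is a hypothetical illustration that runs on the host with `std`; a real no_std driver would presumably keep the slot in a `static` behind a critical-section mutex instead of `Rc<RefCell<_>>`, and `complete` would be called from the SPI/DMA interrupt handler.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical completion slot shared by one queued transaction and the driver.
#[derive(Default)]
pub struct Slot {
    result: Option<Vec<u8>>,
    waker: Option<Waker>,
}

pub type SharedSlot = Rc<RefCell<Slot>>;

/// Stand-in for what the interrupt handler does when a transfer finishes:
/// store the result, then wake whoever is waiting (if anyone is).
pub fn complete(slot: &SharedSlot, data: Vec<u8>) {
    let waker = {
        let mut s = slot.borrow_mut();
        s.result = Some(data);
        s.waker.take()
    }; // borrow released before waking
    if let Some(w) = waker {
        w.wake();
    }
}

/// Future handed back to the device driver that queued the transaction.
pub struct TransferFuture(pub SharedSlot);

impl Future for TransferFuture {
    type Output = Vec<u8>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        let mut s = self.0.borrow_mut();
        match s.result.take() {
            // The transfer finished at some point in the past, polled or not.
            Some(data) => Poll::Ready(data),
            None => {
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Demo-only waker that counts how many times it was woken.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

The point of the slot is that completion does not depend on polling: if the executor is busy, the result simply waits in the slot and the next poll returns it immediately.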

I don't really know if all of this belongs in the HAL; with a move-based API, users could just DIY it on top of the driver. Or perhaps it's useful enough for everyone to keep in here, idk. Can decide this later I suppose.

So I think to achieve that, we probably want to split out the "preparation" part from the actual DMA transaction.

It's important that one is able to queue up more transactions whilst there are already transactions in flight. Splitting out the "preparation" part from the transaction itself would mean you can queue multiple transactions at once but once they start you have to wait for them to finish to queue some more, killing throughput.

To achieve this, I think we just need a 'static enum representing the state (idle vs transfer in progress) of the Spi driver and an interrupt to read off a 'static shared queue of transactions and manage the state transitions between idle/busy.
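A rough sketch of that idle/busy state machine, simulated on the host. `State`, `Transaction`, and `Driver` are hypothetical names, not esp-hal types. In a real driver the state and queue would presumably live in a `static` guarded by a critical section, with a fixed-capacity queue rather than `VecDeque`, and `on_transfer_done` would be the interrupt handler. Here the `started` log stands in for actually programming the hardware.

```rust
use std::collections::VecDeque;

/// Driver state: either nothing is running, or a transfer for the
/// device on chip-select line `cs` is in flight.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Busy { cs: u8 },
}

/// A queued transaction, tagged with the CS line of its device.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Transaction {
    pub cs: u8,
    pub data: Vec<u8>,
}

pub struct Driver {
    state: State,
    queue: VecDeque<Transaction>,
    /// CS line of every transfer that was started, in order.
    /// Stands in for kicking off the hardware.
    pub started: Vec<u8>,
}

impl Driver {
    pub fn new() -> Self {
        Self { state: State::Idle, queue: VecDeque::new(), started: Vec::new() }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Can be called at any time, including while a transfer is in flight.
    /// Starts the transfer immediately only if the driver is idle.
    pub fn enqueue(&mut self, t: Transaction) {
        self.queue.push_back(t);
        if self.state == State::Idle {
            self.start_next();
        }
    }

    /// Simulated "transfer done" interrupt: chain the next queued
    /// transaction, or fall back to idle if the queue is empty.
    pub fn on_transfer_done(&mut self) {
        self.start_next();
    }

    fn start_next(&mut self) {
        match self.queue.pop_front() {
            Some(t) => {
                self.started.push(t.cs);
                self.state = State::Busy { cs: t.cs };
            }
            None => self.state = State::Idle,
        }
    }
}
```

Because `enqueue` never waits for the queue to drain, transactions for different CS lines can be interleaved freely and the interrupt keeps the bus busy between them.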

and this may be impossible to achieve

If ESP-IDF can do it, then esp-hal can do it too. 😛

A more interesting implementation could also take advantage of the segmented transfer feature that the ESP32S3 has (idk about other chips). It basically lets you queue up multiple transactions to different devices in one go. It appears to use GDMA to configure the SPI registers.

Originally posted by @Dominaezzz in https://github.com/esp-rs/esp-hal/issues/1512#issuecomment-2091264426

MabezDev commented 1 month ago

Closed via #1856