embassy-rs / embassy

Modern embedded framework, using Rust and async.
https://embassy.dev
Apache License 2.0

STM32 DMA double-buffering #702

Open Dirbaio opened 2 years ago

Dirbaio commented 2 years ago

We want to add some form of first-class double-buffering support, to allow endless streaming of data.

Example use cases

Requirements

  1. Support double-buffering.
  2. There must be no gap between transfers (the latency of an IRQ or of a task wake is too much).
  3. There must be no UB even if IRQs are delayed arbitrarily long (the DMA must not wrap around and start overwriting the slice the user code is touching).

How to do this?

Satisfying the requirements is tricky. Requirement 3 essentially means we can't use DMA modes that "wrap around by default". For example, with a circular buffer you might do this:

Start a read into a buffer in circular mode
loop {
    Wait for HTIE (half-transfer interrupt): the 1st half is filled
    Hand the 1st half to the user, they process it
    Wait for TCIE (transfer-complete interrupt): the 2nd half is filled
    Hand the 2nd half to the user, they process it
}

However, if the user takes too long to process the 1st half, the DMA might wrap around and overwrite it from under them -> UB.
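To make the hazard concrete, here is a toy software model (plain Rust, no hardware access) of the loop above: the "DMA" writes one sample per tick and wraps at the end of the buffer, while the user holds on to the first half. `CircularDmaModel` and `first_half_clobbered` are hypothetical names for illustration only; on real hardware the overwrite would be a data race on a slice the user was already handed, which is the UB described above.

```rust
/// Toy model of DMA circular mode: writes one sample per tick and wraps
/// around at the end of the buffer without ever stopping.
struct CircularDmaModel {
    buf: Vec<u16>,
    pos: usize,       // next index the "DMA" will write
    next_sample: u16, // value of the next sample produced by the peripheral
}

impl CircularDmaModel {
    fn new(len: usize) -> Self {
        Self { buf: vec![0; len], pos: 0, next_sample: 0 }
    }

    /// Let the "DMA" run for `n` samples. It never checks whether user
    /// code is still looking at the region it is about to overwrite.
    fn run(&mut self, n: usize) {
        for _ in 0..n {
            self.buf[self.pos] = self.next_sample;
            self.next_sample = self.next_sample.wrapping_add(1);
            self.pos = (self.pos + 1) % self.buf.len();
        }
    }
}

/// Hypothetical helper: returns true if the 1st half changed while user
/// code was still "processing" it for `user_delay` sample periods.
fn first_half_clobbered(len: usize, user_delay: usize) -> bool {
    let mut dma = CircularDmaModel::new(len);
    let half = len / 2;
    dma.run(half); // HTIE: 1st half is filled
    let handed_to_user = dma.buf[..half].to_vec(); // what the user sees
    dma.run(user_delay); // user is slow; the DMA keeps going
    dma.buf[..half] != handed_to_user[..]
}
```

With an 8-sample buffer, the first half survives as long as the user finishes within 4 further sample periods; one sample later it has already been overwritten.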

Unfortunately, I believe it's "fundamentally impossible" to wrap DMA circular mode in a safe Rust API :'(

The way we use DMA has to be something like "start writing to buf1, queue a write to buf2. When you're done with buf1 or buf2, tell me. But DO NOT wrap around back to buf1 until I tell you to do so." That way, if user code takes too long, the DMA just stops (and maybe loses data), but there's no UB.
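A minimal sketch of that ownership rule, written as a pure state machine rather than a driver: each buffer is owned either by the DMA or by the user, and the DMA only moves on to a buffer it has been handed back; otherwise it stops. `DmaState`, `DoubleBuffer`, `on_buffer_full` and `release` are hypothetical names, not embassy API.

```rust
/// Which side the DMA is on: writing into buffer `i`, or stopped.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DmaState {
    Running(usize),
    Stopped,
}

/// Toy state machine for "never wrap back onto a buffer until told to".
/// Hypothetical names; not an embassy API.
struct DoubleBuffer {
    dma_owns: [bool; 2], // true = the DMA may write into this buffer
    state: DmaState,
}

impl DoubleBuffer {
    /// Start writing to buf1 (index 0) with buf2 (index 1) queued.
    fn start() -> Self {
        Self { dma_owns: [true, true], state: DmaState::Running(0) }
    }

    /// The DMA filled its current buffer: ownership passes to the user.
    /// The DMA continues on the other buffer only if it still owns it,
    /// otherwise it stops (data is lost, but nothing is overwritten).
    fn on_buffer_full(&mut self) -> Option<usize> {
        let done = match self.state {
            DmaState::Running(i) => i,
            DmaState::Stopped => return None,
        };
        self.dma_owns[done] = false;
        let next = 1 - done;
        self.state = if self.dma_owns[next] {
            DmaState::Running(next)
        } else {
            DmaState::Stopped
        };
        Some(done)
    }

    /// User code is done with buffer `i` and hands it back to the DMA.
    fn release(&mut self, i: usize) {
        self.dma_owns[i] = true;
        if self.state == DmaState::Stopped {
            self.state = DmaState::Running(i); // restart on the returned buffer
        }
    }
}
```

If the user never calls `release` on buffer 0, the second `on_buffer_full` leaves the DMA `Stopped` instead of letting it wrap back onto buffer 0.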

Idea 1: use M0AR/M1AR

There are some interesting ideas around on how to use M0AR/M1AR (the two memory-address registers used by the STM32 DMA double-buffer mode) for this: write a "poison" address (like 0xFFFF_FFFF) into the next buffer's register to get the DMA to error and stop, then overwrite the poison with the real address when it's safe to continue.

I wasn't sure if this actually works in practice, or, if it does, whether it avoids UB in all cases. Update: yes, it does.
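The poison-address idea can be modeled in a few lines. This is a toy model under stated assumptions, not a register-level driver: `mar` stands in for M0AR/M1AR, `ct` for the current-target selection, and switching to a poisoned address is modeled simply as the stream stopping. It does not model which register writes are permitted while the stream is enabled; the reference manual is authoritative there. `DbmModel`, `buffer_complete` and `release` are hypothetical names.

```rust
/// Illustrative poison value from the discussion: not a valid RAM address,
/// so pointing the stream at it is assumed to cause a transfer error.
const POISON: u32 = 0xFFFF_FFFF;

/// Toy model of double-buffer mode with two memory-address registers.
/// `mar[0]`/`mar[1]` stand in for M0AR/M1AR, `ct` for the current target.
struct DbmModel {
    mar: [u32; 2],
    ct: usize,
    stopped: bool,
}

impl DbmModel {
    fn new(buf0: u32, buf1: u32) -> Self {
        Self { mar: [buf0, buf1], ct: 0, stopped: false }
    }

    /// The current buffer completed. Returns its address (now owned by the
    /// user). The handler poisons that buffer's register; the stream then
    /// switches to the other register and stops if it finds poison there.
    fn buffer_complete(&mut self) -> Option<u32> {
        if self.stopped {
            return None;
        }
        let done = self.ct;
        let filled = self.mar[done];
        self.mar[done] = POISON; // the DMA must not come back here yet
        self.ct = 1 - done;
        if self.mar[self.ct] == POISON {
            self.stopped = true; // transfer error instead of overwriting
        }
        Some(filled)
    }

    /// User is done with the buffer at `addr`: replace the poison with the
    /// real address so the stream may use it again, restarting if stopped.
    fn release(&mut self, addr: u32) {
        let slot = if self.mar[self.ct] == POISON { self.ct } else { 1 - self.ct };
        self.mar[slot] = addr;
        self.stopped = false;
    }
}
```

In the model, if the user has not released buffer 0 by the time buffer 1 completes, the next switch lands on `POISON` and the stream stops rather than overwriting buffer 0; if the user releases in time, the stream keeps alternating with no gap.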

Disadvantages:

Idea 2: transfer queuing

Add a way in `trait Channel` to queue transfers. You start one transfer and queue the next. When a transfer finishes, the IRQ handler starts the next transfer if one is queued. The DMA stops if there's no queued transfer. This allows code (e.g. the ADC HAL) to:

  - Start a transfer to buf1
  - Queue a transfer to buf2
  - When buf1 is filled, hand it to user code, then queue it again
  - When buf2 is filled, hand it to user code, then queue it again
  - Repeat

If user code is slow or IRQs are delayed, the DMA loses data, but there's no UB.

Disadvantages:

  - The time gap between transfers is the IRQ latency; it's not zero.
  - It's not fast enough for some use cases (like DCMI).
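A toy model of the queuing behaviour, assuming a one-deep queue per channel. `Transfer`, `QueuedChannel`, `queue` and `on_irq_complete` are hypothetical names and not embassy's `Channel` trait; in a real HAL the descriptor would carry the buffer pointer, length and peripheral address, and `on_irq_complete` would run inside the DMA interrupt.

```rust
/// Hypothetical transfer descriptor. A real HAL would carry the buffer
/// pointer/length and peripheral address; here we only tag the buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Transfer {
    buf_id: usize,
}

/// Toy model of a DMA channel with a one-deep transfer queue.
#[derive(Default)]
struct QueuedChannel {
    active: Option<Transfer>,
    queued: Option<Transfer>,
}

impl QueuedChannel {
    /// Start `t` immediately if the channel is idle, otherwise queue it.
    /// Hands `t` back if the queue slot is already taken.
    fn queue(&mut self, t: Transfer) -> Result<(), Transfer> {
        if self.active.is_none() {
            self.active = Some(t);
            Ok(())
        } else if self.queued.is_none() {
            self.queued = Some(t);
            Ok(())
        } else {
            Err(t)
        }
    }

    /// What the transfer-complete IRQ handler does: report the finished
    /// transfer and start the queued one, if any. With nothing queued the
    /// channel goes idle (data is lost, but no user buffer is touched).
    fn on_irq_complete(&mut self) -> Option<Transfer> {
        let done = self.active.take();
        self.active = self.queued.take();
        done
    }
}
```

The next transfer only becomes active when `on_irq_complete` runs, so in hardware the gap between transfers is exactly the IRQ latency noted in the disadvantages.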

Original discussion in Matrix

matoushybl commented 2 years ago

Further discussion revealed more information on double buffering with DMA/BDMA on different families and peripheral versions:

AntoineMugnier commented 2 years ago

From the discussion on Matrix (formatted):

Idea 1 - fast, sound, only F2, F4, F7, H7, L5: the preferred option if the hardware permits it.

Idea 2 - slow, sound, all chips: would be easier to implement/understand/maintain than Idea 1, but the IRQ latency is not negligible.
It should be fine for audio on I2S/DAC: e.g. at a 180 MHz core clock and 48 kHz sampling, you would have ~3750 CPU cycles per sample (180 MHz / 48 kHz) for the IRQ, which should be enough, assuming there are no long critical sections and the IRQ priority is high. But in a high-frequency ADC sampling application it will be noticeable. There's at least one use case where it doesn't work at all: transferring pictures from DCMI.
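The cycle budget is simple arithmetic: core clock divided by sample rate. A small helper, assuming the peripheral holds only one pending sample, so the IRQ must queue the next transfer within one sample period to avoid dropping data (`cycles_per_sample` and `irq_fits` are hypothetical names):

```rust
/// CPU cycles between two consecutive samples: the rough budget the
/// transfer-complete IRQ has to queue the next transfer, assuming the
/// peripheral can hold only one pending sample.
fn cycles_per_sample(core_hz: u32, sample_hz: u32) -> u32 {
    core_hz / sample_hz
}

/// True if an IRQ that needs `latency_cycles` (entry latency plus handler
/// work) restarts the DMA before the next sample arrives.
fn irq_fits(latency_cycles: u32, core_hz: u32, sample_hz: u32) -> bool {
    latency_cycles < cycles_per_sample(core_hz, sample_hz)
}
```

For the 180 MHz / 48 kHz audio case this gives 3750 cycles; for a hypothetical 2 MS/s ADC stream at the same core clock it drops to 90 cycles, which is where the IRQ latency of Idea 2 starts to hurt.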

Idea 3 - fast, unsound on overrun, all chips: use DMA circular mode with a single buffer. On overrun, either panic, or stop the DMA from the IRQ and make the task return an "OverrunError". The second option is technically still unsound, because by the time the IRQ fires, the overrun (and therefore UB) has already happened (or perhaps stop the DMA and return an error to the user, though that's a bit more risky). This would allow us to get streaming DMA ADC/whatever working on ALL chips, and then maybe we can later apply Idea 1 to the chips that do support it.
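A sketch of how overrun could be detected for Idea 3, assuming the driver can track the total number of samples the DMA has written (on hardware this would have to be derived from the DMA position plus wrap interrupts, which this sketch does not model). `CircularReader` and `OverrunError` are hypothetical names. The check only fires after the DMA has already lapped the reader, which is exactly why this option is unsound on overrun.

```rust
#[derive(Debug, PartialEq)]
struct OverrunError;

/// Toy model of Idea 3: a single circular buffer of `len` samples.
/// `written` and `read` are running totals (assumed to be maintained by
/// the driver from the DMA position and wrap interrupts).
struct CircularReader {
    len: usize,
    written: u64,
    read: u64,
}

impl CircularReader {
    /// Samples ready for the user, or `OverrunError` if the DMA has lapped
    /// the reader. By the time this returns Err, unread data has already
    /// been overwritten: the check is after the fact.
    fn available(&self) -> Result<usize, OverrunError> {
        debug_assert!(self.read <= self.written, "reader ahead of DMA");
        let pending = self.written - self.read;
        if pending > self.len as u64 {
            Err(OverrunError)
        } else {
            Ok(pending as usize)
        }
    }
}
```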

AntoineMugnier commented 2 years ago

After the previous discussion, we have decided to implement at least Ideas 3 and 1, and maybe 2. Suggested ordering of the development tasks: Idea 3 => Idea 1 => Idea 2

I'm starting work on Idea 3.