Option for statically allocated DMA buffer(s) for waveform generation in circular mode

clemenseder commented 1 month ago

Currently there are a larger number of buffers constantly allocated and de-allocated whenever the waveform is updated. This could be necessary for playing back a waveform, but this becomes very wasteful when dealing with simple periodic waveform generation (such as for a sine/cosine/triangle, etc.). For that only two fixed DMA buffers would suffice; one could switch the pointers between one or the other buffers for example when half of the buffer is reached, using an interrupt. The user does not even have to actively re-set the pointer in the interrupt service routine (not like for the Arduino Due)

// USER CODE BEGIN 1 
void HAL_DAC_ConvHalfCpltCallbackCh1(DAC_HandleTypeDef* hdac)
{
    ;//HAL_GPIO_TogglePin(GPIOB, GPIO_PIN_7);
}

void HAL_DAC_ConvCpltCallbackCh1(DAC_HandleTypeDef* hdac)
{
    g_dac_done = true; //global variable set, not necessary.
}
// USER CODE END 1

I think that the key for making this work is enabling the circular mode (see below). This works well in the example code run on a STM32H747I-DISCO platform, but it would be nice if the Arduino_AdvancedAnalog API could be extended, allowing the user to start circular waveform generation. Once started, the MCU would not need to be engaged any longer. All the user would need to do is to declare and fill a buffer with a defined waveform and start a dedicated .begin method (or circular = true argument or similar).

I kindly ask you to consider this feature for implementation, or at least to start a discussion. I believe that such feature would be beneficial for many users.

void HAL_DAC_MspInit(DAC_HandleTypeDef* dacHandle)
{

  GPIO_InitTypeDef GPIO_InitStruct = {0};
  if(dacHandle->Instance==DAC1)
  {
  /* USER CODE BEGIN DAC1_MspInit 0 */

  /* USER CODE END DAC1_MspInit 0 */
    /* DAC1 clock enable */
    __HAL_RCC_DAC12_CLK_ENABLE();

    __HAL_RCC_GPIOA_CLK_ENABLE();
    /**DAC1 GPIO Configuration
    PA5     ------> DAC1_OUT2
    PA4     ------> DAC1_OUT1
    */
    GPIO_InitStruct.Pin = DACch2_Pin|DACch1_Pin;
    GPIO_InitStruct.Mode = GPIO_MODE_ANALOG;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);

    /* DAC1 DMA Init */
    /* DAC1_CH1 Init */
    hdma_dac1_ch1.Instance = DMA1_Stream0;
    hdma_dac1_ch1.Init.Request = DMA_REQUEST_DAC1;
    hdma_dac1_ch1.Init.Direction = DMA_MEMORY_TO_PERIPH;
    hdma_dac1_ch1.Init.PeriphInc = DMA_PINC_DISABLE;
    hdma_dac1_ch1.Init.MemInc = DMA_MINC_ENABLE;
    hdma_dac1_ch1.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
    hdma_dac1_ch1.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
    hdma_dac1_ch1.Init.Mode = DMA_CIRCULAR;
    hdma_dac1_ch1.Init.Priority = DMA_PRIORITY_LOW;
    hdma_dac1_ch1.Init.FIFOMode = DMA_FIFOMODE_ENABLE;
    hdma_dac1_ch1.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_FULL;
    hdma_dac1_ch1.Init.MemBurst = DMA_MBURST_SINGLE;
    hdma_dac1_ch1.Init.PeriphBurst = DMA_PBURST_SINGLE;
    if (HAL_DMA_Init(&hdma_dac1_ch1) != HAL_OK)
    {
      Error_Handler();
    }

    __HAL_LINKDMA(dacHandle,DMA_Handle1,hdma_dac1_ch1);

    /* DAC1_CH2 Init */
    hdma_dac1_ch2.Instance = DMA1_Stream2;
    hdma_dac1_ch2.Init.Request = DMA_REQUEST_DAC2;
    hdma_dac1_ch2.Init.Direction = DMA_MEMORY_TO_PERIPH;
    hdma_dac1_ch2.Init.PeriphInc = DMA_PINC_DISABLE;
    hdma_dac1_ch2.Init.MemInc = DMA_MINC_ENABLE;
    hdma_dac1_ch2.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
    hdma_dac1_ch2.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
    hdma_dac1_ch2.Init.Mode = DMA_CIRCULAR;
    hdma_dac1_ch2.Init.Priority = DMA_PRIORITY_LOW;
    hdma_dac1_ch2.Init.FIFOMode = DMA_FIFOMODE_ENABLE;
    hdma_dac1_ch2.Init.FIFOThreshold = DMA_FIFO_THRESHOLD_FULL;
    hdma_dac1_ch2.Init.MemBurst = DMA_MBURST_SINGLE;
    hdma_dac1_ch2.Init.PeriphBurst = DMA_PBURST_SINGLE;
    if (HAL_DMA_Init(&hdma_dac1_ch2) != HAL_OK)
    {
      Error_Handler();
    }

    __HAL_LINKDMA(dacHandle,DMA_Handle2,hdma_dac1_ch2);

  /* USER CODE BEGIN DAC1_MspInit 1 */

  /* USER CODE END DAC1_MspInit 1 */
  }
}

iabdalkader commented 1 month ago

Currently there are a larger number of buffers constantly allocated and de-allocated whenever the waveform is updated.

This is not true. The library allocates buffers from a special, contiguous DMA buffer pool, which can also be initialized statically. There's no constant allocation or de-allocation involved. When buffers get released, they return to the pool. And the library already uses double buffering modes in all drivers, and takes care of swapping buffers for the user code, cache maintenance etc... The user code only has to follow the examples and use the high-level API.

clemenseder commented 1 month ago

Thank you for your quick response.

I do not know the exact ins and outs of the implementation, I just cannot get my head around why one needs a pool of DMA buffers in the first place, and not just two. But it seems to be the case that one constantly has to poll for a new buffer from the queue to become available and to write the same data to it again and again. That seems to be quite inefficient, possibly wasting some processor power (The STM32H747 is quite fast, but still what is then the point of having a DMA in the first place?). At least that is the only method that is given in the examples from the high-level API documentation - see below.

So all I am suggesting is that the user should be able to setup a single or two buffers, run the DAC and not having to check whether the DAC is available any longer. Wouldn't that be nice (from a user perspective)? If that is already possible then a hint on how to set this up would be highly appreciated.

if (dac1.available()) {
  // Get a free buffer for writing
  SampleBuffer buf = dac1.dequeue();

  // Write data to buffer (Even position: 0, Odd position: 0xFFF)
  for (int i = 0; i < buf.size(); i++) {
    buf.data()[i] = (i % 2 == 0) ? 0 : 0xFFF;
  }

  // Write the buffer to DAC
  dac1.write(buf);
}

iabdalkader commented 1 month ago

I just cannot get my head around why one needs a pool of DMA buffers in the first place, and not just two.

If you can process the buffers fast enough then two buffers would work, but that's not always the case, especially if you do some processing in the sketch. The pool lets you create variable-sized FIFOs of DMA buffers so that the DMA never underruns/overruns. And more importantly, the pool takes care of the pitfalls and details that the user would have to implement otherwise. Such as aligning buffers to cache-lines, cache maintenance, etc.. Some of this may also need to be portable across different platforms (for example, the size of the cache line, if there's any cache).

That all said, you could always allocate a small pool of just 2 or 3 buffers (the minimum number for output to start). The overhead is very small. Also with the latest update in core-API, the pool can be initialized from a static memory block. It will still do some dynamic allocation on start, for the buffers wrappers, but this lets you use a block of RAM for the pool's memory that might not be accessible by malloc.

But it seems to be the case that one constantly has to poll for a new buffer from the queue to become available and to write the same data to it again and again. That seems to be quite inefficient

Yes if the data doesn't change it has to be written over and over, but again that's just one use case. Normally when you use this library you have generate or consume data on the fly.

Wouldn't that be nice (from a user perspective)? If that is already possible then a hint on how to set this up would be highly appreciated.

I didn't test this before, but you can write all the buffers in the pool initially in setup() for example, then in loop() you can skip the re-writing, for example:

dac1.write(dac1.dequeue());

Note the data won't be changed, buffers don't get reset when they released they just return to the queue of free buffers. We could look into making this more standard, for example: fill buffers -> run forever.

clemenseder commented 1 month ago

That sounds very interesting! I am not sure though whether the user can get a direct handle to all of these buffers, once created. Are these globablly defined? Or would it be the case of copying the buffer during a manual setup stage n (number of buffer) times, after which the user would simply call dac1.write(dac1.dequeue) as suggested?

A function like fill buffers would be really fantastic! Would there be also a possibility to attach a callback function that could be called whenever the DAC would be ready to dequeue the next buffer? But I guess that I may know already the answer to that as the producer - consumer design pattern would probably not lend itself to such approach...

iabdalkader commented 1 month ago

I am not sure though whether the user can get a direct handle to all of these buffers, once created. Are these globablly defined?

The following should work. Note that you need exactly 3 buffers, the DAC starts after you write 3 buffers:

// This example outputs an 8KHz square wave on A12/DAC0.
#include <Arduino_AdvancedAnalog.h>

AdvancedDAC dac1(A12);

void setup() {
    Serial.begin(9600);
    while (!Serial) {
    }

    if (!dac1.begin(AN_RESOLUTION_12, 16000, 32, 3)) {
        Serial.println("Failed to start DAC1 !");
        while (1);
    }

    for (int b=0; b<3; b++) {
        // Get a free buffer for writing.
        SampleBuffer buf = dac1.dequeue();

        // Write data to buffers.
        for (int i=0; i<buf.size(); i++) {
            buf.data()[i] =  (i % 2 == 0) ? 0: 0xfff;
        }

        dac1.write(buf);
    }
}

void loop() {
    if (dac1.available()) {
        dac1.write(dac1.dequeue());
    }
}

clemenseder commented 1 month ago

Wow, that is really great! I have successfully tested your example on a Giga. This is a much better solution, and probaly would be ideal if the number of samples was large and if the dac.available() check could run in its own thread. Please feel free to close this issue as your directions were extremely helpful. But I would appreciate if a more automated feature (a loop parameter for auto-filling buffers, callbacks etc.) could be incorporated in the future. In the meantime I have posted your solution on the Arduino Forum where I had discussed this issue earlier.

iabdalkader commented 1 month ago

and probaly would be ideal if the number of samples was large and if the dac.available()

You can set any number of samples you want (within reason of course, the total pool size can't exceed the available memory).

But I would appreciate if a more automated feature (a loop parameter for auto-filling buffers, callbacks etc.)

I have a work in progress loop mode here https://github.com/arduino-libraries/Arduino_AdvancedAnalog/pull/75

This allows you to have as many buffers as you want, you fill all buffers in setup(), once all buffers are written DAC starts and it will automatically cycle through all buffers. Like playing back a signal in a loop. However note you won't be able to use this before the next mbed core release and the next library release.

#include <Arduino_AdvancedAnalog.h>

AdvancedDAC dac1(A12);

void setup() {
    Serial.begin(9600);

    while (!Serial) {

    }

    // Start DAC in loop mode.
    if (!dac1.begin(AN_RESOLUTION_12, 16000, 32, 16, true)) {
        Serial.println("Failed to start DAC1 !");
        while (1);
    }

    // In loop mode, the DAC will start automatically after all
    // buffers are filled, and continuously cycle through over all buffers.
    uint16_t sample = 0;
    while (dac1.available()) {
        // Get a free buffer for writing.
        SampleBuffer buf = dac1.dequeue();

        // Write data to buffer.
        for (int i=0; i<buf.size(); i++) {
            buf.data()[i] = sample;
        }       

        // Write the buffer to DAC.
        dac1.write(buf);
        sample += 256;
    }
}

void loop() {
  // In loop mode, no other DAC functions need to be called.
}

SDS00002

clemenseder commented 1 month ago

I understand that the implementation will take some time, but I am not in a hurry as you have shown me a good workaraound. I am still assessing the Arduino as a possible platform for the STM32H747. I had very bad experiences prior due to unrecoverable MCUs (and the bootloader via VCP only seems to work on the ST Link v3, only on the internal module on the discovery board). But the Arduino Giga has a direct DFU connection, which is convenient and works well.

I have earlier worked with TouchGFX, but there should be good lvgl tools around that could integrated nicely with the Arduino. I may contact those developers when building up a GUI.

Anyhow, I really appreciate your help in this and that your decision to work on this new feature!

clemenseder commented 1 week ago

Thanks, fantastic job and thank you for taking in my suggestion!

iabdalkader commented 1 week ago

Thanks, fantastic job and thank you for taking in my suggestion!

You're welcome!

arduino-libraries / Arduino_AdvancedAnalog

Option for statically allocated DMA buffer(s) for waveform generation in circular mode #74