MaJerle / stm32-usart-uart-dma-rx-tx

STM32 examples for USART using DMA for efficient RX and TX transmission
MIT License

STM32 UART DMA RX and TX

This application note explains, with examples, 2 distinct topics:

Abbreviations

General about UART

STM32 devices have peripherals such as USART, UART or LPUART. The difference between them is not relevant for this purpose, since the concept can be applied to all of them. In a few words, USART supports synchronous operation on top of asynchronous (UART) operation, and LPUART supports low-power operation in STOP mode. When synchronous mode or low-power mode is not used, USART, UART and LPUART can be considered identical. For the complete set of details, check the product's reference manual and datasheet.

For the sake of this application note, we will only use the term UART.

UART in STM32 allows configuration using different transmit (TX) and receive (RX) modes:

This article focuses only on DMA mode for RX operation and explains how to handle unknown data length.

Every STM32 has at least one (1) UART IP and at least one (1) DMA controller available in its DNA. This is all we need for successful data transmission. The application uses default features to implement a very efficient transmit system using DMA.

While the implementation is pretty straightforward for the TX operation (set a pointer to the data, define its length and go), this may not be the case for receive. When implementing DMA receive, the application needs to know the number of bytes DMA must process before the transfer is considered done. However, the UART protocol does not offer such information. It could work with a higher-level protocol, but that is another story that we do not touch here; we assume we have to implement a very reliable low-level communication protocol.

Idle Line or Receiver Timeout events

STM32s have the capability in UART to detect when the RX line has not been active for a period of time. This is achieved using 2 methods:

Both events can trigger an interrupt, which is an essential feature to allow effective receive operation.

Not all STM32s have IDLE LINE or RTO features available. When they are not available, examples concerning these features cannot be used.

An example: To transmit 1 byte at 115200 bauds, it takes approximately (for easier estimation) ~100us; for 3 bytes it would be ~300us in total. IDLE line event triggers an interrupt when line has been in idle state for 1 frame time (in this case 100us), after third byte has been received.
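The rounded figures above can be checked with a few lines of C. This is a minimal sketch, assuming the common 8N1 frame format (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte); the helper name `uart_frame_time_us` is hypothetical and not part of the repository.

```c
#include <stdint.h>

/*
 * Time in microseconds (rounded to nearest) needed to send `frames`
 * UART frames of `bits_per_frame` bits each at `baudrate` bits per second.
 *
 * Hypothetical helper for illustration only; `baudrate` must be non-zero.
 */
static uint32_t
uart_frame_time_us(uint32_t baudrate, uint32_t bits_per_frame, uint32_t frames) {
    uint64_t bits = (uint64_t)bits_per_frame * frames;

    return (uint32_t)((bits * 1000000ULL + baudrate / 2U) / baudrate);
}
```

Under these assumptions, `uart_frame_time_us(115200, 10, 1)` evaluates to 87 and `uart_frame_time_us(115200, 10, 3)` to 260, which is where the easier-to-remember estimates of ~100us and ~300us come from.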

IDLE LINE DEMO

This is a real experiment demo using STM32F4 and IDLE LINE event. After IDLE event is triggered, data are echoed back (loopback mode):

General about DMA

DMA in STM32 can be configured in normal or circular mode. For each mode, DMA requires number of elements to transfer before its events (half-transfer complete, transfer complete) are triggered.

While transfer is active, 2 (among others) interrupts may be triggered:

When DMA operates in circular mode, these interrupts are triggered periodically.

The number of elements to transfer by DMA hardware must be written to the relevant DMA register before the start of the transfer.
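The behaviour of circular mode can be illustrated with a small host-side model. This is only a sketch that runs on a PC, not driver code: the type `dma_model_t`, both functions and the 8-element buffer are assumptions made for illustration. It models a down-counting "number of elements" value that reloads itself at zero, with a half-transfer (HT) event in the middle of the buffer and a transfer-complete (TC) event at its end.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_LEN 8U               /* Small buffer, assumed for illustration */

/* Host-side model of one DMA channel in circular mode (not real hardware) */
typedef struct {
    uint8_t buf[RX_BUF_LEN];        /* Destination memory */
    size_t remaining;               /* Models the "number of elements" register */
    unsigned ht_count;              /* Number of half-transfer events so far */
    unsigned tc_count;              /* Number of transfer-complete events so far */
} dma_model_t;

/* Write the number of elements before the transfer starts */
static void
dma_model_start(dma_model_t* d) {
    d->remaining = RX_BUF_LEN;
    d->ht_count = 0;
    d->tc_count = 0;
}

/* Model the transfer of one received byte from peripheral to memory */
static void
dma_model_transfer(dma_model_t* d, uint8_t byte) {
    d->buf[RX_BUF_LEN - d->remaining] = byte;
    --d->remaining;
    if (d->remaining == RX_BUF_LEN / 2U) {
        ++d->ht_count;              /* Half of the buffer has been filled */
    } else if (d->remaining == 0U) {
        ++d->tc_count;              /* Whole buffer has been filled */
        d->remaining = RX_BUF_LEN;  /* Circular mode: counter reloads itself */
    }
}
```

Feeding 20 bytes into this model produces 3 HT and 2 TC events and leaves the counter at 4, showing that in circular mode the events keep repeating for as long as data arrive.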

Combine UART + DMA for data reception

Now it is time to understand which features to use to receive data with UART and DMA to offload the CPU. For the sake of this example, we use a memory buffer array of 20 bytes. DMA will transfer data received from UART to this buffer.

The steps to begin are listed below. The initial assumption is that UART has been initialized before reaching this step, and the same applies to the basic DMA setup; the rest is:

This configuration is important as we do not know the length in advance. The application needs to assume that an endless number of bytes may be received, therefore DMA must be operational endlessly.

We have used a 20-byte array for demonstration purposes. In a real application this size may need to be increased. It all depends on the UART baudrate (the higher the speed, the more data may be received in a fixed window) and on how fast the application can process the received data (using interrupt notification, RTOS, or polling mode).
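A rough lower bound for the buffer size can be estimated from the baudrate and the longest time the application may go without reading the buffer. The sketch below assumes 10 bits per byte on the wire (8N1); the helper name `rx_bytes_in_window` is hypothetical, and the result is only an estimate to which a real application would add margin.

```c
#include <stdint.h>

/*
 * Maximum number of bytes that can arrive within `window_us` microseconds
 * at `baudrate` bits per second, rounded up.
 *
 * Hypothetical helper for illustration; `bits_per_frame` must be non-zero.
 */
static uint32_t
rx_bytes_in_window(uint32_t baudrate, uint32_t bits_per_frame, uint32_t window_us) {
    uint64_t denom = (uint64_t)bits_per_frame * 1000000ULL;
    uint64_t bits_x_us = (uint64_t)baudrate * window_us;

    return (uint32_t)((bits_x_us + denom - 1U) / denom);
}
```

For instance, with these assumptions up to 12 bytes can arrive in 1 ms at 115200 bauds, and up to 93 bytes at 921600 bauds, so a 20-byte buffer that is only checked every millisecond would be too small for the faster link.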

Combine UART + DMA for data transmission

Everything gets simpler when the application transmits data: the length of the data is known in advance and the memory to transmit is ready. For the sake of this example, we use memory holding a HelloWorld message. In C language it would be:

const char
hello_world_arr[] = "HelloWorld";

Please note that the TC event is triggered before the last byte has been fully transmitted over UART. That is because the TC event is part of DMA and not part of UART. It is triggered when DMA has transferred all the bytes from point A to point B, where point A is memory and point B is the UART data register. From then on, it is up to UART to clock out the byte to the GPIO pin.

DMA HT/TC and UART IDLE combination details

This section describes 4 possible cases, plus one additional case that explains why the application needs HT/TC events.

DMA events

Abbreviations used for the image:

DMA configuration:

Possible cases during real-life execution:

Example code to read data from memory and process it, for cases A-D

/**
 * \brief           Check for new data received with DMA
 *
 * User must select context to call this function from:
 * - Only interrupts (DMA HT, DMA TC, UART IDLE) with same preemption priority level
 * - Only thread context (outside interrupts)
 *
 * If called from both contexts, exclusive access protection must be implemented.
 * This mode is not advised as it usually means architecture design problems
 *
 * When IDLE interrupt is not present, application must rely only on thread context,
 * by manually calling function as quickly as possible, to make sure
 * data are read from raw buffer and processed.
 *
 * Not doing reads fast enough may cause DMA to overwrite unread received bytes,
 * hence application will lose useful data.
 *
 * Solutions to this are:
 * - Improve architecture design to achieve faster reads
 * - Increase raw buffer size and allow DMA to write more data before this function is called
 */
void
usart_rx_check(void) {
    /*
     * Set old position variable as static.
     *
     * Linker should (with default C configuration) set this variable to `0`.
     * It is used to keep latest read start position,
     * which makes this function neither reentrant nor thread-safe
     */
    static size_t old_pos;
    size_t pos;

    /* Calculate current position in buffer and check for new data available */
    pos = ARRAY_LEN(usart_rx_dma_buffer) - LL_DMA_GetDataLength(DMA1, LL_DMA_CHANNEL_5);
    if (pos != old_pos) {                       /* Check change in received data */
        if (pos > old_pos) {                    /* Current position is over previous one */
            /*
             * Processing is done in "linear" mode.
             *
             * Application processing is fast with single data block,
             * length is simply calculated by subtracting pointers
             *
             * [   0   ]
             * [   1   ] <- old_pos |------------------------------------|
             * [   2   ]            |                                    |
             * [   3   ]            | Single block (len = pos - old_pos) |
             * [   4   ]            |                                    |
             * [   5   ]            |------------------------------------|
             * [   6   ] <- pos
             * [   7   ]
             * [ N - 1 ]
             */
            usart_process_data(&usart_rx_dma_buffer[old_pos], pos - old_pos);
        } else {
            /*
             * Processing is done in "overflow" mode.
             *
             * Application must process data twice,
             * since there are 2 linear memory blocks to handle
             *
             * [   0   ]            |---------------------------------|
             * [   1   ]            | Second block (len = pos)        |
             * [   2   ]            |---------------------------------|
             * [   3   ] <- pos
             * [   4   ] <- old_pos |---------------------------------|
             * [   5   ]            |                                 |
             * [   6   ]            | First block (len = N - old_pos) |
             * [   7   ]            |                                 |
             * [ N - 1 ]            |---------------------------------|
             */
            usart_process_data(&usart_rx_dma_buffer[old_pos], ARRAY_LEN(usart_rx_dma_buffer) - old_pos);
            if (pos > 0) {
                usart_process_data(&usart_rx_dma_buffer[0], pos);
            }
        }
        old_pos = pos;                          /* Save current position as old for next transfers */
    }
}
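To see both branches in action without hardware, the same position logic can be exercised on a PC. The sketch below is an assumption-laden simulation, not code from the repository: `fake_dma_remaining` stands in for the value returned by `LL_DMA_GetDataLength`, `fake_dma_receive` models DMA writing bytes into an 8-byte buffer, and `usart_process_data` simply collects everything it is given.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_LEN(x) (sizeof(x) / sizeof((x)[0]))

static uint8_t usart_rx_dma_buffer[8];  /* Raw buffer "written by DMA" */
static size_t fake_dma_remaining = 8;   /* Stand-in for the DMA data length counter */
static uint8_t processed[32];           /* Everything handed to the application */
static size_t processed_len;

/* Stand-in for the application callback: append data to `processed` */
static void
usart_process_data(const void* data, size_t len) {
    memcpy(&processed[processed_len], data, len);
    processed_len += len;
}

/* Same logic as usart_rx_check(), reading the fake counter instead of hardware */
static void
usart_rx_check_sim(void) {
    static size_t old_pos;
    size_t pos = ARRAY_LEN(usart_rx_dma_buffer) - fake_dma_remaining;

    if (pos != old_pos) {
        if (pos > old_pos) {            /* Single linear block */
            usart_process_data(&usart_rx_dma_buffer[old_pos], pos - old_pos);
        } else {                        /* Two blocks: tail first, then head */
            usart_process_data(&usart_rx_dma_buffer[old_pos], ARRAY_LEN(usart_rx_dma_buffer) - old_pos);
            if (pos > 0) {
                usart_process_data(&usart_rx_dma_buffer[0], pos);
            }
        }
        old_pos = pos;
    }
}

/* Model DMA storing `len` received bytes, wrapping around at the buffer end */
static void
fake_dma_receive(const char* data, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        size_t pos = ARRAY_LEN(usart_rx_dma_buffer) - fake_dma_remaining;

        usart_rx_dma_buffer[pos] = (uint8_t)data[i];
        if (--fake_dma_remaining == 0) {
            fake_dma_remaining = ARRAY_LEN(usart_rx_dma_buffer);
        }
    }
}
```

Receiving "Hello" followed by a check takes the linear branch; receiving "World" afterwards wraps around the end of the 8-byte buffer, so the next check takes the two-block branch, and the collected output is still "HelloWorld" in order. If more bytes than the buffer holds arrive between two checks, this model loses data in the same way the real function would.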

Interrupt priorities are important

Thanks to the flexibility of the Cortex-M NVIC (Nested Vectored Interrupt Controller), the user can configure the priority level of each NVIC interrupt line and so has full control over the execution profile of each interrupt line separately.

There are 2 priority types in Cortex-M:

STM32s have separate interrupt lines (and therefore separate interrupt service routines) for DMA and UART, one for each peripheral, and the priority of each is software configurable.

The function that gets called to process received data must keep the position of the last read value, hence the processing function is not thread-safe or reentrant and requires special attention.

The application must ensure that DMA and UART interrupts use the same preemption priority level. This is the only configuration that guarantees the processing function never gets preempted by itself (DMA interrupt preempting UART, or the opposite); otherwise the last-known read position may get corrupted and the application will operate with wrong data.
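The rule can be expressed as a small host-side check. This is a sketch under stated assumptions: it models a Cortex-M style priority value whose upper bits are the preemption priority and whose lower `sub_bits` bits are the subpriority, with a lower number meaning higher urgency. All three function names are hypothetical, and the actual number of priority bits and the grouping depend on the device and its configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Extract the preemption part of a combined priority value.
 * `sub_bits` is the number of low-order bits used as subpriority.
 */
static uint32_t
preempt_level(uint32_t priority, uint32_t sub_bits) {
    return priority >> sub_bits;
}

/* True when interrupt A is allowed to preempt a running interrupt B */
static bool
can_preempt(uint32_t prio_a, uint32_t prio_b, uint32_t sub_bits) {
    /* Lower numeric value means higher urgency; subpriority never preempts */
    return preempt_level(prio_a, sub_bits) < preempt_level(prio_b, sub_bits);
}

/* True when neither interrupt can preempt the other */
static bool
never_preempt_each_other(uint32_t prio_a, uint32_t prio_b, uint32_t sub_bits) {
    return !can_preempt(prio_a, prio_b, sub_bits) && !can_preempt(prio_b, prio_a, sub_bits);
}
```

With 2 subpriority bits, priority values 5 and 6 share preemption level 1 and so cannot preempt each other, while values 4 and 8 fall into different preemption levels and would allow the processing function to be interrupted by itself.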

Examples

Examples can be used as reference code to implement your own DMA TX and RX functionality.

There are 2 sets of examples:

Common for all examples:

| STM32 family | Board name        | USART   | STM32 TX | STM32 RX | RX DMA settings             | TX DMA settings           |
|--------------|-------------------|---------|----------|----------|-----------------------------|---------------------------|
| STM32F1xx    | BluePill-F103C8   | USART1  | PA9      | PA10     | DMA1, Channel 5             |                           |
| STM32F4xx    | NUCLEO-F413ZH     | USART3  | PD8      | PD9      | DMA1, Stream 1, Channel 4   | DMA1, Stream 3, Channel 4 |
| STM32G0xx    | NUCLEO-G071RB     | USART2  | PA2      | PA3      | DMA1, Channel 1             |                           |
| STM32G4xx    | NUCLEO-G474RE     | LPUART1 | PA2      | PA3      | DMA1, Channel 1             |                           |
| STM32L4xx    | NUCLEO-L432KC     | USART2  | PA2      | PA15     | DMA1, Channel 6, Request 2  |                           |
| STM32H7xx    | NUCLEO-H743ZI2*   | USART3  | PD8      | PD9      | DMA1, Stream 0              | DMA1, Stream 1            |
| STM32U5xx    | NUCLEO-U575ZI-Q*  | USART1  | PA9      | PA10     | GPDMA1, Channel 0           | GPDMA1, Channel 1         |
* It is possible to run H743 (single-core) examples on dual-core STM32H7 Nucleo boards, NUCLEO-H745 or NUCLEO-H755. Special care needs to be taken, as dual-core H7 Nucleo boards use DCDC for MCU power; hence the application must check the clock configuration in the main file and uncomment the code that enables SMPS.

Examples demonstrate different use cases for RX only or RX&TX combined.

Demos that are part of this repository are all based on Low-Level (LL) drivers to maximize user understanding of how to convert theory into practice. Some STM32Cube firmware packages include the same example using HAL drivers too. Some of them are listed below (with a link to the example; the list is not exhaustive). All examples are identified as UART_ReceptionToIdle_CircularDMA; you can search for it in your local Cube firmware repository.

Examples for UART + DMA RX

Polling for changes

Polling for changes with operating system

UART IDLE line detection + DMA HT&TC interrupts

Incoming data are processed from 2 interrupt vectors, hence it is important that they do not preempt each other. Set both to the same preemption priority!

USART Idle line detection + DMA HT&TC interrupts with RTOS

This is the most preferred way to receive and process UART characters.

Examples for UART DMA for TX (and optionally included RX)

Demo application for debug messages

This is a demo application available in the projects folder. Its purpose is to show how an application can implement output of debug messages without drastically affecting CPU performance. It uses DMA to transfer data (the CPU does not wait for UART flags) and can achieve very high or very low data rates.

For this demo application on the STM32F413-Nucleo board, the observations are as follows:
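One common way to build such an output path is to queue messages in a ring buffer and let DMA send the longest contiguous block of pending data. The sketch below is a host-side illustration of that idea and not the repository's implementation: every name in it is hypothetical, `fake_dma_start` only records what a real DMA transfer would be given, and on real hardware `debug_write` and `tx_dma_complete` run in different contexts and would need protection against each other.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_BUF_LEN 16U

static uint8_t tx_buf[TX_BUF_LEN];  /* Ring buffer for pending debug output */
static size_t tx_w, tx_r;           /* Write and read indexes */
static size_t tx_dma_len;           /* Length of block "in DMA", 0 when idle */

/* Stand-ins for hardware: remember what the last "DMA transfer" was given */
static const uint8_t* last_dma_ptr;
static size_t last_dma_len;

static void
fake_dma_start(const uint8_t* ptr, size_t len) {
    last_dma_ptr = ptr;
    last_dma_len = len;
}

/* Start DMA for the longest contiguous block of unsent data, if DMA is idle */
static void
tx_start_if_idle(void) {
    if (tx_dma_len == 0 && tx_r != tx_w) {
        tx_dma_len = (tx_w > tx_r) ? (tx_w - tx_r) : (TX_BUF_LEN - tx_r);
        fake_dma_start(&tx_buf[tx_r], tx_dma_len);
    }
}

/* Queue a message; returns number of bytes accepted (the rest is dropped) */
static size_t
debug_write(const char* str) {
    size_t n = 0;

    while (str[n] != '\0' && (tx_w + 1U) % TX_BUF_LEN != tx_r) {
        tx_buf[tx_w] = (uint8_t)str[n++];
        tx_w = (tx_w + 1U) % TX_BUF_LEN;
    }
    tx_start_if_idle();
    return n;
}

/* Call from the (modelled) DMA transfer-complete interrupt */
static void
tx_dma_complete(void) {
    tx_r = (tx_r + tx_dma_len) % TX_BUF_LEN;   /* Mark the block as sent */
    tx_dma_len = 0;
    tx_start_if_idle();                        /* Send the next block, if any */
}
```

In this model the caller of `debug_write` only copies bytes into memory and returns immediately; the next block is started from the transfer-complete handler, so the CPU never waits for UART flags. Data that wrap around the end of the buffer are sent as two consecutive DMA transfers.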

How to use this repository

  1. run git clone --recurse-submodules https://github.com/MaJerle/stm32-usart-dma-rx-tx to clone the repository including submodules
  2. run examples from the projects directory using STM32CubeIDE