espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.29k stars 7.2k forks source link

Deep sleep entry: UART flush does not have a timeout (IDFGH-1331) #3620

Open tim-nordell-nimbelink opened 5 years ago

tim-nordell-nimbelink commented 5 years ago

Environment

Problem Description

The flush_uarts() function within components/esp32/sleep_modes.c does not have any kind of timeout mechanism. If you have a UART with flow control enabled, and the remote side is blocking UART traffic with flow control, the deep sleep code can hang indefinitely waiting for the UART to flush which will never flush.

Expected Behavior

That the deep sleep call would not hang indefinitely in this situation. One of the following could be implemented to adhere to this:

Actual Behavior

Device loops forever waiting for the UART to flush.

Steps to reproduce

  1. Setup a UART with flow control
  2. Assert the FLOW control line into the UART module (you could probably do this via the mux by attaching the input flow control line inside the UART to a mux 1 line)
  3. Send some data into the UART while the flow control line is asserted
  4. Invoke deep sleep

Discussion on methods for fixing

In our code I've mitigated this by turning off hardware flow control on all of the UARTs before entering deep sleep to ensure it doesn't get hung-up due to the other end of the UART, but it begs the higher level discussion on what the correct behavior in this case should be. I'm not sure which is better (between the three options I posed in the expected behavior above), and I'm not sure what an appropriate timeout would be. It feels like these are use-case dependent, so it feels to me that it'd be better to leave any flushing to the SDK user instead of assuming that the SDK user wants these flushed, however, that'd break expectations folks already have with the current SDK (and potentially several examples).

(A warning could be printed in the deep sleep entry code that there is pending UART data if the SDK didn't try to flush the data, but this, too, would need to be flushed so it's a catch-22.)

Related issues

koobest commented 5 years ago

Hi, @tim-nordell-nimbelink Can you help test the following fix? you need to change the API in esp32/rom/uart.h

static inline void IRAM_ATTR uart_tx_wait_idle(uint8_t uart_no) {
    uint32_t status;
    uint32_t flow_ctrl_mask = READ_PERI_REG(UART_CONF0_REG(uart_no)) & UART_TX_FLOW_EN_M;
    CLEAR_PERI_REG_MASK(UART_CONF0_REG(uart_no), flow_ctrl_mask);
    SET_PERI_REG_MASK(UART_FLOW_CONF_REG(uart_no), UART_FORCE_XON_M);
    do {
        status = READ_PERI_REG(UART_STATUS_REG(uart_no));
        /* either tx count or state is non-zero */
    } while ((status & (UART_ST_UTX_OUT_M | UART_TXFIFO_CNT_M)) != 0);
    SET_PERI_REG_MASK(UART_CONF0_REG(uart_no), flow_ctrl_mask);
}

If this fix is OK, we will apply it in the IDF release. thanks !!

tim-nordell-nimbelink commented 5 years ago

Hi @koobest -

I don't think universally disabling flow control is a good solution. It happens to be okay for our product, but I don't know if it'd be okay for all folks. It also wouldn't be appropriate to modify this ROM function to disable flow control - there are uses of it in the current SDK that occur before shutting down the main CPU and it would introduce a bug into the codebase inadvertently turning off flow control onto folks who are already using the function and expect flow control to remain enabled.

Here's a sample that I had coded with an eye toward a pull request, but I wasn't sure what approach would be acceptable to espressif (which is why I had opened a ticket instead of providing a patch):

static void IRAM_ATTR flush_uarts()
{
    int idle_status = 0;

    // Loop for up to ~100ms waiting for all UARTs to flush.
    for (int i = 0; i < 100; ++i)
    {
        // Update idle status for each UART
        for (int j = 0; j < UART_NUM_MAX; ++j)
        {
            if (!(idle_status & (1 << j)))
            {
                uint32_t status = READ_PERI_REG(UART_STATUS_REG(j));
                status &= ((UART_ST_UTX_OUT_M | UART_TXFIFO_CNT_M));
                idle_status |= (status == 0) << j;
            }
        }

        // If all 3 UARTs are idle, break
        if (idle_status == ((1 << UART_NUM_MAX) - 1))
        {
            break;
        }

        // Otherwise wait a little bit.
        ets_delay_us(1000);
    }
}

That approach would work with a maximum timeout of ~100ms. However, I'm not convinced that would satisfy all use cases, and certainly it wouldn't address current users of uart_tx_wait_idle(...) - it looks like there are 16 invocations in the core codebase (and it looks like many assume that CONFIG_CONSOLE_UART_NUM is valid which it isn't if CONFIG_CONSOLE_UART_NONE=y). Another option would be to add a second function which accepts a timeout parameter similar to uart_tx_wait_idle(...). However... "esp/rom/uart.h" seems like it's supposed to be only for replacing functions that are in the ROM code that are broken, not for introducing new ones...

-- Tim

igrr commented 5 years ago

I think this can be solved in a combination of two changes:

  1. On entry to deep sleep, don't wait for UART FIFO to be empty if flow control is enabled and UART state is not "transmitting".
  2. The behavior on entering deep sleep can be made run-time configurable. The default would be the same as now, to wait for UART FIFO to be empty (taking into account pt. 1). User could request to not wait for UART FIFO by calling a new function prior to entering deep sleep (something like esp_sleep_wait_for_uart(ESP_SLEEP_NOWAIT);). This will also address similar feature requests for faster entry into deep sleep, which we had before.
tim-nordell-nimbelink commented 5 years ago

Hi @igrr, @koobest -

I think it'd be okay to check if the UART state is "not transmitting", but I wasn't sure if the UART could go into that state between characters while it was running. If one is doing the UART idle state check (and it doesn't get into TX_IDLE regardless of the configuration of the UART such as someone setting UART_TX_IDLE_NUM to non-zero), then you probably wouldn't need to check to see if flow control is enabled and it could be a universal check across all users. The TRM wasn't clear on when those state transitions occur from that state register.

If the NOWAIT option isn't enabled by the user, I think it'd make sense to emit a warning on the console UART that it didn't flush UART X. Not all users will have a debugger attached to pinpoint where an issue is.

I think the esp_restart_noos() function probably should be changed, too, since it could run into the same issue with flow control upon someone issuing a restart call to the system. I think the other usages of uart_tx_wait_idle(...) are okay in the system but I haven't exhaustively checked.

-- Tim

tim-nordell-nimbelink commented 5 years ago

Hi @igrr, @koobest -

I did some testing with the following code:

    auto p = uart_port_t::UART_NUM_1;
    uart_set_hw_flow_ctrl(p, uart_hw_flowcontrol_t::UART_HW_FLOWCTRL_DISABLE, 0);
    uart_set_baudrate(p, 2400);
    uart_write_bytes(p, "123\r\n", 5);

    uint8_t transitions[32];
    uint8_t last_mode = 0;

    int spin = 100000;
    int i;
    for(i=0;i<sizeof(transitions);)
    {
        uint32_t new_mode = (READ_PERI_REG(UART_STATUS_REG(p)) & UART_ST_UTX_OUT_M) >> UART_ST_UTX_OUT_S;
        if(new_mode != last_mode)
        {
            last_mode = transitions[i++] = new_mode;
            spin = 100000;
        } else {
            if(--spin == 0)
            {
                break;
            }
        }
    }

    printf("%i transitions; current mode: %i\n", i,
                (READ_PERI_REG(UART_STATUS_REG(p)) & UART_ST_UTX_OUT_M) >> UART_ST_UTX_OUT_S);
    for(int j=0;j<i;++j)
    {
        printf("Transition[%i] = %i\n", j, transitions[j]);
    }

and I get something like the following for output:

32 transitions; current mode: 3
Transition[0] = 1
Transition[1] = 3
Transition[2] = 4
Transition[3] = 5
Transition[4] = 6
Transition[5] = 7
Transition[6] = 8
Transition[7] = 9
Transition[8] = 11
Transition[9] = 1
Transition[10] = 2
Transition[11] = 3
Transition[12] = 4
Transition[13] = 5
Transition[14] = 6
Transition[15] = 7
Transition[16] = 8
Transition[17] = 9
Transition[18] = 11
Transition[19] = 1
Transition[20] = 2
Transition[21] = 3
Transition[22] = 4
Transition[23] = 5
Transition[24] = 6
Transition[25] = 7
Transition[26] = 8
Transition[27] = 9
Transition[28] = 11
Transition[29] = 1
Transition[30] = 2
Transition[31] = 3

If flow control is enabled (and blocked continuously by the other side), I get:

0 transitions; current mode: 0

So, for the case of flushing all data possible without waiting on flow control, it seems like we could modify the check to only validate that the serial mode is 0. E.g. something like:

static void IRAM_ATTR flush_uarts()
{
    for (int i = 0; i < 3; ++i) {
        while (REG_GET_FIELD(UART_STATUS_REG(i), UART_ST_UTX_OUT) != 0) {
            ;
        }
}

which would look very similar to the suspend_uarts() function, but lacking the set bit call.

I still don't think it's a great solution for the general use-case (I think a good faith attempt with a timeout is better), but it's not bad. Some uC's that might be companions to the ESP32 only have 1-byte FIFOs (for instance, any in the STM32L4 series) and would hit the idle case even while it's flushing.

(The STM32L4 series will assert flow control briefly after every character received, so this patch would be incompatible with that series for instance, but the example I gave earlier with a timeout wouldn't be incompatible with that series of microcontrollers. Here's the diagram from the STM32L476 datasheet as an example where a purely idle state detection would be broken on the ESP32 side:

image

I don't know how many other microcontrollers behave this way that someone might choose to connect to the ESP32. We do have a product where the ESP32 is hooked up to a STM32L476 with flow control lines. I would hit this issue on that product though if it is implemented in the proposed manner, but not with the current implementation.)

-- Tim

koobest commented 5 years ago

Hi,@tim-nordell-nimbelink , Thank you for your feedback.

Since uart_tx_wait_idle(uint8_t uart_no) has parameters uart_no, can we use bitmaps to specify whether flow control needs to be turned off? @igrr

bit 6~bit0 -> UART number,
bit 7 -> controls whether to turn off the flow control.

This can also support interfaces such as esp_sleep_wait_for_uart(ESP_SLEEP_NOWAIT)