espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.89k stars 7.32k forks source link

[IDF v5.0] uart_wait_tx_done() hangs forever with portMAX_DELAY (IDFGH-9184) #10577

Open DirkGmailMeintjies opened 1 year ago

DirkGmailMeintjies commented 1 year ago

I believe this function to have introduced a new bug in IDF 5.0. When called with ticks_to_wait as portMAX_DELAY it is possible for a TX DONE interrupt to be missed (cleared accidentally) then the code waits forever for the tx_done_sem. Here's ESP's function:

_ esp_err_t uart_wait_tx_done(uart_port_t uart_num, TickType_t ticks_to_wait) { ESP_RETURN_ON_FALSE((uart_num < UART_NUM_MAX), ESP_FAIL, UART_TAG, "uart_num error"); ESP_RETURN_ON_FALSE((p_uart_obj[uart_num]), ESP_FAIL, UART_TAG, "uart driver error"); BaseType_t res; TickType_t ticks_start = xTaskGetTickCount(); //Take tx_mux res = xSemaphoreTake(p_uart_obj[uart_num]->tx_mux, (TickType_t)ticks_to_wait); if (res == pdFALSE) { return ESP_ERR_TIMEOUT; } xSemaphoreTake(p_uart_obj[uart_num]->tx_done_sem, 0); if (uart_hal_is_tx_idle(&(uart_context[uart_num].hal))) { xSemaphoreGive(p_uart_obj[uart_num]->tx_mux); return ESP_OK; } if (!UART_IS_MODE_SET(uart_num, UART_MODE_RS485_HALF_DUPLEX)) { uart_hal_clr_intsts_mask(&(uart_context[uart_num].hal), UART_INTR_TX_DONE); } UART_ENTER_CRITICAL(&(uart_context[uart_num].spinlock)); uart_hal_ena_intr_mask(&(uart_context[uart_num].hal), UART_INTR_TX_DONE); UART_EXIT_CRITICAL(&(uart_context[uart_num].spinlock));

TickType_t ticks_end = xTaskGetTickCount();
if (ticks_end - ticks_start > ticks_to_wait) {
    ticks_to_wait = 0;
} else {
    ticks_to_wait = ticks_to_wait - (ticks_end - ticks_start);
}
//take 2nd tx_done_sem, wait given from ISR
res = xSemaphoreTake(p_uart_obj[uart_num]->tx_done_sem, (TickType_t)ticks_to_wait);
if (res == pdFALSE) {
    // The TX_DONE interrupt will be disabled in ISR
    xSemaphoreGive(p_uart_obj[uart_num]->tx_mux);
    return ESP_ERR_TIMEOUT;
}
xSemaphoreGive(p_uart_obj[uart_num]->tx_mux);
return ESP_OK;

}_

The three lines: _if (!UART_IS_MODE_SET(uart_num, UART_MODE_RS485_HALF_DUPLEX)) { uart_hal_clr_intsts_mask(&(uart_context[uart_num].hal), UART_INTR_TXDONE); }

is new to IDF 5.0 and it clears the TX DONE interrupt possibly after it had occured. If a TX DONE int happens immediately before this line (but after the check for TX idle) then the int is cleared and lost, and the following wait for the tx_done_sem will block forever. A temporary workaround is to block for Xms instead of forever, but that is not a guaranteed fix as it hides the issue and slows down the system.

DirkGmailMeintjies commented 1 year ago

I installed IDF V5.0.1 and this problem is still in the uart.c file.

ozanerturk commented 1 year ago

I am also facing this issue, we us 1000ms as following esp_err_t err = uart_wait_tx_done(UART_PORT, pdMS_TO_TICKS(1000)); idf.version --> ESP-IDF v5.1.1-678-g8c6114d0d2

UART_PORT = 2

Backtrace: 0x40081981:0x3fff0120 0x40090db5:0x3fff0140 0x4009886f:0x3fff0160 0x400f8c72:0x3fff01a0 0x400e1047:0x3fff01e0 0x400eae02:0x3fff0210 0x400e14a7:0x3fff0240 0x400df364:0x3fff0380 0x400dcf23:0x3fff0430 0x4013e5eb:0x3fff0480 0x4013d0d1:0x3fff04b0 0x4013d1df:0x3fff0540 0x4013da94:0x3fff0560 0x4013c234:0x3fff0580 0x401a13b7:0x3fff05a0 0x4013c99c:0x3fff05c0 0x4013c9ea:0x3fff0610 0x40093759:0x3fff0630 0x40081981: panic_abort at /home/ozan/esp/esp-idf/components/esp_system/panic.c:452

0x40090db5: esp_system_abort at /home/ozan/esp/esp-idf/components/esp_system/port/esp_system_chip.c:84

0x4009886f: __assert_func at /home/ozan/esp/esp-idf/components/newlib/assert.c:40

0x400f8c72: uart_wait_tx_done at /home/ozan/esp/esp-idf/components/driver/uart/uart.c:1118 (discriminator 2)

and : uart.c 1118 https://github.com/espressif/esp-idf/blob/b6a66b7d8c6295765da3cd8df4cef93359f16363/components/driver/uart/uart.c#L1118C30-L1118C30

JosuGZ commented 6 months ago

I just stumbled on this, you can check this related issue: https://github.com/espressif/esp-idf/issues/5156

I provided a fix back then, but it was rejected as another merge was in theory fixing the problem.

Maybe this helps.

AxelLin commented 1 month ago

@DirkGmailMeintjies Is this still an issue after https://github.com/espressif/esp-idf/commit/c801b3a18209ae707d3f8224b0734315cdc99db8 ?