espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

assert failed: twai_handle_tx_buffer_frame twai.c:183 (p_twai_obj->tx_msg_count >= 0) if CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y (IDFGH-8204) #9697

Open diplfranzhoepfinger opened 2 years ago

diplfranzhoepfinger commented 2 years ago

Answers checklist.

IDF version.

v4.4.2

Operating System used.

Linux

How did you build your project?

Eclipse IDE

If you are using Windows, please specify command line type.

No response

Development Kit.

Atom M5

Power Supply used.

USB

What is the expected behavior?

No crash.

What is the actual behavior?

Instead, it crashes with: assert failed: twai_handle_tx_buffer_frame twai.c:183 (p_twai_obj->tx_msg_count >= 0)

Steps to reproduce.

  1. Clone the repo: https://github.com/diplfranzhoepfinger/canrecovery
  2. Let it run for a while.
  3. Disturb the CAN bus by either shorting CAN-H to GND or CAN-H to CAN-L.
  4. Instead of recovering, it crashes.

Debug Logs.

I (100100) TWAI Master: Surpassed Error Warning Limit
I (100100) TWAI Master: Entered Error Passive state
I (100100) TWAI Master: Bus Off state
W (100100) TWAI Master: Initiate bus recovery in 50ms
E (100110) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
W (100120) TWAI Master: Initiate bus recovery in 40ms
W (100130) TWAI Master: Initiate bus recovery in 30ms
W (100140) TWAI Master: Initiate bus recovery in 20ms
W (100150) TWAI Master: Initiate bus recovery in 10ms
E (100210) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (100310) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (100410) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (100510) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (100610) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (100710) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (100810) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (100910) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101010) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101110) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101210) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101310) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101410) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101510) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101610) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101710) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101810) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (101910) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102010) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102110) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102210) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102310) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102410) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102510) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102610) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102710) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102810) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (102910) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (103010) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
E (103110) main: twai_transmit failed, 259 ESP_ERR_INVALID_STATE
I (103150) TWAI Master: Initiate bus recovery

assert failed: twai_handle_tx_buffer_frame twai.c:183 (p_twai_obj->tx_msg_count >= 0)

Backtrace:0x400818de:0x3ffb0a00 0x40085841:0x3ffb0a20 0x4008ae2d:0x3ffb0a40 0x4008313d:0x3ffb0b60 0x40082869:0x3ffb0ba0 0x400e8257:0x3ffb62f0 0x400d1c6f:0x3ffb6310 0x40086ea2:0x3ffb6330 0x400883d1:0x3ffb6350
0x400818de: panic_abort at /home/franz/esp-idf-v4.4.2/components/esp_system/panic.c:402

0x40085841: esp_system_abort at /home/franz/esp-idf-v4.4.2/components/esp_system/esp_system.c:128

0x4008ae2d: __assert_func at /home/franz/esp-idf-v4.4.2/components/newlib/assert.c:85

0x4008313d: twai_handle_tx_buffer_frame at /home/franz/esp-idf-v4.4.2/components/driver/twai.c:183
 (inlined by) twai_intr_handler_main at /home/franz/esp-idf-v4.4.2/components/driver/twai.c:226

0x40082869: _xt_lowint1 at /home/franz/esp-idf-v4.4.2/components/freertos/port/xtensa/xtensa_vectors.S:1111

0x400e8257: cpu_ll_waiti at /home/franz/esp-idf-v4.4.2/components/hal/esp32/include/hal/cpu_ll.h:183
 (inlined by) esp_pm_impl_waiti at /home/franz/esp-idf-v4.4.2/components/esp_pm/pm_impl.c:837

0x400d1c6f: esp_vApplicationIdleHook at /home/franz/esp-idf-v4.4.2/components/esp_system/freertos_hooks.c:63

0x40086ea2: prvIdleTask at /home/franz/esp-idf-v4.4.2/components/freertos/tasks.c:3973 (discriminator 1)

0x400883d1: vPortTaskWrapper at /home/franz/esp-idf-v4.4.2/components/freertos/port/xtensa/port.c:131

ELF file SHA256: a42730679ffbb41a

Rebooting...
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 271414342, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6660
load:0x40078000,len:14848
ho 0 tail 12 room 4
load:0x40080400,len:3792
0x40080400: _init at ??:?

entry 0x40080694
I (29) boot: ESP-IDF v4.4.2 2nd stage bootloader

More Information.

No response

diplfranzhoepfinger commented 2 years ago

Tested with ESP-IDF v5.0-dev-4379-g36f49f361c and got the same error.

diplfranzhoepfinger commented 2 years ago

With any of the following configurations:

#
# TWAI configuration
#
# CONFIG_TWAI_ISR_IN_IRAM is not set
# CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC is not set
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration

#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
# CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC is not set
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration

#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration

#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID=y
CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT=y
# end of TWAI configuration

it does NOT crash.

With either of the following configurations:

#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration

#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y
CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID=y
CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT=y
# end of TWAI configuration

it does crash.

diplfranzhoepfinger commented 2 years ago

It seems that

CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y

is the cause of the crash.

EmbeddedDevver commented 1 year ago

I stumbled across the same problem. This did not happen with the earlier CAN (not TWAI) driver. I assume there is no solution when using the Arduino environment, right?

abombay commented 1 year ago

I am having the same issue. In my case, I powered up my device without any power on the other side of the CAN transceiver (ISO1042, isolated). Is there any update on this issue?

f-hoepfinger-hr-agrartechnik commented 1 year ago

@Dazza0 any news?

Dazza0 commented 1 year ago

disturb CAN-Bus by either shorting CAN-H to GND or CAN-H to CAN-L

@diplfranzhoepfinger @abombay @f-hoepfinger-hr-agrartechnik By disturbing the CAN/TWAI bus, you are likely generating errors that trigger the hardware errata conditions. Workarounds for these errata have been merged on master and backported all the way to ESP-IDF v4.2.x. However, they are not enabled by default until ESP-IDF v5.0. If you are using an ESP-IDF version older than v5.0, please enable all of the CONFIG_TWAI_ERRATA_FIX_... options and check whether that resolves the issue.
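Since the crash in this thread depends on which errata options are compiled in, it can help to log the active TWAI errata configuration at boot so every report states it unambiguously. The sketch below assumes the CONFIG_TWAI_ERRATA_FIX_* macros are visible (in ESP-IDF they come from sdkconfig.h); the bitmask layout and both helper functions are hypothetical and not part of the ESP-IDF API. Note that later comments implicate CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST itself, so "all enabled" is the recommended pre-v5.0 setting, not a guarantee against the assert.

```c
#include <stdio.h>

/* Hypothetical bit layout for the four TWAI errata options discussed here.
 * In an ESP-IDF build the CONFIG_TWAI_ERRATA_FIX_* macros come from
 * sdkconfig.h; outside ESP-IDF none of them is defined. */
enum {
    ERRATA_BUS_OFF_REC      = 1 << 0,
    ERRATA_TX_INTR_LOST     = 1 << 1,
    ERRATA_RX_FRAME_INVALID = 1 << 2,
    ERRATA_RX_FIFO_CORRUPT  = 1 << 3
};

/* Returns a bitmask of the errata workarounds compiled into this build. */
static unsigned twai_errata_fixes_enabled(void)
{
    unsigned mask = 0;
#ifdef CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC
    mask |= ERRATA_BUS_OFF_REC;
#endif
#ifdef CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST
    mask |= ERRATA_TX_INTR_LOST;
#endif
#ifdef CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID
    mask |= ERRATA_RX_FRAME_INVALID;
#endif
#ifdef CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT
    mask |= ERRATA_RX_FIFO_CORRUPT;
#endif
    return mask;
}

/* Prints every workaround that is NOT enabled and returns how many are
 * missing, e.g. to be called once at boot. */
static int twai_errata_report_missing(void)
{
    static const struct { unsigned bit; const char *name; } opts[] = {
        { ERRATA_BUS_OFF_REC,      "CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC" },
        { ERRATA_TX_INTR_LOST,     "CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST" },
        { ERRATA_RX_FRAME_INVALID, "CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID" },
        { ERRATA_RX_FIFO_CORRUPT,  "CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT" },
    };
    unsigned enabled = twai_errata_fixes_enabled();
    int missing = 0;
    for (size_t i = 0; i < sizeof opts / sizeof opts[0]; i++) {
        if (!(enabled & opts[i].bit)) {
            printf("TWAI errata workaround disabled: %s\n", opts[i].name);
            missing++;
        }
    }
    return missing;
}
```

Built outside ESP-IDF (no CONFIG_ macros defined), the report lists all four options as disabled.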

@EmbeddedDevver Arduino should also have these workarounds enabled starting from v2.0.5.

StehlikPhotoneo commented 1 year ago

I have the same problem. My fix is to not use twai_initiate_recovery() at all, but instead to uninstall, reinstall, and restart the TWAI driver. This may help you until the problem is fixed in the TWAI library.
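The uninstall/reinstall workaround has to respect which driver calls are legal in which state: per the ESP-IDF TWAI documentation, twai_stop() is only valid while running, and twai_driver_uninstall() requires the stopped or bus-off state. A minimal sketch of that ordering, kept free of driver calls so it can be tested off-target; drv_state_t, step_t and plan_driver_restart() are hypothetical names, with the real driver functions noted in comments.

```c
/* Simplified mirror of the states twai_get_status_info() reports via
 * twai_state_t (TWAI_STATE_STOPPED, _RUNNING, _BUS_OFF, _RECOVERING). */
typedef enum { ST_STOPPED, ST_RUNNING, ST_BUS_OFF, ST_RECOVERING } drv_state_t;

/* Hypothetical step names. On ESP-IDF each maps to one driver call:
 *   STEP_STOP      -> twai_stop()             (valid only while RUNNING)
 *   STEP_UNINSTALL -> twai_driver_uninstall() (valid while STOPPED or BUS_OFF)
 *   STEP_INSTALL   -> twai_driver_install(&g_config, &t_config, &f_config)
 *   STEP_START     -> twai_start()                                         */
typedef enum { STEP_STOP, STEP_UNINSTALL, STEP_INSTALL, STEP_START } step_t;

/* Fills out[] with the calls needed to restart the driver from scratch
 * instead of calling twai_initiate_recovery(). Returns the number of steps,
 * or 0 if a recovery is already in progress and the caller should wait. */
static int plan_driver_restart(drv_state_t state, step_t out[4])
{
    int n = 0;
    if (state == ST_RECOVERING) {
        return 0;
    }
    if (state == ST_RUNNING) {
        out[n++] = STEP_STOP;
    }
    out[n++] = STEP_UNINSTALL;
    out[n++] = STEP_INSTALL;
    out[n++] = STEP_START;
    return n;
}
```

On the target, a loop over out[] would switch on each step, call the matching driver function with the same configuration structs used at startup, and stop at the first error.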

Dazza0 commented 1 year ago

@diplfranzhoepfinger @StehlikPhotoneo @abombay @f-hoepfinger-hr-agrartechnik

I suspect that the CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST workaround triggers a false-positive TX done event when a bus-off occurs.

In twai_hal_iram.c:twai_hal_decode_interrupt(), if you change...

#ifdef CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST
    if ((interrupts & TWAI_LL_INTR_TI || hal_ctx->state_flags & TWAI_HAL_STATE_FLAG_TX_BUFF_OCCUPIED) && status & TWAI_LL_STATUS_TBS) {
#else

to

#ifdef CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST
    if ((interrupts & TWAI_LL_INTR_TI || hal_ctx->state_flags & TWAI_HAL_STATE_FLAG_TX_BUFF_OCCUPIED)
        && status & TWAI_LL_STATUS_TBS
        && !(state_flags & TWAI_HAL_STATE_FLAG_BUS_OFF)) {
#else

does this end up resolving the issue?
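To make the suspected false positive concrete, the two conditions can be compared as plain predicates. This is a sketch under stated assumptions: the bit values below are hypothetical stand-ins for TWAI_LL_INTR_TI, TWAI_LL_STATUS_TBS, TWAI_HAL_STATE_FLAG_TX_BUFF_OCCUPIED and TWAI_HAL_STATE_FLAG_BUS_OFF, whose real values live in the ESP-IDF HAL headers. The first test case shows the hypothesized bug: no TX interrupt, yet the original condition reports "TX done" because the occupied flag and TBS are both set during bus-off. andrew-elder's later report of the assert without any bus-off suggests this guard does not cover every case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; only the boolean logic mirrors the driver. */
#define INTR_TI          (1u << 1)  /* TX interrupt latched */
#define STATUS_TBS       (1u << 2)  /* TX buffer released by the controller */
#define FLAG_TX_BUFF_OCC (1u << 0)  /* driver thinks a frame is in the TX buffer */
#define FLAG_BUS_OFF     (1u << 1)  /* controller is in bus-off */

/* Condition used with CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y: report TX done
 * if the TX interrupt fired OR a frame was pending, and the buffer is free. */
static bool tx_done_original(uint32_t interrupts, uint32_t status, uint32_t state_flags)
{
    return ((interrupts & INTR_TI) || (state_flags & FLAG_TX_BUFF_OCC))
           && (status & STATUS_TBS);
}

/* Dazza0's proposed variant: additionally suppress the event while bus-off. */
static bool tx_done_patched(uint32_t interrupts, uint32_t status, uint32_t state_flags)
{
    return tx_done_original(interrupts, status, state_flags)
           && !(state_flags & FLAG_BUS_OFF);
}
```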

franz-ms-muc commented 1 year ago

I will check!

travis012 commented 1 year ago

I am also having this issue. All of my TWAI_ERRATA fixes are enabled. I'm using ESP-IDF 4.4.4 with an ESP32MINI-01 rev3 device.

I will try Dazza0's patch right now and let you know.

travis012 commented 1 year ago

Unfortunately, this doesn't fix it. I should note that I am using single-shot mode with no TX queue and handle retries myself. This is the only function I use to send CAN messages.

My transmit function is as follows:

IRAM_ATTR esp_err_t tts_twai_send_message(twai_message_t* msg, uint32_t paceTime100us, uint32_t timeoutMS)
{
#define MAX_TX_RETRIES (16)
    uint32_t alerts;
    uint32_t paceTimeTicks;
    esp_err_t err;
    int retryCount = MAX_TX_RETRIES;

    xSemaphoreTake(_semTxLock, portMAX_DELAY);

    msg->flags |= TWAI_MSG_FLAG_SS;  // request single shot without clearing EXTD/RTR flags

    TickType_t tickTimeout = xTaskGetTickCount() + pdMS_TO_TICKS(timeoutMS);
    if (twai_read_alerts(&alerts, 0) == ESP_OK)
    {
#if TWAI_REPORT_AND_HANDLE_ALERTS
        tts_twai_report_alerts(alerts);
#endif          
    }

RetryTx:
    err = twai_transmit(msg, pdMS_TO_TICKS(100));
    if (err != ESP_OK)
    {
        ESP_LOGE(EXAMPLE_TAG, "twai_transmit failed! Error=%08X", err);
        alerts = 0;
        twai_read_alerts(&alerts, pdMS_TO_TICKS(100));
#if TWAI_REPORT_AND_HANDLE_ALERTS
        tts_twai_report_alerts(alerts);
#endif      
        if (err == ESP_ERR_INVALID_STATE)
        {
            if (twai_initiate_recovery() == ESP_OK)
            {
                ESP_LOGW(EXAMPLE_TAG, "Bus recovery initiated..");
                // wait for bus to recover
                while (xTaskGetTickCount() <= tickTimeout)
                {
                    if (twai_read_alerts(&alerts, pdMS_TO_TICKS(100)) == ESP_OK)
                    {
#if TWAI_REPORT_AND_HANDLE_ALERTS
                        tts_twai_report_alerts(alerts);
#endif                          
                        if (alerts & TWAI_ALERT_BUS_RECOVERED)
                        {
                            if (--retryCount <= 0) 
                            {
                                ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                                goto Done; 
                            }

                            err = twai_start();
                            ESP_LOGW(EXAMPLE_TAG, "Bus Recovered. twai_start err=%i", err);
                            vTaskDelay(1);
                            goto RetryTx;
                        }                       
                    }
                }
                ESP_LOGE(EXAMPLE_TAG, "Timeout waiting for bus recovery");
            }
            else
            {
                ESP_LOGW(EXAMPLE_TAG, "TX failed for unknown reason");
                if (--retryCount <= 0) 
                {
                    ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                    goto Done; 
                }
                vTaskDelay(1);
                goto RetryTx;
            }
        }
        else
        {
            ESP_LOGE(EXAMPLE_TAG, "TX failed Error=%08X", err);
            if (--retryCount <= 0) 
            {
                ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                goto Done; 
            }
            vTaskDelay(1);
            goto RetryTx;

        }
        // this is caused by the loss of a TX interrupt. (see errata)
        // it should not occur anymore as there is a workaround in place deep in the TWAI driver stack.
        //ESP_LOGE(EXAMPLE_TAG, "tts_twai_send_message1 Error=%08X", err);
        //twai_stop();
        //vTaskDelay(pdMS_TO_TICKS(20));
        //twai_start();
        //vTaskDelay(pdMS_TO_TICKS(20));
        //err = twai_transmit(msg, pdMS_TO_TICKS(100));

    }
    else
    {
        // TX was successful. we should receive either an idle alert or some error promptly

        // clear any alerts
        if (twai_read_alerts(&alerts, pdMS_TO_TICKS(1000)) == ESP_OK)
        {
#if TWAI_REPORT_AND_HANDLE_ALERTS
            tts_twai_report_alerts(alerts);
#endif  
            if (alerts & TWAI_ALERT_TX_FAILED)
            {
                if (--retryCount <= 0) 
                {
                    ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                    err = ESP_ERR_TIMEOUT;
                    goto Done; 
                }
                vTaskDelay(1);
                goto RetryTx;
            }
        }

        // honor pacetime
        paceTimeTicks = pdMS_TO_TICKS(((TickType_t)paceTime100us + (TickType_t)9) / (TickType_t)10);
        if (paceTimeTicks > 0)
        {
            vTaskDelay(paceTimeTicks);
        }
        else
        {
            uint64_t timeEnd = esp_timer_get_time() + (paceTime100us * 100);
            while (timeEnd > esp_timer_get_time())
            {
                // taskYIELD() is used to request a context switch to another task. However, if there are 
                // no other tasks at a higher or equal priority to the task that calls taskYIELD() then 
                // the RTOS scheduler will simply select the task that called taskYIELD() to run again. 
                taskYIELD();
            }

        }
    }
Done:
    xSemaphoreGive(_semTxLock);
    if (retryCount != MAX_TX_RETRIES)
    {
        //ESP_LOGI(EXAMPLE_TAG, "TX took %d retries", MAX_TX_RETRIES - retryCount);
    }
    return err;
}
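The goto-based function above interleaves driver calls with its retry bookkeeping. One way to make that policy testable off-target is to pull the decision into a pure function; the sketch below is a simplified distillation, not travis012's code. The outcome and action names are hypothetical, and the surrounding loop (not shown) would still call twai_transmit(), twai_read_alerts(), twai_initiate_recovery() and twai_start(). One simplification: here a bus-off consumes a retry before recovery, whereas the original decrements only after TWAI_ALERT_BUS_RECOVERED.

```c
/* Outcome of one attempt; comments give the ESP-IDF situation it stands for. */
typedef enum {
    TX_OK,           /* twai_transmit() == ESP_OK, no TWAI_ALERT_TX_FAILED */
    TX_FAILED_ALERT, /* frame accepted, then TWAI_ALERT_TX_FAILED (single shot) */
    TX_BUS_OFF,      /* twai_transmit() == ESP_ERR_INVALID_STATE */
    TX_OTHER_ERROR   /* any other error code, e.g. ESP_ERR_TIMEOUT */
} tx_outcome_t;

typedef enum { ACT_DONE, ACT_RETRY, ACT_RECOVER_THEN_RETRY, ACT_GIVE_UP } tx_action_t;

/* Decides the next step. Every failure consumes one retry, and the function
 * gives up once the budget reaches zero, like "--retryCount <= 0" above. */
static tx_action_t tx_next_action(tx_outcome_t outcome, int *retries_left)
{
    if (outcome == TX_OK) {
        return ACT_DONE;
    }
    if (--*retries_left <= 0) {
        return ACT_GIVE_UP;
    }
    /* Bus-off needs twai_initiate_recovery(), TWAI_ALERT_BUS_RECOVERED and
     * twai_start() before the frame can be sent again. */
    return outcome == TX_BUS_OFF ? ACT_RECOVER_THEN_RETRY : ACT_RETRY;
}
```

With a budget of MAX_TX_RETRIES (16), this allows 15 retries, matching the pre-decrement check in the original.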
travis012 commented 1 year ago

If I use this:

#ifdef CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST
    // 
    if ((interrupts & TWAI_LL_INTR_TI || hal_ctx->state_flags & TWAI_HAL_STATE_FLAG_TX_BUFF_OCCUPIED) && status & TWAI_LL_STATUS_TBS && p_twai_obj->tx_msg_count) {
#else

it solves the problem. This is not a good fix, but it works in my case because I'm not using any kind of TX queuing. It does provide more evidence that the problem lies in this errata "fix".

FYI, you have to modify some code to gain access to p_twai_obj in order to use this.

Hopefully this will help someone else find a proper fix. :)
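Why the extra `p_twai_obj->tx_msg_count` check stops the assert can be shown with a toy model of the TX bookkeeping: the driver counts outstanding frames, and each TX-done event decrements the count, after which the assert in the log checks `tx_msg_count >= 0`. This is a hypothetical model, not the driver code; tx_model_t and the model_* functions are illustrative names. It also shows the limitation travis012 mentions: with a TX queue, a spurious TX-done event while frames are still queued would be miscounted without the guard ever noticing, so the check only hides the underflow.

```c
#include <stdbool.h>

/* Toy model: tx_msg_count counts frames queued or sitting in the TX buffer. */
typedef struct {
    int tx_msg_count;
} tx_model_t;

static void model_transmit(tx_model_t *m) { m->tx_msg_count++; }

/* Unguarded handler: a spurious TX-done event drives the count negative,
 * which is the state the driver's assert rejects. */
static void model_tx_done_unguarded(tx_model_t *m) { m->tx_msg_count--; }

/* Guarded handler in the spirit of travis012's change: ignore a TX-done
 * event when no frame is outstanding. Returns false if it was discarded. */
static bool model_tx_done_guarded(tx_model_t *m)
{
    if (m->tx_msg_count <= 0) {
        return false;
    }
    m->tx_msg_count--;
    return true;
}
```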

andrew-elder commented 9 months ago

I'm running into this after 40 minutes of runtime (more than 120,000 CAN messages sent). I have CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST set. I'm using SDK 5.0, and I don't see any fixes for this in later releases. Have I missed a patch?

andrew-elder commented 9 months ago

@igrr - any comments?

wanckl commented 9 months ago

@andrew-elder Have you tried the code changes from Dazza0 above? https://github.com/espressif/esp-idf/issues/9697#issuecomment-1591011346
Also, do you have any custom settings such as no_queue? And what is your chip version?

andrew-elder commented 9 months ago

@wanckl I have not yet tried the fix from https://github.com/espressif/esp-idf/issues/9697#issuecomment-1591011346. I will try it.

I am using the default settings, i.e. not using no_queue. Chip version is v3.0.

I (31) boot: chip revision: v3.0
andrew-elder commented 9 months ago

@wanckl - the test failed after an hour with

assert failed: twai_handle_tx_buffer_frame twai.c:185 (p_twai_obj->tx_msg_count >= 0)

The suggested change checks for bus-off, but bus-off isn't expected to happen in my setup. The CAN bus is hardwired to another device that continuously sends messages to the ESP32.

Thank you for your help so far. Any other suggestions?

travis012 commented 9 months ago

Have you tried the fix I posted above? To use it, you have to disable the TX queue. It is not a good fix, but it does solve the problem. I have been using it in a production product for a while now.

andrew-elder commented 9 months ago

@travis012 - I have not tried the fix you posted above. I will do so.

andrew-elder commented 9 months ago

@travis012 - an implementation very close to yours works for me, i.e. I no longer observe the assert() error. I also never see twai_initiate_recovery() being called. I think I am OK to continue development with what I have. Thank you for your help.

@wanckl - does Espressif have plans to release a fix for this?

travis012 commented 9 months ago

Is it better than mine? Can you use the TX queue? I'd like to improve what I have in our production devices. Please share your fix.

andrew-elder commented 9 months ago

No, it's not better than yours. For example, I removed the semaphore because I have a single thread sending messages. I have a hardwired CAN connection to another device, so there is never a bus-off condition; as a result, twai_initiate_recovery() always returns ESP_ERR_INVALID_STATE if I force it to be called.

italocjs commented 1 month ago

Any update on this? I'm having the same issue.

diplfranzhoepfinger commented 1 month ago

I have to check.

willianaraujo commented 1 day ago

Has anyone come up with a solution? The bug has just turned 2 years old, counting from this thread alone.

Hey @diplfranzhoepfinger , help us out here, haha!

I just encountered this issue in my project. If the bug has not been fixed yet, I would appreciate help with a safe way to restart my MCU to work around the problem.

My system allows me to reset all peripherals ‘hanging’ on the CAN bus.

I mention this because I noticed that the MCU coming back up after a restart via esp_restart() can also cause issues.