Open diplfranzhoepfinger opened 2 years ago
Tested with ESP-IDF v5.0-dev-4379-g36f49f361c and got the same error.
If the configuration is any of the following:
#
# TWAI configuration
#
# CONFIG_TWAI_ISR_IN_IRAM is not set
# CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC is not set
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration
#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
# CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC is not set
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration
#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration
#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
# CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST is not set
CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID=y
CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT=y
# end of TWAI configuration
then it does NOT crash.
If the configuration is either of the following:
#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y
# CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID is not set
# CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT is not set
# end of TWAI configuration
#
# TWAI configuration
#
CONFIG_TWAI_ISR_IN_IRAM=y
CONFIG_TWAI_ERRATA_FIX_BUS_OFF_REC=y
CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y
CONFIG_TWAI_ERRATA_FIX_RX_FRAME_INVALID=y
CONFIG_TWAI_ERRATA_FIX_RX_FIFO_CORRUPT=y
# end of TWAI configuration
then it does crash.
It seems that CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST=y is the reason for the crash.
Stumbled across the same problem. This did not happen in the prior (CAN, not TWAI) libraries. I do not think there is a solution when using the Arduino environment, right?
I am having the same issue. In my case I powered up my device without any power on the other side of the CAN transceiver (ISO1042 Isolated). Is there any update on this issue?
@Dazza0 any news ?
Disturb the CAN bus by either shorting CAN-H to GND or CAN-H to CAN-L.
@diplfranzhoepfinger @abombay @f-hoepfinger-hr-agrartechnik By disturbing the CAN/TWAI bus, you are likely generating errors that trigger the HW errata conditions. The errata workarounds have been merged on master and backported all the way back to ESP-IDF v4.2.x. However, these workarounds are only enabled by default from ESP-IDF v5.0 onwards. If you are using an ESP-IDF version older than v5.0, please enable all of the CONFIG_TWAI_ERRATA_FIX_... options and see if the issue is resolved.
@EmbeddedDevver Arduino should also have these workarounds enabled starting from v2.0.5
I have the same problem. My workaround is to avoid the twai_initiate_recovery() function and instead uninstall, reinstall, and restart the TWAI driver. This may help you until the problem is fixed in the TWAI library.
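A minimal host-side sketch of that teardown-and-reinstall order, in case it helps. Everything named here (twai_reinit_sketch(), the result enum, the demo_* stand-ins) is hypothetical scaffolding and not part of ESP-IDF. On target, the four callbacks would presumably be thin wrappers around twai_stop(), twai_driver_uninstall(), twai_driver_install() with your existing general/timing/filter configs, and twai_start(), each returning 0 when the call returns ESP_OK.

```c
/* Which step failed, so the caller can log it. Hypothetical names. */
typedef enum {
    REINIT_OK = 0,
    REINIT_UNINSTALL_FAILED,
    REINIT_INSTALL_FAILED,
    REINIT_START_FAILED,
} twai_reinit_result_t;

/* Each callback returns 0 on success and non-zero on failure. */
typedef int (*twai_step_fn)(void);

static twai_reinit_result_t twai_reinit_sketch(twai_step_fn stop,
                                               twai_step_fn uninstall,
                                               twai_step_fn install,
                                               twai_step_fn start)
{
    /* Stopping may fail when the driver is no longer running (for example
     * after a bus-off), so its result is deliberately ignored here. */
    (void)stop();

    if (uninstall() != 0) {
        return REINIT_UNINSTALL_FAILED;
    }
    if (install() != 0) {
        return REINIT_INSTALL_FAILED;
    }
    if (start() != 0) {
        return REINIT_START_FAILED;
    }
    return REINIT_OK;
}

/* Host-side stand-ins, only there to exercise the sketch off-target. */
static int demo_ok(void)   { return 0; }
static int demo_fail(void) { return -1; }
```

Calling twai_reinit_sketch(demo_fail, demo_ok, demo_ok, demo_ok) still reports REINIT_OK, which reflects the assumption that a failed stop is harmless as long as the uninstall succeeds.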
@diplfranzhoepfinger @StehlikPhotoneo @abombay @f-hoepfinger-hr-agrartechnik
I suspect what's happening is that the CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST workaround is triggering a false positive TX done event when a bus off occurs.
In twai_hal_iram.c:twai_hal_decode_interrupt(), if you change...
#ifdef CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST
if ((interrupts & TWAI_LL_INTR_TI || hal_ctx->state_flags & TWAI_HAL_STATE_FLAG_TX_BUFF_OCCUPIED) && status & TWAI_LL_STATUS_TBS) {
#else
to
#ifdef CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST
if ((interrupts & TWAI_LL_INTR_TI || hal_ctx->state_flags & TWAI_HAL_STATE_FLAG_TX_BUFF_OCCUPIED)
&& status & TWAI_LL_STATUS_TBS
&& !(state_flags & TWAI_HAL_STATE_FLAG_BUS_OFF)) {
#else
does this end up resolving the issue?
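To make the suspected false positive concrete, here is a small host-side model of the two conditions quoted above. The MODEL_* bit values and the function names are hypothetical stand-ins, not the real HAL/LL definitions; only the boolean structure follows the snippets. It also assumes, as suspected above, that the TX buffer is still flagged as occupied and the TBS status bit is set when the bus-off is handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; the real ones live in ESP-IDF. */
#define MODEL_INTR_TI               (1u << 0)
#define MODEL_STATUS_TBS            (1u << 1)
#define MODEL_FLAG_TX_BUFF_OCCUPIED (1u << 2)
#define MODEL_FLAG_BUS_OFF          (1u << 3)

/* Condition as quoted with CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST enabled:
 * a TX-done event is raised even without a TX interrupt, as long as the
 * buffer was marked occupied and the status says the buffer is free. */
static bool model_tx_done_original(uint32_t interrupts, uint32_t status,
                                   uint32_t state_flags)
{
    return ((interrupts & MODEL_INTR_TI) ||
            (state_flags & MODEL_FLAG_TX_BUFF_OCCUPIED)) &&
           (status & MODEL_STATUS_TBS);
}

/* Same condition with the proposed bus-off guard added. */
static bool model_tx_done_bus_off_guard(uint32_t interrupts, uint32_t status,
                                        uint32_t state_flags)
{
    return model_tx_done_original(interrupts, status, state_flags) &&
           !(state_flags & MODEL_FLAG_BUS_OFF);
}
```

With no TX interrupt, TBS set, and both the occupied and bus-off flags set, the original condition evaluates to true while the guarded one evaluates to false. A normal TX interrupt outside bus-off passes both.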
I will check!
I am also having this issue. All of my TWAI_ERRATA fixes are enabled. I'm using ESP-IDF 4.4.4 with an ESP32MINI-01 rev3 device.
I will try Dazza0's code patch right now and let you know.
Unfortunately this doesn't fix it. I should note that I am using single shot mode with no tx queue. I handle retries myself. This is the only function I use to send out CAN messages.
My transmit function is as follows:
IRAM_ATTR esp_err_t tts_twai_send_message(twai_message_t* msg, uint32_t paceTime100us, uint32_t timeoutMS)
{
#define MAX_TX_RETRIES (16)
    uint32_t alerts;
    uint32_t paceTimeTicks;
    esp_err_t err;
    int retryCount = MAX_TX_RETRIES;

    xSemaphoreTake(_semTxLock, portMAX_DELAY);
    msg->flags = TWAI_MSG_FLAG_SS;
    TickType_t tickTimeout = xTaskGetTickCount() + pdMS_TO_TICKS(timeoutMS);

    // drain any alerts that are already pending before transmitting
    if (twai_read_alerts(&alerts, 0) == ESP_OK)
    {
#if TWAI_REPORT_AND_HANDLE_ALERTS
        tts_twai_report_alerts(alerts);
#endif
    }

RetryTx:
    err = twai_transmit(msg, pdMS_TO_TICKS(100));
    if (err != ESP_OK)
    {
        ESP_LOGE(EXAMPLE_TAG, "twai_transmit failed! Error=%08X", err);
        alerts = 0;
        twai_read_alerts(&alerts, pdMS_TO_TICKS(100));
#if TWAI_REPORT_AND_HANDLE_ALERTS
        tts_twai_report_alerts(alerts);
#endif
        if (err == ESP_ERR_INVALID_STATE)
        {
            if (twai_initiate_recovery() == ESP_OK)
            {
                ESP_LOGW(EXAMPLE_TAG, "Bus recovery initiated..");
                // wait for bus to recover
                while (xTaskGetTickCount() <= tickTimeout)
                {
                    if (twai_read_alerts(&alerts, pdMS_TO_TICKS(100)) == ESP_OK)
                    {
#if TWAI_REPORT_AND_HANDLE_ALERTS
                        tts_twai_report_alerts(alerts);
#endif
                        if (alerts & TWAI_ALERT_BUS_RECOVERED)
                        {
                            if (--retryCount <= 0)
                            {
                                ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                                goto Done;
                            }
                            err = twai_start();
                            ESP_LOGW(EXAMPLE_TAG, "Bus Recovered. twai_start err=%i", err);
                            vTaskDelay(1);
                            goto RetryTx;
                        }
                    }
                }
                ESP_LOGE(EXAMPLE_TAG, "Timeout waiting for bus recovery");
            }
            else
            {
                ESP_LOGW(EXAMPLE_TAG, "TX failed for unknown reason");
                if (--retryCount <= 0)
                {
                    ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                    goto Done;
                }
                vTaskDelay(1);
            }
        }
        else
        {
            ESP_LOGE(EXAMPLE_TAG, "TX failed Error=%08X", err);
            if (--retryCount <= 0)
            {
                ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                goto Done;
            }
            vTaskDelay(1);
            goto RetryTx;
        }
        // this is caused by the loss of a TX interrupt. (see errata)
        // it should not occur anymore as there is a workaround in place deep in the TWAI driver stack.
        //ESP_LOGE(EXAMPLE_TAG, "tts_twai_send_message1 Error=%08X", err);
        //twai_stop();
        //vTaskDelay(pdMS_TO_TICKS(20));
        //twai_start();
        //vTaskDelay(pdMS_TO_TICKS(20));
        //err = twai_transmit(msg, pdMS_TO_TICKS(100));
    }
    else
    {
        // TX was successful. we should receive either an idle alert or some error promptly
        // clear any alerts
        if (twai_read_alerts(&alerts, pdMS_TO_TICKS(1000)) == ESP_OK)
        {
#if TWAI_REPORT_AND_HANDLE_ALERTS
            tts_twai_report_alerts(alerts);
#endif
            if (alerts & TWAI_ALERT_TX_FAILED)
            {
                if (--retryCount <= 0)
                {
                    ESP_LOGE(EXAMPLE_TAG, "Max TX retry count exceeded!");
                    err = ESP_ERR_TIMEOUT;
                    goto Done;
                }
                vTaskDelay(1);
                goto RetryTx;
            }
        }
        // honor pacetime (given in units of 100 us, rounded up to whole milliseconds)
        paceTimeTicks = pdMS_TO_TICKS(((TickType_t)paceTime100us + (TickType_t)9) / (TickType_t)10);
        if (paceTimeTicks > 0)
        {
            vTaskDelay(paceTimeTicks);
        }
        else
        {
            // pace time is shorter than one tick: poll the high-resolution timer instead
            uint64_t timeEnd = esp_timer_get_time() + (paceTime100us * 100);
            while (timeEnd > esp_timer_get_time())
            {
                // taskYIELD() is used to request a context switch to another task. However, if there are
                // no other tasks at a higher or equal priority to the task that calls taskYIELD() then
                // the RTOS scheduler will simply select the task that called taskYIELD() to run again.
                taskYIELD();
            }
        }
    }

Done:
    xSemaphoreGive(_semTxLock);
    if (retryCount != MAX_TX_RETRIES)
    {
        //ESP_LOGI(EXAMPLE_TAG, "TX took %d retries", MAX_TX_RETRIES - retryCount);
    }
    return err;
}
If I use this:
#ifdef CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST
//
if ((interrupts & TWAI_LL_INTR_TI || hal_ctx->state_flags & TWAI_HAL_STATE_FLAG_TX_BUFF_OCCUPIED) && status & TWAI_LL_STATUS_TBS && p_twai_obj->tx_msg_count) {
#else
it solves the problem. This is not a good fix but it works in my case because I'm not using any kind of TX queuing. It does provide more proof that the problem is with this errata "fix".
FYI, you have to modify some code to gain access to p_twai_obj in order to use this.
Hopefully this will help someone else find a proper fix. :)
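For anyone trying to follow why the extra tx_msg_count check hides the assert, here is a tiny host-side model. It is a sketch under assumptions: model_tx_count_after() is a hypothetical name, and the only thing taken from this thread is that the driver requires tx_msg_count >= 0 and that each TX-done event retires one pending message.

```c
#include <stdbool.h>

/* Returns the pending-message counter after `events` TX-done events arrive
 * while only `pending` messages were actually handed to the driver.
 * With guard_on_count set, an event is ignored when nothing is pending,
 * which mirrors the extra "&& p_twai_obj->tx_msg_count" term above. */
static int model_tx_count_after(int pending, int events, bool guard_on_count)
{
    int tx_msg_count = pending;

    for (int i = 0; i < events; i++) {
        if (guard_on_count && tx_msg_count == 0) {
            continue; /* spurious event: nothing pending, so skip it */
        }
        tx_msg_count--; /* a negative value here is what the assert catches */
    }
    return tx_msg_count;
}
```

With one pending message and two TX-done events, the unguarded model ends at -1, the condition the assert rejects, while the guarded model stops at 0. This also shows why it is a workaround rather than a fix: the spurious event is still generated, it is only ignored when the counter is already zero.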
I'm running into this after 40 minutes of runtime (more than 120,000 CAN messages sent).
I have CONFIG_TWAI_ERRATA_FIX_TX_INTR_LOST set.
I'm using SDK 5.0 and I don't see any fixes for this posted in later implementations. Have I missed a patch?
@igrr - any comments?
@andrew-elder Have you tried the code changes from Dazza0 above (https://github.com/espressif/esp-idf/issues/9697#issuecomment-1591011346)?
Do you have any custom settings like no_queue? And what is your chip version?
@wanckl I have not yet tried the fix from https://github.com/espressif/esp-idf/issues/9697#issuecomment-1591011346. I will try it.
I am using the default settings, i.e. not using no_queue. Chip version is v3.0.
I (31) boot: chip revision: v3.0
@wanckl - the test failed after an hour with
assert failed: twai_handle_tx_buffer_frame twai.c:185 (p_twai_obj->tx_msg_count >= 0)
The suggested change checks for bus off, but that isn't expected to happen in my setup. The CAN bus is hardwired to another device that is continuously sending messages to the ESP32.
Thank you for your help so far. Any other suggestions?
Have you tried the fix I posted above? To use it, you have to disable the TX queue. It is not a good fix but it does solve the problem. I have been using it in a production product for a while now.
@travis012 - I have not tried the fix you posted above. I will do so.
@travis012 - an implementation very close to yours works for me, i.e., I no longer observe the assert() error. I don't observe that twai_initiate_recovery() ever gets called either. I think I am ok to continue development with what I have. Thank you for your help.
@wanckl - does espressif have plans to release a fix for this?
Is it better than mine? Can you use the TX queue? I'd like to improve what I have in our production devices. Please share your fix.
No, it's not better than yours. I removed the semaphore, for example, because I have a single thread sending messages. I have a hardwired CAN connection to another device, so there is never a BUS-OFF condition, and twai_initiate_recovery() always returns ESP_ERR_INVALID_STATE if I force it to trigger.
Any update on this? I am having the same issue.
I have to check.
Has anyone come up with a solution? The BUG has just turned 2 years old, only in this thread.
Hey @diplfranzhoepfinger , help us out here, haha!
I just encountered this issue in my project. If the BUG has not been fixed yet, I would appreciate help with a safe way to restart my MCU to try to work around the problem.
My system allows me to reset all peripherals ‘hanging’ on the CAN bus.
I mention this because I noticed that the MCU’s return when restarted by esp_restart() can also cause issues.
Answers checklist.
IDF version.
v4.4.2
Operating System used.
Linux
How did you build your project?
Eclipse IDE
If you are using Windows, please specify command line type.
No response
Development Kit.
Atom M5
Power Supply used.
USB
What is the expected behavior?
Not crashing
What is the actual behavior?
Instead it crashes with: assert failed: twai_handle_tx_buffer_frame twai.c:183 (p_twai_obj->tx_msg_count >= 0)
Steps to reproduce.
Debug Logs.
More Information.
No response