espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Task watchdog got triggered in task (IDFGH-9737) #11071

Closed Kampi closed 1 year ago

Kampi commented 1 year ago

General issue report

Hello,

I have trouble understanding the task watchdog and why it triggers (ESP-IDF v5). Please take a look at this code snippet from my Lepton driver. It creates a task for reading the camera, but the task watchdog triggers every 5 seconds and I don't understand why.

extern "C" void app_main(void)
{
    ESP_ERROR_CHECK(nvs_flash_init());

    if(Lepton_StartCapture(&_Device) != LEPTON_ERR_OK)
    {
        ESP_LOGE(TAG, "Can not start image capturing!");
    }

    while(true)
    {

    }
}

static void Lepton_CaptureTask(void* p_Args)
{
    Lepton_t* Device = static_cast<Lepton_t*>(p_Args);

    Device->Internal.Task.isRunning = true;
    while(Device->Internal.Task.isRunning)
    {
        esp_task_wdt_reset(); // <-- The stack trace reports an error here
    }

    vTaskSuspend(NULL);
    vTaskDelete(NULL);
}

Lepton_Error_t Lepton_StartCapture(Lepton_t* p_Device)
{
    ...

    xTaskCreatePinnedToCore(&Lepton_CaptureTask, "Lepton_Capture", CONFIG_LEPTON_TASK_STACK, p_Device, CONFIG_LEPTON_TASK_PRIORITY, &p_Device->Internal.Task.Handle, CONFIG_LEPTON_TASK_CORE);

    if(p_Device->Internal.Task.Handle == NULL)
    {
        return LEPTON_ERR_FAIL;
    }

    esp_task_wdt_add(p_Device->Internal.Task.Handle);

    return LEPTON_ERR_OK;
}

I get this message every ~5 seconds (configured watchdog time).

E (507500) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (507500) task_wdt:  - IDLE (CPU 1)
E (507500) task_wdt: Tasks currently running:
E (507500) task_wdt: CPU 0: IDLE
E (507500) task_wdt: CPU 1: Lepton_Capture
E (507500) task_wdt: Print CPU 1 backtrace

Backtrace: 0x4008343E:0x3FFC0640 0x4008400D:0x3FFC0660 0x4000BFED:0x3FFB74C0 0x40092EB9:0x3FFB74D0 0x400DB596:0x3FFB74F0 0x400D314D:0x3FFB7530 0x40092BA1:0x3FFB7550

  #0  0x4008343E:0x3FFC0640 in esp_crosscore_isr at C:\Users\konta\.platformio\packages\framework-espidf\components\esp_system/crosscore_int.c:96
  #1  0x4008400D:0x3FFC0660 in _xt_lowint1 at C:\Users\konta\.platformio\packages\framework-espidf\components\freertos\FreeRTOS-Kernel\portable\xtensa/xtensa_vectors.S:1118  
  #2  0x4000BFED:0x3FFB74C0 in ?? ??:0
  #3  0x40092EB9:0x3FFB74D0 in vPortClearInterruptMaskFromISR at C:\Users\konta\.platformio\packages\framework-espidf\components\freertos\FreeRTOS-Kernel\portable\xtensa\include/freertos/portmacro.h:566
      (inlined by) vPortExitCritical at C:\Users\konta\.platformio\packages\framework-espidf\components\freertos\FreeRTOS-Kernel\portable\xtensa/port.c:342
  #4  0x400DB596:0x3FFB74F0 in esp_task_wdt_reset at C:\Users\konta\.platformio\packages\framework-espidf\components\esp_system\task_wdt/task_wdt.c:791
  #5  0x400D314D:0x3FFB7530 in Lepton_CaptureTask(void*) at components\Lepton\src/lepton_capture.cpp:98
  #6  0x40092BA1:0x3FFB7550 in vPortTaskWrapper at C:\Users\konta\.platformio\packages\framework-espidf\components\freertos\FreeRTOS-Kernel\portable\xtensa/port.c:154    

What do you think is the correct way to avoid this?

Thanks!

ESP-Marius commented 1 year ago

esp_task_wdt_reset() feeds the watchdog for the current task, in this case Lepton_CaptureTask, but the error you are seeing is coming from the idle task.

Since Lepton_CaptureTask is spinning in a while loop, the FreeRTOS idle task on that core never gets a chance to run and thus never resets its watchdog. The purpose of the idle task WDT is to monitor exactly this: that no task blocks the core forever without yielding.
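
The bookkeeping described above can be illustrated with a small host-side model. This is a simplified sketch in plain C++, not ESP-IDF code: `TaskWatchdogModel` and its methods are hypothetical names, and the sketch assumes the watchdog timer is only restarted once every subscribed task has reset in the current round. A task that resets on its own behalf does nothing for the idle task.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of task watchdog bookkeeping (not ESP-IDF code).
class TaskWatchdogModel {
public:
    // Subscribe a task to the watchdog; returns an id used for reset().
    std::size_t add(const std::string& name) {
        tasks_.push_back(Entry{name, false});
        return tasks_.size() - 1;
    }

    // A task resets on its own behalf only. Returns true if this call
    // completed the round (every task has reset), which is the only case
    // in which this model restarts the timer.
    bool reset(std::size_t id) {
        assert(id < tasks_.size());
        tasks_[id].has_reset = true;
        for (const Entry& t : tasks_) {
            if (!t.has_reset) {
                return false;  // someone is still missing, timer keeps running
            }
        }
        for (Entry& t : tasks_) {
            t.has_reset = false;  // start a new round
        }
        return true;
    }

    // Tasks that have not reset in the current round. If the timeout
    // expires now, these are the names that would be reported.
    std::vector<std::string> pending() const {
        std::vector<std::string> names;
        for (const Entry& t : tasks_) {
            if (!t.has_reset) {
                names.push_back(t.name);
            }
        }
        return names;
    }

private:
    struct Entry {
        std::string name;
        bool has_reset;
    };
    std::vector<Entry> tasks_;
};
```

With `IDLE (CPU 1)` and `Lepton_Capture` both subscribed, calling `reset()` only for `Lepton_Capture` leaves `IDLE (CPU 1)` in `pending()`, which matches the task named in the log above.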

The correct way to avoid this would be to:

  1. Write your software in such a way that the idle task always gets a chance to run, e.g. by using blocking operations, waiting on semaphores etc.
  2. If you don't care about this and want the task to be able to run like this, you can simply disable the idle task WDT in menuconfig (ESP_TASK_WDT_CHECK_IDLE_TASK_CPU1).
  3. If you just want a workaround, you can also periodically call vTaskDelay(), as this will ensure that the idle task gets the chance to run as well.
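
The effect of options 1 and 3 can be sketched with a toy model of one core running a fixed-priority scheduler. This is plain C++ for illustration, not FreeRTOS code; `idle_ticks` and its parameters are hypothetical. It assumes a single task with a higher priority than the idle task that runs for `work_ticks` ticks and then blocks for `block_ticks` ticks (as a call such as vTaskDelay() would make it do), with `block_ticks == 0` meaning the task never blocks.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (assumption: one higher-priority task plus the idle task on one core).
// Returns how many of total_ticks ticks the idle task gets to run.
std::uint32_t idle_ticks(std::uint32_t total_ticks,
                         std::uint32_t work_ticks,
                         std::uint32_t block_ticks) {
    assert(work_ticks > 0);

    std::uint32_t idle = 0;          // ticks given to the idle task
    std::uint32_t ran = 0;           // ticks the task has run since it last unblocked
    std::uint32_t blocked_left = 0;  // ticks until the task becomes ready again

    for (std::uint32_t t = 0; t < total_ticks; ++t) {
        if (blocked_left > 0) {
            // The higher-priority task is blocked, so the idle task runs.
            ++idle;
            --blocked_left;
        } else {
            // The higher-priority task is ready, so it always wins the core.
            ++ran;
            if (block_ticks > 0 && ran == work_ticks) {
                ran = 0;
                blocked_left = block_ticks;
            }
        }
    }
    return idle;
}
```

In this model a task that never blocks leaves the idle task zero ticks, while a task that blocks for one tick after every 99 ticks of work gives the idle task one tick in every hundred. That is sufficient as long as the whole cycle is shorter than the configured watchdog timeout.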

Kampi commented 1 year ago

Hi @ESP-Marius

thanks for your explanations. I've thought about how I can improve this. Basically, Lepton_CaptureTask has to wait for the V-Sync signal to go high. I came up with the idea of using an interrupt and a queue for the ISR (based on the esp-idf GPIO example). I have changed the functions as follows:

static void IRAM_ATTR Lepton_VSync_ISR_Handler(void* p_Args)
{
    uint32_t Value;
    Lepton_t* Device;

    Device = static_cast<Lepton_t*>(p_Args);
    Value = static_cast<uint32_t>(Device->Internal.VSync.IO);

    xQueueSendFromISR(Device->Internal.VSync.Queue, &Value, NULL);
}

Lepton_Error_t Lepton_StartCapture(Lepton_t* p_Device)
{
    Lepton_Error_t Error;

    Error = LEPTON_ERR_OK;

    // Create a queue to handle GPIO events from ISR.
    p_Device->Internal.VSync.Queue = xQueueCreate(8, sizeof(uint32_t));
    if(p_Device->Internal.VSync.Queue == NULL)
    {
        // Set the error code before jumping, otherwise the assignment is never reached.
        Error = LEPTON_ERR_NO_MEM;
        goto Lepton_StartCapture_Error_1;
    }

    xTaskCreatePinnedToCore(&Lepton_CaptureTask, "Lepton_Capture", CONFIG_LEPTON_TASK_STACK, p_Device, 

    if(p_Device->Internal.Task.Handle == NULL)
    {
        // Set the error code before jumping, otherwise the assignment is never reached.
        Error = LEPTON_ERR_NO_MEM;
        goto Lepton_StartCapture_Error_2;
    }

    esp_task_wdt_add(p_Device->Internal.Task.Handle);

    // V-Sync is a high-level signal. So we need to add a positive edge interrupt.
    gpio_set_direction(p_Device->Internal.VSync.IO, GPIO_MODE_INPUT);
    gpio_set_pull_mode(p_Device->Internal.VSync.IO, GPIO_PULLDOWN_ONLY);
    gpio_set_intr_type(p_Device->Internal.VSync.IO, GPIO_INTR_POSEDGE);
    gpio_install_isr_service(0);

    // Hook ISR handler for specific GPIO.
    gpio_isr_handler_add(p_Device->Internal.VSync.IO, Lepton_VSync_ISR_Handler, p_Device);

    return Error;

Lepton_StartCapture_Error_2:
    vQueueDelete(p_Device->Internal.VSync.Queue);

Lepton_StartCapture_Error_1:
    return Error;
}

static void Lepton_CaptureTask(void* p_Args)
{
    Lepton_t* Device;

    VSyncCount = 0;
    Device = static_cast<Lepton_t*>(p_Args);
    Device->Internal.Task.isRunning = true;
    while(Device->Internal.Task.isRunning)
    {
        uint32_t io_num;

        esp_task_wdt_reset();

        if(xQueueReceive(Device->Internal.VSync.Queue, &io_num, 10 / portTICK_PERIOD_MS))
        {
            // C++11 requires spaces around the PRIu32 macro inside a string literal.
            printf("GPIO[%" PRIu32 "] intr, val: %d\n", io_num, gpio_get_level(static_cast<gpio_num_t>(io_num)));
        }
    }
    vTaskSuspend(NULL);
    vTaskDelete(NULL);
}

I guess this will solve the problem, but I need to check this more closely because I have to make sure that no V-Sync is missed.
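
One possible way to check whether a V-Sync was missed is to have the ISR put a running counter into the queue instead of the GPIO number, and to let the task compare consecutive values. The sketch below shows only the comparison logic in plain C++ so it can be tried on a host; `VSyncGapDetector` is a hypothetical helper, and it assumes the ISR increments the counter by exactly one per V-Sync. In the driver, the return value of xQueueSendFromISR() could be checked as well, since a full queue also means a dropped event.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: detects gaps in a sequence counter produced by the ISR.
class VSyncGapDetector {
public:
    // Feed the sequence number received from the queue. Returns how many
    // V-Sync events were skipped since the previous call (0 if none).
    std::uint32_t update(std::uint32_t seq) {
        std::uint32_t missed = 0;
        if (has_prev_) {
            // Assumption: the counter strictly increases by one per V-Sync,
            // so a repeated value indicates a bug in the producer.
            assert(seq != prev_);
            // Unsigned subtraction stays correct when the counter wraps around.
            missed = seq - prev_ - 1u;
        }
        prev_ = seq;
        has_prev_ = true;
        total_missed_ += missed;
        return missed;
    }

    // Total number of skipped events seen so far.
    std::uint32_t total_missed() const { return total_missed_; }

private:
    std::uint32_t prev_ = 0;
    bool has_prev_ = false;
    std::uint32_t total_missed_ = 0;
};
```

Receiving the values 10, 11 and 14 in that order reports two missed events on the last call, because 12 and 13 never reached the task.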

ESP-Marius commented 1 year ago

Yeah, something along these lines seems like the correct way to handle it.

I think we can just close this issue then, right? There don't seem to be any ESP-IDF-related problems here.

igrr commented 1 year ago

@Kampi does your camera use a DVP interface (VSYNC, HSYNC, D0-D7)? If so, you might consider the ESP32-S2 and ESP32-S3. They have an LCD_CAM peripheral which can receive frames from the camera and write them to memory using DMA.

Kampi commented 1 year ago

Hi @igrr, thanks for your suggestion. Unfortunately not. It's a FLIR Lepton thermal camera that uses SPI to transmit the data.