espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

[BUG] Wifi/BLE Causing a problem with GPTIMER Dispatch - ESP-IDF V5.3.1 as of commit a0f798cfc4bbd624aab52b2c194d219e242d80c1 (IDFGH-14009) #14832

Open filzek opened 2 weeks ago

filzek commented 2 weeks ago


General issue report

Since SDK 5, we have been facing the same issue, and even after fixes to the Wi-Fi, the bug continues to affect the GPTIMER.

Create an ISR to trigger at intervals of 10, 17, 20, 457, and 7500 microseconds, executing 2 µs of code within it.

Set up a GPTIMER with the following configuration:

gptimer_config_t DRAM_ATTR timer_config_us_up = {
    .clk_src = GPTIMER_CLK_SRC_DEFAULT,
    .direction = GPTIMER_COUNT_UP,
    .resolution_hz = 1 * 1000 * 1000, // 1 MHz, 1 tick = 1 µs
    .intr_priority = 3,
    .flags.intr_shared = false,
};

The intr_priority setting (1, 2, or 3) does not seem to affect the outcome; the same issues occur regardless.

esp_err_t r;
gpio_config_t io_conf = {0};                     // Zero-initialize so unset fields are not left indeterminate
io_conf.intr_type = GPIO_INTR_DISABLE;           // No interrupt
io_conf.mode = GPIO_MODE_OUTPUT;                 // Set as output mode
io_conf.pin_bit_mask = (1ULL << 23) | (1ULL << 26); // Bit mask for GPIO23 and GPIO26
io_conf.pull_down_en = GPIO_PULLDOWN_DISABLE;
io_conf.pull_up_en = GPIO_PULLUP_DISABLE;
r = gpio_config(&io_conf);                       // Apply configuration
printf("GPIO Config Result: [%d] [%s]\n", r, esp_err_to_name(r));

esp_err_t timer_creation_result;
timer_creation_result = gptimer_new_timer(&timer_config_us_up, &gptimer_timelapse_action);
printf("timer_creation_result [%d] [%s] line [%d] func[%s] \n", timer_creation_result, esp_err_to_name(timer_creation_result), __LINE__, __func__);

gptimer_event_callbacks_t callback_source_timelapse = {
    .on_alarm = time_lapse_action,
};
gptimer_register_event_callbacks(gptimer_timelapse_action, &callback_source_timelapse, NULL);

gptimer_alarm_config_t run_as_timer_us = {
    .reload_count = 0,
    .alarm_count = 20, // 20 µs
    .flags.auto_reload_on_alarm = true, // Enable auto-reload
};
gptimer_set_alarm_action(gptimer_timelapse_action, &run_as_timer_us);
gptimer_enable(gptimer_timelapse_action);
gptimer_start(gptimer_timelapse_action);
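The alarm_count above equals the period in microseconds only because resolution_hz is 1 MHz. As a minimal sketch of the general conversion, the helper below turns a period into ticks for any resolution. The name period_us_to_ticks is hypothetical and is not part of the ESP-IDF API.

```c
#include <stdint.h>

/* Hypothetical helper (not an ESP-IDF function): convert a period in
 * microseconds to timer ticks for a given timer resolution, rounding to
 * the nearest tick. */
static inline uint64_t period_us_to_ticks(uint64_t period_us, uint32_t resolution_hz)
{
    return (period_us * (uint64_t)resolution_hz + 500000ULL) / 1000000ULL;
}
```

With the 1 MHz resolution used in this report, a 20 µs period maps to 20 ticks, which matches the alarm_count shown above.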

The time_lapse_action simply toggles GPIO23 and GPIO26 at each dispatch, flipping their states.

The issue is that the GPTIMER experiences "blockouts" from 50 µs to 850 µs with no execution at all, which is completely unacceptable.
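One way to quantify such blockouts without a scope is to record a timestamp at each dispatch and analyze the gaps afterwards. The sketch below is plain C and assumes the timestamps (in microseconds) were already captured into an array by some means; the names gap_stats_t and analyze_dispatch_gaps are hypothetical and only illustrate the idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: summary of the intervals between dispatches. */
typedef struct {
    uint64_t max_gap_us;  /* longest interval between consecutive dispatches */
    size_t   late_count;  /* intervals longer than period_us + tolerance_us */
} gap_stats_t;

/* Walk a monotonically increasing list of dispatch timestamps and report
 * the longest gap and how many gaps exceeded the expected period plus a
 * tolerance. */
static gap_stats_t analyze_dispatch_gaps(const uint64_t *ts_us, size_t n,
                                         uint64_t period_us, uint64_t tolerance_us)
{
    gap_stats_t stats = {0, 0};
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = ts_us[i] - ts_us[i - 1];
        if (gap > stats.max_gap_us) {
            stats.max_gap_us = gap;
        }
        if (gap > period_us + tolerance_us) {
            stats.late_count++;
        }
    }
    return stats;
}
```

For a 20 µs alarm, a tolerance of a few microseconds would separate ordinary jitter from the 50 µs to 850 µs gaps described above.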

[Image: NewFile1]

So, is there any way to fix it?

suda-morris commented 2 weeks ago

This may be because the gptimer interrupt is preempted by Wi-Fi.

A quick workaround could be to create two tasks, one pinned to core 0 and the other pinned to core 1 (I hope you're using an ESP32 or ESP32-S3, since they have dual cores). Then do the gptimer initialization on core 1 and the Wi-Fi/BLE initialization on core 0.

AxelLin commented 2 weeks ago

General issue report

Since SDK 5, we have been facing the same issue, and even after fixes to the Wi-Fi, the bug continues to affect the GPTIMER.

@filzek Do you mean it was working in ESP-IDF 4.x?

filzek commented 2 weeks ago

Yes. Because hardware timers could be used freely in SDK 3.x and 4.x, we were able to isolate the timer, use one or two timers, and fully control execution without any misses. Even with BLE or Wi-Fi in use, the timer dispatch took precedence over everything else and was very accurate. In SDK 4.x this could cause register corruption on ESP32 chips older than revision 3.0; the corruption crashes Wi-Fi and leaves it unusable until a physical power-off.

Tomorrow we will try the workaround: move all tasks to core 0 and run only the ISR and timer on core 1, to see whether the problem still occurs. If it does, the Espressif team will need to dig deeper to fix it.

filzek commented 2 weeks ago

@AxelLin , @suda-morris

We have moved BLE NimBLE to core 0 as well; Wi-Fi was already on core 0.

Same problem, nothing changes: the GPTIMER does not respond correctly and keeps missing dispatches, sometimes with gaps of more than 700 µs.

The GPTIMER and ISR are set up from a task pinned to core 1, and the problem is exactly the same.

filzek commented 2 weeks ago

@suda-morris, @AxelLin, we also checked which tasks are running on each core, as listed by vTaskList:

The Bluetooth control layer was running on core 1, so we moved it to core 0.

The problem has been reduced but still occurs, so this setup is still not usable; we need a fix for the timer.

We need a timer that truly preempts everything and runs whatever we need. The system can run as it needs to, but when we ask for an accurate ISR and GPTIMER, they should take priority over all other tasks and execute. The GPTIMER callback takes 2 µs to execute and the GPIO level ISR takes 1 µs, so this should not break anything, and we expected it to take precedence over everything else.
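To put the 2 µs figure in context, the share of one core's time spent inside the callback can be estimated for each interval listed in the original report. This is only a rough sketch: it ignores interrupt entry and exit overhead, and the function name isr_load_permille is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: share of one core's time spent inside a periodic
 * callback, in parts per thousand (integer division truncates).
 * Interrupt entry/exit overhead is NOT included. */
static uint32_t isr_load_permille(uint32_t exec_us, uint32_t period_us)
{
    if (period_us == 0) {
        return 0;  /* undefined period: report no load rather than divide by zero */
    }
    return (exec_us * 1000u) / period_us;
}
```

For 2 µs of work, this gives 200 permille (20 %) at a 10 µs interval and 100 permille (10 %) at 20 µs, so the callback body alone leaves most of the core free at the shortest interval.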

filzek commented 2 weeks ago

Task Name        Status  Prio  HWM   Task#  Affinity
continuousadc    R       20    1512  28     0
IDLE0            R       0     320   5      0
IDLE1            R       0     316   6      1
tiT              B       18    2172  12     -1
websocket_task   B       15    1228  27     1
ipc1             S       24    1580  2      1
MultiCast Recei  B       10    6260  24     1
sys_evt          B       20    552   13     0
httpd            B       5     6700  26     1
ipc0             S       10    1552  1      0
nimble_host      B       21    6244  20     0
mdns             B       4     2428  29     0
btController     B       23    2120  19     0
wifi             B       23    4500  14     0
esp_timer        S       22    1848  3      0
Tmr Svc          B       1     1176  7      -1
btm_rrm_t        B       2     6160  15     -1

So, this is the task list.

filzek commented 2 weeks ago

The missed windows now range from 60 µs to 700 µs, and while they occur less frequently, the issue still persists.

[Image: NewFile2]

filzek commented 2 weeks ago

The worst gaps occasionally exceed 1,400 µs, which is severely problematic.

[Image: NewFile3]

[Image: NewFile4]