espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

ESP32S2 Internal Temperature sensor and WiFi in simultaneous we get one crash (IDFGH-6429) #8088

Open brunohorta82 opened 2 years ago

brunohorta82 commented 2 years ago

When we use the internal temperature sensor and WiFi together, the system triggers the watchdog and reboots. Is this normal behavior?

Alvin1Zhang commented 2 years ago

@brunohorta82 Thanks for reporting. Would you please help provide more details as suggested in the issue template? Information like elf, sdk configuration, backtrace, log outputs, commit ID, hardware, etc. would help us debug further. Thanks.

arex-ebee commented 2 years ago

@Alvin1Zhang I'm joining this issue because I observed the same behaviour as soon as WiFi is enabled. I already filed a report on esp32.com (https://www.esp32.com/viewtopic.php?p=90332), but there I was advised to raise it on the esp-idf GitHub repo. So let me follow your request for the template:

Environment

Problem Description

I'm using the internal temperature sensor on ESP32-S2 as part of the overall system health monitoring. After initialization (temp_sensor_set_config() and temp_sensor_start()) there is a cyclic call to temp_sensor_read_celsius(). This works as expected. Additionally, I needed to bring up WiFi in station mode. After calling esp_wifi_start() I observed erratic behavior of the whole system: task watchdog timeouts printed from different tasks, sluggish reactions, etc.

Analyzing the root cause showed that the call to temp_sensor_read_celsius() never returns after WiFi is started. Instead it enters an endless busy-wait loop within temp_sensor_read_raw(), polling for the HW register bit SENS_SAR_TSENS_CTRL_REG.SENS_TSENS_READY to become 1, which never happens. Comparing the whole register value against the one from when it was working showed that, as soon as WiFi gets enabled, the temperature sensor HW is evidently powered off: the bits SENS_TSENS_POWER_UP and SENS_TSENS_POWER_UP_FORCE drop to 0 (although they were 1 prior to starting WiFi).

I could reproduce this behaviour using the simple wifi station example from IDF (using release/v4.4 branch as well as master), only adding the cyclic temperature sensor readout.
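
The hang described above comes from polling a ready flag with no exit condition. As a minimal host-side sketch (not ESP-IDF code), a bounded poll could look like the following. The names `fake_tsens_t`, `tsens_poll_ready` and the poll budget are hypothetical and exist only for illustration; the struct merely stands in for the ready bit and output value mentioned in this report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sensor register; NOT the real SENS layout. */
typedef struct {
    volatile bool ready;    /* plays the role of SENS_TSENS_READY */
    volatile uint32_t out;  /* plays the role of the raw sensor output */
} fake_tsens_t;

typedef enum { TSENS_OK = 0, TSENS_TIMEOUT = -1 } tsens_status_t;

/* Poll the ready flag at most max_polls times instead of forever.
 * On success the raw value is stored in *raw. */
tsens_status_t tsens_poll_ready(const fake_tsens_t *reg, uint32_t max_polls, uint32_t *raw)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (reg->ready) {
            *raw = reg->out;
            return TSENS_OK;
        }
    }
    /* The flag never came up, e.g. because the sensor was powered off. */
    return TSENS_TIMEOUT;
}
```

With a bound like this, the caller gets an error it can log or retry on, rather than starving the idle task until the task watchdog fires.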

Expected Behavior

Temperature can be read out easily while WiFi station is active and connected

Actual Behavior

Temperature readout call temp_sensor_read_celsius() never returns and enters an unpaced busy-wait loop influencing the whole system.

Steps to reproduce

Code to reproduce this issue

See modified station_example_main.c

Debug Logs

(see attached file for full log)

[...]

I (370) wifi station: Wait 5s before connecting WiFi
I (1370) wifi station: Temp: 31.2°C
I (2370) wifi station: Temp: 31.2°C
I (3370) wifi station: Temp: 31.2°C
I (4370) wifi station: Temp: 31.2°C
I (5370) wifi station: Temp: 31.2°C
I (5380) wifi station: Pause temperature polling
I (5380) wifi station: ESP_WIFI_MODE_STA

[...]

I (9080) wifi station: connected to ap SSID:******* password:********
I (9090) wifi station: Resume temperature polling

E (14360) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (14360) task_wdt:  - IDLE (CPU 0)
E (14360) task_wdt: Tasks currently running:
E (14360) task_wdt: CPU 0: temp_read
E (14360) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x4008C7CB:0x3FFC6270 0x40024796:0x3FFC6290 0x400884FA:0x3FFD0890 0x40088579:0x3FFD08C0 0x4008692D:0x3FFD0900 0x4002C8A9:0x3FFD0940
0x4008c7cb: task_wdt_isr at /[...]/esp-idf-4.4/components/esp_system/task_wdt.c:183 (discriminator 3)

0x40024796: _xt_lowint1 at /[...]/esp-idf-4.4/components/freertos/port/xtensa/xtensa_vectors.S:1111

0x400884fa: temp_sensor_read_raw at /[...]/esp-idf-4.4/components/driver/esp32s2/rtc_tempsensor.c:140 (discriminator 2)

0x40088579: temp_sensor_read_celsius at /[...]/esp-idf-4.4/components/driver/esp32s2/rtc_tempsensor.c:177

0x4008692d: temp_sensor_test_task at /[...]/esp-idf-4.4/examples/wifi/getting_started/station/build/../main/station_example_main.c:177 (discriminator 1)

0x4002c8a9: vPortTaskWrapper at /[...]/esp-idf-4.4/components/freertos/port/xtensa/port.c:130

Other items if possible

station_example_main.log

Maldus512 commented 2 years ago

I'm suffering from the exact same issue

elpamyelhsa commented 2 years ago

Not just me then. I've been seeing this problem since last year but didn't have the time to trace it. Hopefully someone will have it fixed shortly.

NoNullptr commented 2 years ago

I had the same issue and just found this bug report.

The issue is that WiFi resets these three register fields

SENS.sar_tctrl.tsens_power_up
SENS.sar_tctrl.tsens_power_up_force
SENS.sar_tctrl2.tsens_xpd_force

to zero after starting, instead of returning them to their initial value. I assume that the WiFi code reads the temperature to calibrate, but since the WiFi stack is closed, only someone with access can solve it.

The bug is triggered by temperature_sensor_ll_get_raw_value, which writes 1 to SENS.sar_tctrl.tsens_dump_out and then loops until SENS.sar_tctrl.tsens_ready is set. If WiFi has turned off tsens in the meantime, the flag is never set, the loop never ends, and the WDT triggers.

As a temporary workaround, I replaced temperature_sensor_ll_get_raw_value by a function that checks if tsens is still turned on while waiting for the temperature to be ready.
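
A rough host-side sketch of the idea behind such a workaround (checking the power fields while waiting) is shown below. It is not the code used in this comment and not ESP-IDF code: `sim_tsens_t`, `sim_tsens_powered` and `sim_tsens_wait` are hypothetical names, and the struct only imitates the fields listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three power fields, the ready flag and the output. */
typedef struct {
    volatile bool power_up;
    volatile bool power_up_force;
    volatile bool xpd_force;
    volatile bool ready;
    volatile uint32_t out;
} sim_tsens_t;

typedef enum { WAIT_OK, WAIT_POWERED_OFF } wait_result_t;

/* True only while all three power fields are still set. */
bool sim_tsens_powered(const sim_tsens_t *r)
{
    return r->power_up && r->power_up_force && r->xpd_force;
}

/* Wait for the ready flag, but give up as soon as the sensor loses power.
 * Note: this still spins forever if the sensor stays powered yet never
 * becomes ready; a real fix would also bound the wait. */
wait_result_t sim_tsens_wait(const sim_tsens_t *r, uint32_t *raw)
{
    while (!r->ready) {
        if (!sim_tsens_powered(r)) {
            return WAIT_POWERED_OFF;
        }
    }
    *raw = r->out;
    return WAIT_OK;
}
```

The design choice here is to report the power loss to the caller instead of silently re-enabling the sensor; either approach avoids the endless loop.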

bitmandu commented 1 year ago

Just following up that this problem persists in ESP-IDF v5.1-dev-2186-g454aeb3a48. I'm using an ESP-S2-SOLO-2 development board (chip revision v1.0).

If you combine the station and temp_sensor examples, there is one temperature measurement, and then the watchdog is triggered.

I (8511) wifi station: Enable temperature sensor
I (8521) wifi station: Read temperature
I (8521) wifi station: Temperature value 25.41 ℃
E (14511) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (14511) task_wdt:  - IDLE (CPU 0)
E (14511) task_wdt: Tasks currently running:
E (14511) task_wdt: CPU 0: main
E (14511) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x40088EB7:0x3FFC7260 0x4008900E:0x3FFC7280 0x40024815:0x3FFC72A0 0x40088824:0x3FFCED30 0x400877D6:0x3FFCED60 0x400F1F81:0x3FFCEDB0 0x4002C905:0x3FFCEDE0
0x40088eb7: task_wdt_timeout_handling at /home/kris/local/esp-idf/components/esp_system/task_wdt/task_wdt.c:461 (discriminator 3)
...

softhack007 commented 1 year ago

Any news on this topic?

It seems like the temp sensor is useless once WiFi is started. Are there any workarounds, like starting/stopping the sensor directly before and after a single read? I want to read the sensor every 30 seconds.

Would the following work if I repeat this for each measurement?

temp_sensor_start();
temp_sensor_read_celsius(&chip_temp);
temp_sensor_stop();

Samdaaman commented 1 year ago

Any update on this?

I have tried reinitialising the temperature sensor each time but no luck when WiFi is running. Will likely take the route others have suggested and just not use temperature entirely whilst WiFi is running...

KWolfe81 commented 1 year ago

Hoping for a fix on this too. Using ESP-IDF 4.4.

NathanJPhillips commented 1 year ago

@NoNullptr, if your workaround is giving you success, could you possibly share it?

NoNullptr commented 1 year ago

@softhack007 While this might reduce the frequency of crashes, it won't prevent them.

I don't have access to the machine with my quick fix, but I can hopefully post it by the end of next week.

MartinPatarinski commented 1 year ago

I observe the same issue. Log is below.

I think the issue may come from the implementation of temperature_sensor_ll_get_raw_value() [image]. This blocking while loop can be interrupted by the OS, and the register's ready flag may be missed.

Therefore a possible solution is to enter a critical section before getting the register value and then exit it. For example: [image]

Log:

E (208759) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time:
E (208759) task_wdt:  - IDLE (CPU 0)
E (208759) task_wdt: Tasks currently running:
E (208759) task_wdt: CPU 0: APPTask
E (208759) task_wdt: CPU 1: IDLE
E (208759) task_wdt: Aborting.
E (208759) task_wdt: Print CPU 0 (current core) backtrace

Backtrace: 0x4206ccbb:0x3fcb4bb0 0x420221d9:0x3fcb4be0 0x42022258:0x3fcb4c20 0x4202234c:0x3fcb4c50 0x42021947:0x3fcb4c80 0x42129e47:0x3fcb4cb0 0x420106a9:0x3fcb4cd0 0x403867ba:0x3fcb4cf0
0x4206ccbb: temperature_sensor_ll_get_raw_value at C:/Espressif/frameworks/esp-idf-v5.0/components/hal/esp32s3/include/hal/temperature_sensor_ll.h:80
 (inlined by) temperature_sensor_get_celsius at C:/Espressif/frameworks/esp-idf-v5.0/components/driver/temperature_sensor.c:187

NoNullptr commented 1 year ago

@NathanJPhillips, sorry for taking long to respond. The observations of @RilabsAutomotive are correct and align with what I saw. Since the temperature sensor is accessed in multiple locations (temp_sensor and the WiFi stack, at least), it needs to be protected by a critical section.

However, in order for this to work, the code needs to guarantee in all locations that it also restores the register state after reading the sensor.

I just finished making a presentation/video about ESP’s clock and its drift. For this, I needed to sample the temperature sensor every 2 seconds, while the WiFi turns on and off every 10 minutes. In case you’re interested, I uploaded it to YouTube: https://youtu.be/fZAR8WTKiSg. Long story short, it turns out that WiFi does not restore the tsens registers to their previous state after using it – I tested it with IDF 4.4 before IDF 4.5 came out.

Therefore: temperature_sensor_ll_get_raw_value also needs to ensure that the temperature sensor is powered up, that is, that all the registers are set correctly. If the sensor does need to be powered up, the task then needs to be delayed for the output of the temperature sensor to stabilize. Critical sections should not delay or wait, so you need to repeatedly exit and enter the section until the state is how you need it to be.

Since the core issue is the WiFi code (closed-source), which needs to properly restore the state of the temperature sensor, I didn’t bother making it work with critical sections, but used this dirty looping fix instead:

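/* Replacement for the function of the same name in temperature_sensor_ll.h.
 * vTaskDelay() and pdMS_TO_TICKS() need freertos/FreeRTOS.h and freertos/task.h. */
static inline uint32_t temperature_sensor_ll_get_raw_value(void)
{
        /* Power the sensor up (again) if any enable bit is cleared, e.g. by WiFi. */
        if (!SENS.sar_peri_clk_gate_conf.tsens_clk_en ||
            !SENS.sar_tctrl2.tsens_xpd_force ||
            !SENS.sar_tctrl.tsens_power_up_force ||
            !SENS.sar_tctrl.tsens_power_up ||
            !SENS.sar_tctrl.tsens_dump_out
        ) {
                /* The wait loop below also jumps here when power is lost mid-read. */
s:              SENS.sar_peri_clk_gate_conf.tsens_clk_en = true;
                SENS.sar_tctrl2.tsens_xpd_force = true;
                SENS.sar_tctrl.tsens_power_up_force = true;
                SENS.sar_tctrl.tsens_power_up = true;
                /* Give the sensor output time to stabilize. */
                vTaskDelay(pdMS_TO_TICKS(10));
        }

        /* Trigger a readout and wait for the ready flag. */
        SENS.sar_tctrl.tsens_dump_out = 1;
        while (!SENS.sar_tctrl.tsens_ready) {
                /* If the sensor was switched off while waiting, power it up and retry. */
                if (!SENS.sar_peri_clk_gate_conf.tsens_clk_en ||
                    !SENS.sar_tctrl2.tsens_xpd_force ||
                    !SENS.sar_tctrl.tsens_power_up_force ||
                    !SENS.sar_tctrl.tsens_power_up ||
                    !SENS.sar_tctrl.tsens_dump_out
                ) {
                        goto s;
                }
        }
        SENS.sar_tctrl.tsens_dump_out = 0;
        return SENS.sar_tctrl.tsens_out;
}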

It worked without any issues for six weeks of continuous sampling while the WiFi regularly turned on and off. If you need to make the temperature sensor work right now, you can replace the function in “esp-idf/components/hal/esp32XX/include/hal/temperature_sensor_ll.h”.

While this works, the quality of the code is low, but you cannot fundamentally improve it without access to the closed part of the firmware. Maybe IDF 4.5 fixed the issue in WiFi already, but someone would need to test it. Most people from espressif seem to be either too busy, or to have bad understanding of English or have questionable coding skills, so this might be the only solution for a while – this bug has been open for more than a year already.

franz-ms-muc commented 1 year ago

@igrr can you have a look into this? It seems to be an orphaned issue. Thanks!

DCSBL commented 1 year ago

It seems this problem also exists on the ESP32S3, at least in v5.0.2. The task watchdog triggers an abort because the temperature sensor code does not return.

For us, this only happens when the device tries to connect to WiFi but cannot connect (the SSID does not exist). It seems that all other tasks are waiting to be dequeued. We pin all tasks to core 0.

https://github.com/espressif/esp-idf/blob/5181de8ac5ec5e18f04f634da8ce173b7ef5ab73/components/hal/esp32s3/include/hal/temperature_sensor_ll.h#L77-L84

==================== THREAD 1 (TCB: 0x3fcaa404, name: 'Tmr Svc') =====================
#0  0x420649fb in temperature_sensor_ll_get_raw_value () at ~/esp/esp-idf-v5.0.2/components/hal/esp32s3/include/hal/temperature_sensor_ll.h:80
#1  temperature_sensor_get_celsius (tsens=0x3c1a7700, out_celsius=0x3fcaa1f4) at ~/esp/esp-idf-v5.0.2/components/driver/temperature_sensor.c:187
#2  0x42016fc0 in operator() (timer=..., __closure=0x3fcdfd40) at ~/project/components/app/app_temperature.cpp:40
...

ddomnik commented 1 year ago

What's the state of this issue? I see a pull request was opened months ago.

JPSaturninoOliveira commented 1 year ago

I'm using ESP32S3 with ESP-IDF v5.1.1 and the problem continues to exist (only with WiFi running).

mcoracin commented 8 months ago

Up. I'm using ESP32S3 with ESP-IDF v5.1, and the issue is there.

diplfranzhoepfinger commented 7 months ago

Now the PR is in conflict.

How should we proceed?

mythbuster5 commented 7 months ago

@JPSaturninoOliveira @mcoracin @diplfranzhoepfinger There has been a fix since 5.1.2, and I can no longer reproduce this issue with https://gist.github.com/arex-ebee/c7b4cd53f1f4fd26e70965d8f794d06d. If you are still affected by this issue, you are very welcome to share your simplest reproducing code, which will help us a lot to move forward.

If there is no issue on 5.1.2, a reply is also welcome so that we can make a note and close this issue.

matthew-8925 commented 7 months ago

Seems to be working for me on v5.1.2+. With v4.4 it would lock up right away, and I had to use the workaround for it to work at all.

FYI, the exact fixes in tag/v5.1.2 are

wifi binary blob update: Fixed Wi-Fi not working with temperature sensor on ESP32-S2

temperaturesensor* fixes: Temperature Sensor: Fixed issue that if temperature sensor driver is disabled then phy can't work properly.
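
For projects that must build against several IDF releases, one option is to gate the local workaround on the version. The sketch below is a host-side illustration under assumptions: `VERSION_VAL` and `tsens_workaround_needed` are hypothetical names, the packing mirrors ESP-IDF's `ESP_IDF_VERSION_VAL(major, minor, patch)`, and the 5.1.2 threshold is taken only from the reports in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper with the same packing as ESP_IDF_VERSION_VAL:
 * major in bits 16 and up, minor in bits 8..15, patch in bits 0..7. */
#define VERSION_VAL(major, minor, patch) \
    (((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(patch))

/* Returns true for versions older than 5.1.2, the release this thread
 * reports as containing the fix. Whether older release branches later
 * received the same fix is not stated here. */
bool tsens_workaround_needed(uint32_t idf_version)
{
    return idf_version < VERSION_VAL(5, 1, 2);
}
```

On target, the equivalent compile-time check would compare `ESP_IDF_VERSION` against `ESP_IDF_VERSION_VAL(5, 1, 2)` from `esp_idf_version.h`.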

mcoracin commented 7 months ago

No more issues on v5.2.1 either, thanks!