Crash on http_request.post

jckkk commented 2 years ago

The problem

The device crashes randomly when performing http_request. It seems that the watchdog triggers the restart.

It only occurs on ESP32 (tried on lolin32_lite, nodemcu-32s and mhetesp32minikit), but not ESP8266 (d1_mini).

Sample yaml reproduces the issue quite often, but not every time. It seems to be more consistent when there are some sensors attached.

Which version of ESPHome has the issue?

2021.12.1

What type of installation are you using?

pip

Which version of Home Assistant has the issue?

No response

What platform are you using?

ESP32

Board

lolin32_lite, nodemcu-32s, mhetesp32minikit

Component causing the issue

http_request

Example YAML snippet

esphome:
  platform: ESP32
  board: lolin32_lite
  name: crash

logger:
  level: DEBUG

status_led:
  pin: 2

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

ota:
  password: ""

mqtt:
  broker: 192.168.8.8

http_request:
  useragent: esphome/air-sensor
  timeout: 15s

script:
  - id: publish_to_sensor_community
    then:
      - logger.log:
          level: INFO
          format: "Publishing to sensor.community"
      - http_request.post:
          url: https://api.sensor.community/v1/push-sensor-data/
          headers:
            Content-Type: application/json
            X-Sensor: 12345678
            X-Pin: 1
          verify_ssl: false
          on_response:
            then:
              - logger.log:
                  level: INFO
                  format: "sensor.community response status: %d"
                  args:
                    - status_code
          json: |-
            root["software_version"] = 1;
            JsonArray &values = root.createNestedArray("sensordatavalues");
            JsonObject &value = values.createNestedObject();
            value["value_type"] = "P1";
            value["value"] = "4.0";

time:
  - platform: sntp
    timezone: Europe/Warsaw
    on_time:
      - seconds: /15
        then:
          - script.execute: publish_to_sensor_community
          - script.execute: publish_to_sensor_community

Anything in the logs that might be useful for us?

[21:16:45][I][app:062]: setup() finished successfully!
[21:16:45][I][app:102]: ESPHome version 2021.12.1 compiled on Dec 20 2021, 21:16:10
[21:16:45][I][main:259]: Publishing to sensor.community
[21:16:47][I][main:060]: sensor.community response status: 403
[21:16:47][W][http_request:083]: HTTP Request failed; URL: https://api.sensor.community/v1/push-sensor-data/; Code: 403
[21:16:47][I][main:259]: Publishing to sensor.community
[21:16:50]E (9832) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
[21:16:50]E (9832) task_wdt:  - loopTask (CPU 1)
[21:16:50]E (9832) task_wdt: Tasks currently running:
[21:16:50]E (9832) task_wdt: CPU 0: IDLE0
[21:16:50]E (9832) task_wdt: CPU 1: IDLE1
[21:16:50]E (9832) task_wdt: Aborting.
[21:16:50]abort() was called at PC 0x40147550 on core 0
[21:16:50]
[21:16:50]ELF file SHA256: 0000000000000000
[21:16:50]
[21:16:50]Backtrace: 0x40088bd4:0x3ffbf860 0x40088e51:0x3ffbf880 0x40147550:0x3ffbf8a0 0x400872c9:0x3ffbf8c0 0x4017cd8b:0x3ffbc160 0x40148e5b:0x3ffbc180 0x4008b611:0x3ffbc1a0 0x40089e62:0x3ffbc1c0
WARNING Found stack trace! Trying to decode it
WARNING Decoded 0x40088bd4: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
WARNING Decoded 0x40088e51: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
WARNING Decoded 0x40147550: task_wdt_isr at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/task_wdt.c:252
WARNING Decoded 0x400872c9: _xt_lowint1 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/xtensa_vectors.S:1154
WARNING Decoded 0x4017cd8b: esp_pm_impl_waiti at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/pm_esp32.c:492
WARNING Decoded 0x40148e5b: esp_vApplicationIdleHook at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/freertos_hooks.c:108
WARNING Decoded 0x4008b611: prvIdleTask at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:3507
WARNING Decoded 0x40089e62: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
[21:16:50]
[21:16:50]Rebooting...
[21:16:50]ets Jun  8 2016 00:22:57
[21:16:50]
[21:16:50]rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[21:16:50]configsip: 0, SPIWP:0xee
[21:16:50]clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[21:16:50]mode:DIO, clock div:2
[21:16:50]load:0x3fff0018,len:4
[21:16:50]load:0x3fff001c,len:1044
[21:16:50]load:0x40078000,len:10124
[21:16:50]load:0x40080400,len:5828
[21:16:50]entry 0x400806a8
[21:16:50][E][esp32-hal-misc.c:111] disableCore1WDT(): Failed to remove Core 1 IDLE task from WDT
[21:16:50][I][logger:214]: Log initialized

Additional information

No response

jckkk commented 2 years ago

Did some more testing and it appears that esphome will crash every time if the response is not provided within around 4 seconds. The below will consistently crash if the response is provided after 5 seconds. Apparently, the watchdog timer is not being reset during waiting. Basically, this component is hardly usable on ESP32 due to this issue.

esphome:
  platform: ESP32
  board: nodemcu-32s
  name: temp-display

logger:
  level: DEBUG

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

http_request:
  useragent: esphome/crashing-sensor
  timeout: 15s

interval:
  interval: 15s
  then:
    - logger.log:
        level: INFO
        format: "Requesting"
    - http_request.get:
        url: http://192.168.8.113:8080/wait
        headers:
          Content-Type: application/json

krugerrc1 commented 2 years ago

I'm experiencing the same issue on ESPHome 2022.4.0. Any way to get around this?

jckkk commented 2 years ago

I'm experiencing the same issue on ESPHome 2022.4.0. Any way to get around this?

None that I'm aware of. This component is basically useless in this state.

This seems to be a general issue with watchdog on ESP32. There are quite a few other open issues where WDT triggers restarts when too much is going on.

TheEris commented 2 years ago

A similar issue here, I'm using HTTP Request as a call-home feature to trigger a webhook on a remote server, so I know ESP is still alive. It worked witout problem for about 2-3 Months. Today my server is down because of maitanence and ESP32 is crashing every 5 min (which is also the interval they send out POST to my webhook)

In short, ESP32 crashes when the server is unavailable

wrobell commented 2 years ago

Could it be because http_request component is making synchronous requests? If so, then probably 5 second timeout is the maximum value allowed for now? Could be the component re-implemented using asynchronous calls (https://www.arduino.cc/reference/en/libraries/asynchttprequest_generic/)?

behnam85 commented 2 years ago

+1

jpdias commented 2 years ago

+1

richardjm commented 2 years ago

Same issue here. Was thinking it was something I'd done all day.

ESPHome Version: 2022.10.0

[D][sensor:127]: 'Bottom Temperature': Sending state 18.31250 °C with 1 decimals of accuracy
[D][dallas.sensor:143]: 'Top Temperature': Got Temperature=18.9°C
[D][sensor:127]: 'Top Temperature': Sending state 18.93750 °C with 1 decimals of accuracy
[D][dallas.sensor:143]: 'Bottom Temperature': Got Temperature=18.3°C
[D][sensor:127]: 'Bottom Temperature': Sending state 18.31250 °C with 1 decimals of accuracy
[I][main:098]: Updating top: 18.9 bottom: 18.3
E (299134) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (299134) task_wdt:  - loopTask (CPU 1)
E (299134) task_wdt: Tasks currently running:
E (299134) task_wdt: CPU 0: IDLE0
E (299134) task_wdt: CPU 1: loopTask
E (299134) task_wdt: Aborting.
abort() was called at PC 0x40156a78 on core 0

ELF file SHA256: 0000000000000000

Backtrace: 0x40088ce8:0x3ffbf820 0x40088f65:0x3ffbf840 0x40156a78:0x3ffbf860 0x400873cd:0x3ffbf880 0x4018b897:0x3ffbc160 0x40158383:0x3ffbc180 0x4008b725:0x3ffbc1a0 0x40089f76:0x3ffbc1c0

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8
[I][logger:243]: Log initialized
[C][ota:465]: There have been 1 suspected unsuccessful boot attempts.
[D][esp32.preferences:113]: Saving 1 preferences to flash...
[D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[I][app:029]: Running through setup()...

Example snippet from yaml

# Make periodic update to google
time:
  - platform: sntp
    id: sntp_time
    on_time:
      # Every 5 minutes
      - seconds: 0
        minutes: /5
        then:
          - if:
              condition:
                lambda: 'return id(top_temp).has_state() && id(bottom_temp).has_state();'
              then:
              - output.turn_on: status_led
              - logger.log:
                  level: INFO
                  format: "Updating top: %.1f bottom: %.1f"
                  args: [ 'id(top_temp).state', 'id(bottom_temp).state' ]
              - http_request.get:
                  verify_ssl: false
                  url: !lambda |-
                    char buf[256];
                    sprintf(buf, "https://script.google.com/macros/s/DeplymentID/exec?top=%.1f&bottom=%.1f", id(top_temp).state, id(bottom_temp).state);
                    return ((std::string) buf).c_str();
                  on_response:
                    then:
                      - wait_until:
                          condition:
                            wifi.connected:
                          timeout: 45s
                      - logger.log:
                          format: "Response status: %d"
                          args:
                            - status_code
              - output.turn_off: status_led

dmsergeevich63 commented 2 years ago

I get such a non-permanent error when I try to control my relay from the esp 8266 board. At the same time, the request from the PC works. Has anyone found a solution? How do we bring this problem to the developers? My version of esp home v2022.10.0, it may be worth updating

stas-sl commented 1 year ago

Seems, like the default CONFIG_ESP_TASK_WDT_TIMEOUT_S is 5 seconds, which equals to the default http_request component timeout, so if request takes longer, it will be killed by watchdog. As a workaround you might try either:

Decrease http_request timeout:
```
http_request:
timeout: 1s
```
Increase watchdog timeout or disable panicking somewhere in your code (not sure what's the best place):
```
#include <esp_task_wdt.h>
esp_task_wdt_init(15 /* timeout */, false /* panic */);
```

A better solution would be of course using some async http library, maybe just native ESP-IDF's http client will be a good alternative.

Yourigh commented 1 year ago

Hi, I have the same problem. I get watchdog induced restarts when the timeout is larger than 1s.

I am trying to send data to google sheet. It works, but I do not get a response from HTTP GET due to timeout.

Is there any workaround for this? Thanks.

Perhaps a watchdog definition in on_boot automation?

zoic21 commented 1 year ago

Hello,

Any news on that ? I also got this issue on ESP32 (event without 1s timeout). I don't have this on ESP8266. This bug make http component inusable on ESP32.

Since this bug has been there for a year, I would like this component to be marked as incompatible with ESP32

fxstein commented 1 year ago

Same issue here. I have been debugging for days because pretty much any http_request fails and crashes the ESP until I finally found this issue.

adorobis commented 1 year ago

Same here. This issue basically makes it almost impossible to use. It works for a few minutes and crashes again. I'm using it to call an API of a device in the same network, response is always very fast (under 1 second).

jowgn commented 1 year ago

Seems, like the default CONFIG_ESP_TASK_WDT_TIMEOUT_S is 5 seconds, which equals to the default http_request component timeout, so if request takes longer, it will be killed by watchdog. As a workaround you might try either:

Decrease http_request timeout:
http_request:
  timeout: 1s
Increase watchdog timeout or disable panicking somewhere in your code (not sure what's the best place):
#include <esp_task_wdt.h>
esp_task_wdt_init(15 /* timeout */, false /* panic */);
A better solution would be of course using some async http library, maybe just native ESP-IDF's http client will be a good alternative.

Has anyone implemented the second workaround?

LewisSpring commented 1 year ago

Just adding my two pence, I created issue #4569 without noticing this issue (my apologies)

I have tried a lot of combinations of ESPHome versions and have had no luck. Going to have a look at @stas-sl 's suggestions. Cheers

LewisSpring commented 1 year ago

Just tried reducing the Timeout to 4 seconds, still crashing.

I tried the C++ fix, by creating a file in the configuration directory named fixtimeout.h and adding

esphome:
[...]
  includes:
    - fixtimeout.h

to my configuration Unfortunately I get this error:

Compiling .pioenvs/test/src/main.cpp.o
In file included from src/main.cpp:199:
src/fixtimeout.h:2:18: error: expected constructor, destructor, or type conversion before '(' token
 esp_task_wdt_init(15 /* timeout */, false /* panic */);
                  ^
*** [.pioenvs/test/src/main.cpp.o] Error 1
====================== [FAILED] Took 270.71 seconds ======================

I'd be interested in getting the C++ solution working if possible.

stas-sl commented 1 year ago

4 seconds might be still too big, there are actually 2 timeouts (connect timeout and receive, both of which set to 4 seconds, if I remember correctly, so in worst case full request might take 8 seconds, which is still bigger than 5 seconds. In many cases setting as low as 1 second could work if you don't need to analyze response status.

You can't call a function from a header file, you should include header and call esp_task_wdt_init from 'on_boot' handler, see comment.

LewisSpring commented 1 year ago

4 seconds might be still too big, there are actually 2 timeouts (connect timeout and receive, both of which set to 4 seconds, if I remember correctly, so in worst case full request might take 8 seconds, which is still bigger than 5 seconds. In many cases setting as low as 1 second could work if you don't need to analyze response status.

You can't call a function from a header file, you should include header and call esp_task_wdt_init from 'on_boot' handler, see comment.

I tried 1 second timeout and still getting crashes. Was getting connection refused instead of a 200 response if that makes any difference.

I also tried https://github.com/epiclabs-uc/esphome-nowatchdog-component, but the ESP still crashes unfortunately.

dwainscheeren commented 1 year ago

I'm also getting crashed but only when using ethernet withLAN8720 on an esp32 but when I use wifi its working without problems.

Diondk commented 1 year ago

i also have crashes when doing an http request to dsmr. Hopefully this can be fixed soon and i can power my p1 meter through poe and get rid of the wifi connection.

LewisSpring commented 1 year ago

i also have crashes when doing an http request to dsmr. Hopefully this can be fixed soon and i can power my p1 meter through poe and get rid of the wifi connection.

FYI, My devices are using POE Ethernet and it doesn't resolve the issue in my situation.

Diondk commented 1 year ago

i also have crashes when doing an http request to dsmr. Hopefully this can be fixed soon and i can power my p1 meter through poe and get rid of the wifi connection.

FYI, My devices are using POE Ethernet and it doesn't resolve the issue in my situation.

it works for me when i use the wifi connection, as soon as i enable POE ethernet i have crashes and i get failures back for the post requests.

LewisSpring commented 1 year ago

i also have crashes when doing an http request to dsmr. Hopefully this can be fixed soon and i can power my p1 meter through poe and get rid of the wifi connection.

FYI, My devices are using POE Ethernet and it doesn't resolve the issue in my situation.

it works for me when i use the wifi connection, as soon as i enable POE ethernet i have crashes and i get failures back for the post requests.

That's interesting!

submac commented 1 year ago

I experience exactly the same problem when making HTTP request from one node to another via eth on esp32. I have smart home modules with exposed API and this pretty much prevents any direct communication between them. changing timeouts doesn't help.

Can we mark it as a bug? would anyone be able to pick it up?

MallocArray commented 10 months ago

I believe I am also having this issue, with the watchdog resetting my device and doing http_request.post every 2.5 minutes, usually within 600 ms, but sometimes seeing up to 3500ms

[16:49:03][D][http_request:099]: HTTP Request completed; URL: http://hw.airgradient.com/sensors/airgradient:xxxxxx/measures; Code: 200; Duration: 1357 ms
[16:51:32][W][http_request:086]: HTTP Request failed; URL: http://hw.airgradient.com/sensors/airgradient:xxxxxx/measures; Error: connection lost; Duration: 271 ms
[16:54:03][D][http_request:099]: HTTP Request completed; URL: http://hw.airgradient.com/sensors/airgradient:xxxxxx/measures; Code: 200; Duration: 581 ms
[16:56:33][D][http_request:099]: HTTP Request completed; URL: http://hw.airgradient.com/sensors/airgradient:xxxxxx/measures; Code: 200; Duration: 557 ms
[16:59:03][D][http_request:099]: HTTP Request completed; URL: http://hw.airgradient.com/sensors/airgradient:xxxxxx/measures; Code: 200; Duration: 560 ms
[17:01:33][D][http_request:099]: HTTP Request completed; URL: http://hw.airgradient.com/sensors/airgradient:xxxxxx/measures; Code: 200; Duration: 568 ms
[17:04:03][D][http_request:099]: HTTP Request completed; URL: http://hw.airgradient.com/sensors/airgradient:xxxxxx/measures; Code: 200; Duration: 558 ms
[17:04:48]
[17:04:48] ets Jan  8 2013,rst cause:4, boot mode:(3,7)
[17:04:48]
[17:04:48]wdt reset
[17:04:48]load 0x4010f000, len 3460, room 16 
[17:04:48]tail 4
[17:04:48]chksum 0xcc
[17:04:48]load 0x3fff20b8, len 40, room 4 
[17:04:48]tail 4
[17:04:48]chksum 0xc9
[17:04:48]csum 0xc9
[17:04:48]v000984e0
[17:04:48]~ld
[17:04:48][I][logger:351]: Log initialized
[17:04:48][C][ota:473]: There have been 0 suspected unsuccessful boot attempts.
[17:04:48][I][app:029]: Running through setup()...

sarabveer commented 9 months ago

I am seeing a similar issue with my Airgradient using an ESP32. If there is no internet connection, the request fails and ESP crashes. Since the airgradient esphome config also sends a HTTP POST on boot, sometimes it even fails to boot and gets stuck in a bootloop.

jckkk commented 9 months ago

Here's a workaround based on @stas-sl suggestion. Thank you @stas-sl!

esp32_wdt.h:

#include <esp_task_wdt.h>

void set_esp32_wdt_timeout() {
    esp_task_wdt_init(60 /* timeout [s] */, true /* panic */);
}

esphome:
  includes:
    - esp32_wdt.h
  on_boot:
    then:
      - lambda: |-
          set_esp32_wdt_timeout();

ex-nerd commented 6 months ago

esp32_wdt.h:

So far this has worked great for my esp32 devices. Any chance you know of a similar workaround for esp8266?

esphome / issues