Sonoff TH10 Reset Info: Fatal exception:4 flag:1 (Hardware Watchdog) epc1:0x40102ff5

ptr727 commented 4 years ago

Operating environment/Installation (Hass.io/Docker/pip/etc.):

ESPHome Beta tag running in Docker.

ESP (ESP32/ESP8266, Board/Sonoff):

Sonoff TH10

ESPHome version (latest production, beta, dev branch)

Beta docker tag, 1.15.2

Affected component:

Unknown.

Description of problem: Random device restart due to hardware watchdog timer. Device is remotely installed, not currently possible to attach serial to debug. First restart observed after ~8000 minutes of uptime, second restart after ~2000 minutes of uptime.

Problem-relevant YAML-configuration entries: Unknown See: https://github.com/ptr727/ESPHome-Config/blob/master/sonoff_th10_hot_recirc_pump.yaml

Logs (if applicable):

[00:16:07][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_recirc_temp': Got Temperature=32.4°C
[00:16:07][D][sensor:092]: 'sonoff_th10_hot_recirc_pump_recirc_temp': Sending state 32.41667 °C with 1 decimals of accuracy
[00:16:07][D][climate:262]: 'sonoff_th10_hot_recirc_pump_bangbang' - Sending state:
[00:16:07][D][climate:265]:   Mode: AUTO
[00:16:07][D][climate:267]:   Action: IDLE
[00:16:07][D][climate:276]:   Current Temperature: 32.42°C
[00:16:07][D][climate:280]:   Target Temperature: Low: 31.11°C High: 40.56°C
[00:16:07][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_hot_temp': Got Temperature=37.1°C
[00:16:07][D][sensor:092]: 'sonoff_th10_hot_recirc_pump_hot_temp': Sending state 37.06250 °C with 1 decimals of accuracy
[00:16:07][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_cold_temp': Got Temperature=22.7°C
[00:16:07][D][sensor:092]: 'sonoff_th10_hot_recirc_pump_cold_temp': Sending state 22.68750 °C with 1 decimals of accuracy
[00:16:07][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_ambient_temp': Got Temperature=25.3°C
[00:16:07][D][sensor:092]: 'sonoff_th10_hot_recirc_pump_ambient_temp': Sending state 25.31250 °C with 1 decimals of accuracy
[00:16:12][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_recirc_temp': Got Temperature=32.3°C
[00:16:12][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_hot_temp': Got Temperature=37.0°C
[00:16:12][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_cold_temp': Got Temperature=22.7°C
[00:16:12][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_ambient_temp': Got Temperature=25.3°C
[00:16:35][C][logger:185]: Logger:
[00:16:35][C][logger:186]:   Level: DEBUG
[00:16:35][C][logger:187]:   Log Baud Rate: 115200
[00:16:35][C][logger:188]:   Hardware UART: UART0
[00:16:35][C][restart:021]: Restart Switch 'sonoff_th10_hot_recirc_pump_restart'
[00:16:35][C][restart:021]:   Icon: 'mdi:restart'
[00:16:35][C][status:034]: Status Binary Sensor 'sonoff_th10_hot_recirc_pump_status'
[00:16:35][C][status:034]:   Device Class: 'connectivity'
[00:16:35][C][bang_bang.climate:161]: Bang Bang Climate 'sonoff_th10_hot_recirc_pump_bangbang'
[00:16:35][C][bang_bang.climate:162]:   Supports HEAT: YES
[00:16:35][C][bang_bang.climate:163]:   Supports COOL: NO
[00:16:35][C][bang_bang.climate:164]:   Supports AWAY mode: NO
[00:16:35][C][bang_bang.climate:165]:   Default Target Temperature Low: 30.0°C
[00:16:35][C][bang_bang.climate:166]:   Default Target Temperature High: 40.0°C
[00:16:35][C][version.text_sensor:021]: Version Text Sensor 'sonoff_th10_hot_recirc_pump_version'
[00:16:35][C][version.text_sensor:021]:   Icon: 'mdi:new-box'
[00:16:35][C][homeassistant.time:010]: Home Assistant Time:
[00:16:35][C][homeassistant.time:011]:   Timezone: 'PST8PDT7,M3.2.0/2,M11.1.0/2'
[00:16:35][C][dallas.sensor:072]: DallasComponent:
[00:16:35][C][dallas.sensor:073]:   Pin: GPIO14 (Mode: INPUT)
[00:16:35][C][dallas.sensor:074]:   Update Interval: 5.0s
[00:16:35][D][dallas.sensor:079]:   Found sensors:
[00:16:35][D][dallas.sensor:082]:     0x05011455296AAA28
[00:16:35][D][dallas.sensor:082]:     0xEC011454E099AA28
[00:16:35][D][dallas.sensor:082]:     0x5E01145529F7AA28
[00:16:35][D][dallas.sensor:082]:     0x54011455087FAA28
[00:16:35][C][dallas.sensor:087]:   Device 'sonoff_th10_hot_recirc_pump_recirc_temp'
[00:16:35][C][dallas.sensor:087]:     Unit of Measurement: '°C'
[00:16:35][C][dallas.sensor:087]:     Accuracy Decimals: 1
[00:16:35][C][dallas.sensor:087]:     Icon: 'mdi:thermometer'
[00:16:35][C][dallas.sensor:095]:     Address: 0x05011455296AAA28
[00:16:35][C][dallas.sensor:096]:     Resolution: 12
[00:16:35][C][dallas.sensor:087]:   Device 'sonoff_th10_hot_recirc_pump_hot_temp'
[00:16:35][C][dallas.sensor:087]:     Unit of Measurement: '°C'
[00:16:35][C][dallas.sensor:087]:     Accuracy Decimals: 1
[00:16:35][C][dallas.sensor:087]:     Icon: 'mdi:thermometer'
[00:16:35][C][dallas.sensor:095]:     Address: 0x54011455087FAA28
[00:16:35][C][dallas.sensor:096]:     Resolution: 12
[00:16:35][C][dallas.sensor:087]:   Device 'sonoff_th10_hot_recirc_pump_cold_temp'
[00:16:36][C][dallas.sensor:087]:     Unit of Measurement: '°C'
[00:16:36][C][dallas.sensor:087]:     Accuracy Decimals: 1
[00:16:36][C][dallas.sensor:087]:     Icon: 'mdi:thermometer'
[00:16:36][C][dallas.sensor:095]:     Address: 0x5E01145529F7AA28
[00:16:36][C][dallas.sensor:096]:     Resolution: 12
[00:16:36][C][dallas.sensor:087]:   Device 'sonoff_th10_hot_recirc_pump_ambient_temp'
[00:16:36][C][dallas.sensor:087]:     Unit of Measurement: '°C'
[00:16:36][C][dallas.sensor:087]:     Accuracy Decimals: 1
[00:16:36][C][dallas.sensor:087]:     Icon: 'mdi:thermometer'
[00:16:36][C][dallas.sensor:095]:     Address: 0xEC011454E099AA28
[00:16:36][C][dallas.sensor:096]:     Resolution: 12
[00:16:36][C][captive_portal:169]: Captive Portal:
[00:16:36][C][ota:029]: Over-The-Air Updates:
[00:16:36][C][ota:030]:   Address: sonoff_th10_hot_recirc_pump.home.insanegenius.net:8266
[00:16:36][C][ota:032]:   Using Password.
[00:16:36][C][api:095]: API Server:
[00:16:36][C][api:096]:   Address: sonoff_th10_hot_recirc_pump.home.insanegenius.net:6053
[00:16:36][C][wifi_signal.sensor:009]: WiFi Signal 'sonoff_th10_hot_recirc_pump_wifi_signal'
[00:16:36][C][wifi_signal.sensor:009]:   Unit of Measurement: 'dB'
[00:16:36][C][wifi_signal.sensor:009]:   Accuracy Decimals: 0
[00:16:36][C][wifi_signal.sensor:009]:   Icon: 'mdi:wifi'
[00:16:36][C][wifi_info:009]: WifiInfo IPAddress 'sonoff_th10_hot_recirc_pump_wifi_ip'
[00:16:36][C][wifi_info:010]: WifiInfo SSID 'sonoff_th10_hot_recirc_pump_wifi_ssid'
[00:16:36][C][wifi_info:011]: WifiInfo BSSID 'sonoff_th10_hot_recirc_pump_wifi_bssid'
[00:16:36][D][debug:023]: ESPHome version 1.15.2
[00:16:36][D][debug:025]: Free Heap Size: 21984 bytes
[00:16:36][D][debug:053]: Flash Chip: Size=1024kB Speed=40MHz Mode=DOUT
[00:16:36][D][debug:190]: Chip ID: 0x00006BAB
[00:16:36][D][debug:191]: SDK Version: 2.2.2-dev(38a443e)
[00:16:36][D][debug:192]: Core Version: 2_7_4
[00:16:36][D][debug:193]: Boot Version=31 Mode=1
[00:16:36][D][debug:194]: CPU Frequency: 80
[00:16:36][D][debug:195]: Flash Chip ID=0x0014605E
[00:16:36][D][debug:196]: Reset Reason: Hardware Watchdog
[00:16:36][D][debug:197]: Reset Info: Fatal exception:4 flag:1 (Hardware Watchdog) epc1:0x40102ff5 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000 depc:0x00000000
[00:16:39][D][api.connection:604]: Client 'Home Assistant 0.116.2 (192.168.1.26)' connected successfully!
[00:16:39][D][time:040]: Synchronized time: Sun Oct 11 17:16:39 2020
[00:16:40][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_recirc_temp': Got Temperature=35.4°C
[00:16:40][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_hot_temp': Got Temperature=38.7°C
[00:16:40][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_cold_temp': Got Temperature=22.7°C
[00:16:40][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_ambient_temp': Got Temperature=25.2°C
[00:16:45][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_recirc_temp': Got Temperature=36.8°C
[00:16:45][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_hot_temp': Got Temperature=40.6°C
[00:16:45][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_cold_temp': Got Temperature=22.7°C
[00:16:45][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_ambient_temp': Got Temperature=25.2°C
[00:16:50][D][dallas.sensor:153]: 'sonoff_th10_hot_recirc_pump_recirc_temp': Got Temperature=38.1°C

Additional information and things you've tried: Added debug component. Attached and logged debug output over network.

[00:16:36][D][debug:023]: ESPHome version 1.15.2
[00:16:36][D][debug:025]: Free Heap Size: 21984 bytes
[00:16:36][D][debug:053]: Flash Chip: Size=1024kB Speed=40MHz Mode=DOUT
[00:16:36][D][debug:190]: Chip ID: 0x00006BAB
[00:16:36][D][debug:191]: SDK Version: 2.2.2-dev(38a443e)
[00:16:36][D][debug:192]: Core Version: 2_7_4
[00:16:36][D][debug:193]: Boot Version=31 Mode=1
[00:16:36][D][debug:194]: CPU Frequency: 80
[00:16:36][D][debug:195]: Flash Chip ID=0x0014605E
[00:16:36][D][debug:196]: Reset Reason: Hardware Watchdog
[00:16:36][D][debug:197]: Reset Info: Fatal exception:4 flag:1 (Hardware Watchdog) epc1:0x40102ff5 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000 depc:0x00000000

ptr727 commented 4 years ago

I connected an ESP-01 to the the TH10 UART port, installed esp-link on the ESP-01, so I can get remote UART access to the TH10 running Esphome.

I used socat to connect the esp-link to a tty port, and used the tty port as logging input for esphome.

socat -d pty,link=/dev/ttyESP,raw tcp:esp-link:23 &
esphome sonoff_th10_hot_recirc_pump.yaml logs --serial-port /dev/ttyESP

[20:42:51][D][sensor:092]: 'sonoff_th10_hot_recirc_pump_uptime': Sending state 6.83720 minutes with 0 decimals of accuracy
[20:43:04]
[20:43:04] ets Jan  8 2013,rst cause:4, boot mode:(3,5)
[20:43:04]
[20:43:04]wdt reset
[20:43:04]load 0x4010f000, len 3584, room 16 
[20:43:04]tail 0
[20:43:04]chksum 0xb0
[20:43:04]csum 0xb0
[20:43:04]v2843a5ac
[20:43:04]~ld
[20:43:04][I][logger:166]: Log initialized
...
[20:43:14][D][debug:023]: ESPHome version 1.15.2
[20:43:14][D][debug:025]: Free Heap Size: 24568 bytes
[20:43:14][D][debug:053]: Flash Chip: Size=1024kB Speed=40MHz Mode=DOUT
[20:43:14][D][debug:190]: Chip ID: 0x00006BAB
[20:43:14][D][debug:191]: SDK Version: 2.2.2-dev(38a443e)
[20:43:14][D][debug:192]: Core Version: 2_7_4
[20:43:14][D][debug:193]: Boot Version=31 Mode=1
[20:43:14][D][debug:194]: CPU Frequency: 80
[20:43:14][D][debug:195]: Flash Chip ID=0x0014605E
[20:43:14][D][debug:196]: Reset Reason: Hardware Watchdog
[20:43:14][D][debug:197]: Reset Info: Fatal exception:4 flag:1 (Hardware Watchdog) epc1:0x40102ff5 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000 depc:0x00000000

On a hardware watchdog reset no more useful information is printed. Per docs I found, a hardware watchdog reset will not print stack info.

Any further ideas on how to debug hardware watchdog resets in esphome?

ptr727 commented 4 years ago

The Esphome Sonoff docs says to use arduino_version: 2.4.2, referencing an old closed issue with unsubstantiated claims. I tried settings the arduino version to 2.4.2, it made no difference, same hardware watchdog error.

I've now switched to the docker "dev" tag, that seems to be using ioplatform v5, seeing if that makes a difference.

Adding logs captured using socat and telnet, there are 4 resets captured. logs.zip

ptr727 commented 4 years ago

I tried 1.16.0-dev, made it to about 2000 minutes before first reset :(

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ptr727 commented 3 years ago

Still happens with 1.20 :( I have three TH10's as thermostats, units with single dallas sensor no problems, runs forever. The one with 4 dallas sensors keep watchdog restarting anywhere between a couple hours and days. My suspicion is still something with the dallas sensor code taking too long?

I initially thought the code below would wait and could possibly exceed the watchdog time if executed serially, but in looking again it issues a conversion request on the bus then schedules the results to be read later.

https://github.com/esphome/esphome/blob/dev/esphome/components/dallas/dallas_component.cpp#L131

  bool result;
  {
    InterruptLock lock;
    if (!this->one_wire_->reset()) {
      result = false;
    } else {
      result = true;
      this->one_wire_->skip();
      this->one_wire_->write8(DALLAS_COMMAND_START_CONVERSION);
    }
  }
...
for (auto *sensor : this->sensors_) {
    this->set_timeout(sensor->get_address_name(), sensor->millis_to_wait_for_conversion(), [this, sensor] {
      bool res;
      {
        InterruptLock lock;
        res = sensor->read_scratch_pad();
      }
...
      float tempc = sensor->get_temp_c();
      ESP_LOGD(TAG, "'%s': Got Temperature=%.1f°C", sensor->get_name().c_str(), tempc);
      sensor->publish_state(tempc);
    });
  }

So the total time to read all sensors is ~ millis_to_wait_for_conversion (750ms) as they could all convert in parallel and results read serially.

Still seems suspicious that this happens with > 1 sensor only, maybe the issue lies elsewhere?

esphome / issues

Sonoff TH10 Reset Info: Fatal exception:4 flag:1 (Hardware Watchdog) epc1:0x40102ff5 #1548