Closed ptr727 closed 3 years ago
I connected an ESP-01 to the the TH10 UART port, installed esp-link on the ESP-01, so I can get remote UART access to the TH10 running Esphome.
I used socat to connect the esp-link to a tty port, and used the tty port as logging input for esphome.
socat -d pty,link=/dev/ttyESP,raw tcp:esp-link:23 &
esphome sonoff_th10_hot_recirc_pump.yaml logs --serial-port /dev/ttyESP
[20:42:51][D][sensor:092]: 'sonoff_th10_hot_recirc_pump_uptime': Sending state 6.83720 minutes with 0 decimals of accuracy
[20:43:04]
[20:43:04] ets Jan 8 2013,rst cause:4, boot mode:(3,5)
[20:43:04]
[20:43:04]wdt reset
[20:43:04]load 0x4010f000, len 3584, room 16
[20:43:04]tail 0
[20:43:04]chksum 0xb0
[20:43:04]csum 0xb0
[20:43:04]v2843a5ac
[20:43:04]~ld
[20:43:04][I][logger:166]: Log initialized
...
[20:43:14][D][debug:023]: ESPHome version 1.15.2
[20:43:14][D][debug:025]: Free Heap Size: 24568 bytes
[20:43:14][D][debug:053]: Flash Chip: Size=1024kB Speed=40MHz Mode=DOUT
[20:43:14][D][debug:190]: Chip ID: 0x00006BAB
[20:43:14][D][debug:191]: SDK Version: 2.2.2-dev(38a443e)
[20:43:14][D][debug:192]: Core Version: 2_7_4
[20:43:14][D][debug:193]: Boot Version=31 Mode=1
[20:43:14][D][debug:194]: CPU Frequency: 80
[20:43:14][D][debug:195]: Flash Chip ID=0x0014605E
[20:43:14][D][debug:196]: Reset Reason: Hardware Watchdog
[20:43:14][D][debug:197]: Reset Info: Fatal exception:4 flag:1 (Hardware Watchdog) epc1:0x40102ff5 epc2:0x00000000 epc3:0x00000000 excvaddr:0x00000000 depc:0x00000000
On a hardware watchdog reset no more useful information is printed. Per docs I found, a hardware watchdog reset will not print stack info.
Any further ideas on how to debug hardware watchdog resets in esphome?
The Esphome Sonoff docs says to use arduino_version: 2.4.2
, referencing an old closed issue with unsubstantiated claims.
I tried settings the arduino version to 2.4.2, it made no difference, same hardware watchdog error.
I've now switched to the docker "dev" tag, that seems to be using ioplatform v5, seeing if that makes a difference.
Adding logs captured using socat and telnet, there are 4 resets captured. logs.zip
I tried 1.16.0-dev, made it to about 2000 minutes before first reset :(
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still happens with 1.20 :( I have three TH10's as thermostats, units with single dallas sensor no problems, runs forever. The one with 4 dallas sensors keep watchdog restarting anywhere between a couple hours and days. My suspicion is still something with the dallas sensor code taking too long?
See related WDT reset and one-wire / dallas issue reported here: https://community.blynk.cc/t/sonoff-s20-esp8266-reboots-periodicly/26678/4 https://community.blynk.cc/t/prevent-crashes-and-resets-heartbeat-and-watchdog-timers/25708
I initially thought the code below would wait and could possibly exceed the watchdog time if executed serially, but in looking again it issues a conversion request on the bus then schedules the results to be read later.
https://github.com/esphome/esphome/blob/dev/esphome/components/dallas/dallas_component.cpp#L131
bool result;
{
InterruptLock lock;
if (!this->one_wire_->reset()) {
result = false;
} else {
result = true;
this->one_wire_->skip();
this->one_wire_->write8(DALLAS_COMMAND_START_CONVERSION);
}
}
...
for (auto *sensor : this->sensors_) {
this->set_timeout(sensor->get_address_name(), sensor->millis_to_wait_for_conversion(), [this, sensor] {
bool res;
{
InterruptLock lock;
res = sensor->read_scratch_pad();
}
...
float tempc = sensor->get_temp_c();
ESP_LOGD(TAG, "'%s': Got Temperature=%.1f°C", sensor->get_name().c_str(), tempc);
sensor->publish_state(tempc);
});
}
So the total time to read all sensors is ~ millis_to_wait_for_conversion (750ms) as they could all convert in parallel and results read serially.
Still seems suspicious that this happens with > 1 sensor only, maybe the issue lies elsewhere?
Operating environment/Installation (Hass.io/Docker/pip/etc.):
ESPHome Beta tag running in Docker.
ESP (ESP32/ESP8266, Board/Sonoff):
Sonoff TH10
ESPHome version (latest production, beta, dev branch)
Beta docker tag, 1.15.2
Affected component:
Unknown.
Description of problem: Random device restart due to hardware watchdog timer. Device is remotely installed, not currently possible to attach serial to debug. First restart observed after ~8000 minutes of uptime, second restart after ~2000 minutes of uptime.
Problem-relevant YAML-configuration entries: Unknown See: https://github.com/ptr727/ESPHome-Config/blob/master/sonoff_th10_hot_recirc_pump.yaml
Logs (if applicable):
Additional information and things you've tried: Added debug component. Attached and logged debug output over network.