arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0
22.04k stars 4.78k forks source link

Unexpected Restarts on an ESP32 #15656

Closed mantielero closed 2 years ago

mantielero commented 2 years ago

PROBLEM DESCRIPTION

My ESP32 gets restarted many times a day.

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

- [X] If using rules, provide the output of this command: `Backlog Rule1; Rule2; Rule3`:
```lua
  Rules output here:
22:48:15.275 SRC: WebConsole from 192.168.0.207
22:48:15.279 CMD: Grp 0, Cmd 'BACKLOG', Idx 1, Len 19, Pld -99, Data 'Rule1; Rule2; Rule3'
22:48:15.308 SRC: Backlog
22:48:15.311 CMD: Grp 0, Cmd 'RULE', Idx 1, Len 0, Pld -99, Data ''
22:48:15.317 RUL: Rule1 ON ZbReceived#Interior#Temperature DO Var1 %value% ENDON      ON ZbReceived#Interior#Humidity DO Var2 %value% ENDON      ON ZbReceived#Interior#Pressure DO Var3 %value% ENDON        ON ZbReceived#Exterior#Temperature DO Var4 %value% ENDON      ON ZbReceived#Exterior#Humidity DO Var5 %value% ENDON       ON ZbReceived#Exterior#Pressure DO Var6 %value% ENDON   ON ZbReceived#Puerta#Contact DO Var7 %value% ENDON   ON ZbReceived#Puerta DO WebQuery
22:48:15.361 RUL: Rule1 +https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=Puerta&open=%var7% GET ENDON     ON ZBReceived#Interior DO WebQuery https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=Tint&t=%var1%&h=%var2%&p=%var3% GET ENDON    ON ZBReceived#Exterior DO WebQuery
22:48:15.397 RUL: Rule1 +https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=Text&t=%var4%&h=%var5%&p=%var6% GET ENDON     ON Time#Initialized DO WebQuery https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=logger&val=NTP GET ENDON
22:48:15.428 RSL: RESULT = {"Rule1":{"State":"ON","Once":"OFF","StopOnError":"OFF","Length":1164,"Free":120,"Rules":"ON ZbReceived#Interior#Temperature DO Var1 %value% ENDON      ON ZbReceived#Interior#Humidity DO Var2 %value% ENDON      ON ZbReceived#Interior#Pressure DO Var3 %value% ENDON        ON ZbReceived#Exterior#Temperature DO Var4 %value% ENDON      ON ZbReceived#Exterior#Humidity DO Var5 %value% ENDON       ON ZbReceived#Exterior#Pressure DO Var6 %value% ENDON   ON ZbReceived#Puerta#Contact DO Var7 %value% ENDON   ON ZbReceived#Puerta DO WebQuery https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=Puerta&open=%var7% GET ENDON     ON ZBReceived#Interior DO WebQuery https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=Tint&t=%var1%&h=%var2%&p=%var3% GET ENDON    ON ZBReceived#Exterior DO WebQuery https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=Text&t=%var4%&h=%var5%&p=%var6% GET ENDON     ON Time#Initialized DO WebQuery https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=logger&val=NTP GET ENDON"}}
22:48:15.607 SRC: Backlog
22:48:15.612 CMD: Grp 0, Cmd 'RULE', Idx 2, Len 0, Pld -99, Data ''
22:48:15.619 RSL: RESULT = {"Rule2":{"State":"ON","Once":"OFF","StopOnError":"OFF","Length":169,"Free":342,"Rules":"ON System#Boot DO WebQuery https://script.google.com/macros/s/XXXXXXXXXXXXX/exec?s=logger&val=BOOT GET ENDON"}}
22:48:15.834 SRC: Backlog
22:48:15.838 CMD: Grp 0, Cmd 'RULE', Idx 3, Len 0, Pld -99, Data ''
22:48:15.845 RSL: RESULT = {"Rule3":{"State":"OFF","Once":"OFF","StopOnError":"OFF","Length":0,"Free":511,"Rules":""}}
- [X] Set `weblog` to 4 and then, when you experience your issue, provide the output of the Console log:
```lua
  Console output here:
22:36:46.629 WIF: Probando conexi<0xc3><0xb3>n...␍␊
22:37:06.643 WIF: Probando conexi<0xc3><0xb3>n...␍␊
22:37:18.653 APP: Log serial deshabilitado␍␊
Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception). ␍␊
Debug exception reason: Stack canary watchpoint triggered (loopTask) ␍␊
Core  1 register dump:␍␊
PC      : 0x40092087  PS      : 0x00060036  A0      : 0x400907b3  A1      : 0x3ffb08c0  ␍␊
A2      : 0x3ffc9dfc  A3      : 0x3ffc9fdc  A4      : 0x00000001  A5      : 0x00000001  ␍␊
A6      : 0x3ffc9fe0  A7      : 0x00000001  A8      : 0x3ffc9e04  A9      : 0x80000001  ␍␊
A10     : 0x3ffb887c  A11     : 0x3ffb887c  A12     : 0x00000014  A13     : 0x00000004  ␍␊
A14     : 0x00000001  A15     : 0x80000002  SAR     : 0x00000006  EXCCAUSE: 0x00000001  ␍␊
EXCVADDR: 0x00000000  LBEG    : 0x400e0e9d  LEND    : 0x400e0eaf  LCOUNT  : 0x00000016  ␍␊
␍␊
␍␊
Backtrace:0x40092084:0x3ffb08c00x400907b0:0x3ffb08f0 0x40090760:0xa5a5a5a5  |<-CORRUPTED␍␊
␍␊
␍␊
␍␊
␍␊
ELF file SHA256: 0000000000000000␍␊
␍␊
Rebooting...␍␊
ets Jul 29 2019 12:21:46␍␊
␍␊
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)␍␊
configsip: 0, SPIWP:0xee␍␊
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00␍␊
mode:DOUT, clock div:2␍␊
load:0x3fff0030,len:184␍␊
load:0x40078000,len:12596␍␊
load:0x40080400,len:2916␍␊
entry 0x400805c4␍␊
␍␊
00:00:00.002 HDW: ESP32-D0WD-V3 ␍␊
00:00:00.026 UFS: FlashFS mounted with 304 kB free␍␊
00:00:00.067 CFG: Loaded from File, Conteo 526␍␊
00:00:00.083 QPC: Count 1␍␊
00:00:00.085 CFG: CR 570/699, Busy 0␍␊
00:00:00.091 ROT: Mode 1␍␊
00:00:00.105 BRY: GC from 3699 to 3067 bytes, objects freed 5/22 (in 0 ms) - slots from 34/61 to 23/61␍␊
00:00:00.215 CFG: No '*.autoconf' file found␍␊
00:00:00.220 BRY: GC from 4462 to 3731 bytes, objects freed 5/38 (in 1 ms) - slots from 55/76 to 35/76␍␊
00:00:00.221 BRY: Berry initialized, RAM used=3731 bytes␍␊
00:00:00.247 BRY: No 'preinit.be'␍␊
00:00:00.249 ZIG: GPIOs Rx:16 Tx:17␍␊
00:00:00.253 SRC: Restart␍␊
00:00:00.259 Proyecto tasmota - Tasmota Versi<0xc3><0xb3>n 11.1.0.3(tasmota)-2_0_3(2022-05-18T21:26:02)␍␊
00:00:00.260 ETH: No ETH MDC and/or ETH MDIO GPIO defined␍␊
00:00:00.273 ZbInput forgiven first byte FF (only for statistics)␍␊
00:00:00.362 BRY: No 'autoexec.be'␍␊
00:00:00.471 ZbZNPReceived: received frame of wrong size FE, len 1, expected 5␍␊
00:00:00.500 WIF: Probando conexi<0xc3><0xb3>n...␍␊
00:00:00.502 WIF: Intentando conectar...␍␊
00:00:00.002 WIF: Connectando a AP1 inalambrika Channel 7 BSSId 08:62:66:3B:21:B8 en modo 11n como tasmota-D6D9DC-6620...␍␊
00:00:03.648 WIF: Probando conexi<0xc3><0xb3>n...␍␊
00:00:03.648 WIF: Conectado␍␊
00:00:03.860 HTP: Servidor web activo en tasmota-D6D9DC-6620 con direcci<0xc3><0xb3>n IP 192.168.0.22␍␊
00:00:03.864 RSL: INFO1 = {"Info1":{"Module":"ESP32-DevKit","Version":"11.1.0.3(tasmota)","FallbackTopic":"cmnd/DVES_D6D9DC_fb/","GroupTopic":"cmnd/tasmotas/"}}␍␊
00:00:03.905 RSL: INFO2 = {"Info2":{"WebServerMode":"Admin","Hostname":"tasmota-D6D9DC-6620","IPAddress":"192.168.0.22"}}␍␊
00:00:03.936 RSL: INFO3 = {"Info3":{"RestartReason":{"Exception":1,"Reason":"Syscall","EPC":"40092087","EXCVADDR":"00000000","CallChain":["40092084","400907b0","40090760"]},"BootCount":483}}␍␊
00:00:03.982 TFS: File 'autoexec.bat' not found␍␊
00:00:03.987 NTP: Sync time...␍␊
00:00:04.057 RTC: UTC 2022-05-18T20:47:38, DST 2022-03-27T02:00:00, STD 2022-10-30T03:00:00␍␊
22:47:38.000 RTC: Synced by NTP␍␊

TO REPRODUCE

Steps to reproduce the behavior: Not easy. I seen happening when I run WebQuery https://xxxx, but in above's log, that is not happening.

EXPECTED BEHAVIOUR

A clear and concise description of what you expected to happen. Not crashing.

SCREENSHOTS

If applicable, add screenshots to help explain your problem. N/A

ADDITIONAL CONTEXT

Add any other context about the problem here. N/A

(Please, remember to close the issue when the problem has been addressed)

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it hasn't any activity in last few weeks. It will be closed if no further activity occurs. Thank you for your contributions.

arendst commented 2 years ago

Do not run a solo1 image on a dual core processor.

Change your otaurl.

Jason2866 commented 2 years ago

Good catch! Zigbee and webquery is probably to much to handle for one core

mantielero commented 2 years ago

I don't know how to do that. Could you point me where should I configure to use both cores?

Jason2866 commented 2 years ago

Change the OtaUrl to by doing this command OtaUrl https://github.com/tasmota/install/raw/main/firmware/unofficial/tasmota32-zigbee.bin and than Upgrade 1

mantielero commented 2 years ago

Just out of curiosity, why do I have to update the OTA URL? I compiled a custom image myself with customized settings.

I thought that I would have to customize some particular setting in the way I compiled the image.

Jason2866 commented 2 years ago

So you configured something wrong in your custom setup. The OTA Url i linked is a verified working build for ESP32 with ZNP Zigbee stack. We can not provide support or solve issues for a custom compiled firmware.

mantielero commented 2 years ago

Flashing that image:

$ esptool.py --port /dev/serial/by-id/usb-1a86_USB_Single_Serial_539E009512-if00 write_flash -fm dout 0x0 tasmota32-zigbee.bin

produces all the time:

...
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
ets Jul 29 2019 12:21:46␍␊
␍␊
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
ets Jul 29 2019 12:21:46␍␊
␍␊
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
ets Jul 29 2019 12:21:46␍␊
␍␊
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
invalid header: 0x07000810␍␊
ets Jul 29 2019 12:21:46␍␊
␍␊
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)␍␊
invalid header: 0x0700081
...
sfromis commented 2 years ago

No, the URL suggested was meant for OTA. For wired flashing the way you try to do it, you need to use the .factory image (if you do not prefer a more complex flashing process with 4 files at different addresses). tasmota32-zigbee.factory.bin

mantielero commented 2 years ago

Thanks for that. Now testing if those restarts have disappeared.

Where can I find the configuration files used to create those images? I'd like to compare with what I did.

mantielero commented 2 years ago

It continues restarting:

11:24:39.083 RSL: RESULT = {"WebQuery":"Done"}
11:24:39.100 HTP: Web server active on tasmota-D6D9DC-6620 with IP address 192.168.0.22
11:24:39.105 RSL: INFO1 = {"Info1":{"Module":"ESP32-DevKit","Version":"12.0.2.2(zigbee)","FallbackTopic":"cmnd/DVES_D6D9DC_fb/","GroupTopic":"cmnd/tasmotas/"}}
11:24:39.147 RSL: INFO2 = {"Info2":{"WebServerMode":"Admin","Hostname":"tasmota-D6D9DC-6620","IPAddress":"192.168.0.22"}}
11:24:39.178 RSL: INFO3 = {"Info3":{"RestartReason":{"Exception":2,"Reason":"InstructionFetchError","EPC":"00000001","EXCVADDR":"ffffffe0","CallChain":["fffffffe"]},"BootCount":520}}
11:24:39.230 TFS: File 'autoexec.bat' not found
Jason2866 commented 2 years ago

Build config is here https://github.com/Jason2866/Tasmota-build/blob/special_builder/tasmota/user_config_override.h

I have no restarts with this build. Can you swap the Esp32 board and check your power supply.

mantielero commented 2 years ago

I don't have a spare ESP32 (I will buy a new one). Checking now with a different power supply.

On the other hand, what does this mean?

"Reason":"InstructionFetchError",
"EPC":"00000001",
"EXCVADDR":"ffffffe0",
"CallChain":["fffffffe"]
sfromis commented 2 years ago

Without a deeper analysis of surrounding details, those exceptions only means that it crashed due to some data handling not working. My speculative guess with those 4 lines would be something like a stack violation. Potential causes of that are very diverse, including issues with bad power among many other things.

Messages on serial when it crashes will often have more details.

mantielero commented 2 years ago

With another power supply the issue continues. As before, it is random, but tends to occur after using webquery. I use it connected to three sensors: two for temperature and one for door open/close sensor.

For instance, I got the "door opened" notification at 12:33pm in my mobile (so the webquery got executed) an inmediately after the board got restarted (I got the notification for that in my mobile too).

After that restart I think that the zigbee CC-2530 stops working:

12:33:18.065 RSL: RESULT = {"WebQuery":"Done"}
12:33:19.119 QPC: Reset
12:33:20.805 ZIG: {"ZbZNPReceived":"44810000060060080101008B005FAC5700000718120A0000100060081D"}
12:33:21.132 APP: Boot Count 523
12:33:21.143 RSL: STATE = {"Time":"2022-06-23T12:33:21","Uptime":"0T00:00:15","UptimeSec":15,"Heap":164,"SleepMode":"Dynamic","Sleep":50,"LoadAvg":51,"MqttCount":0,"Berry":{"HeapUsed":2,"Objects":28},"Wifi":{"AP":1,"SSId":"inalambrika","BSSId":"08:62:66:3B:21:B8","Channel":9,"Mode":"11n","RSSI":100,"Signal":-18,"LinkCount":1,"Downtime":"0T00:00:04"}}
12:33:21.201 RSL: SENSOR = {"Time":"2022-06-23T12:33:21","ESP32":{"Temperature":43.3},"TempUnit":"C"}
12:33:21.722 CFG: Saved, Count 575, Bytes 4096
12:33:23.059 ZIG: rebooting ZNP device
12:33:23.225 ZIG: ZbZNPSent 0xFF x 255
12:33:23.227 ZIG: ZbZNPSent 410000
12:33:28.210 ZIG: timeout, goto label 99
12:33:28.214 RSL: RESULT = {"ZbState":{"Status":99,"Message":"Abort"}}
12:33:28.242 ZIG: Abort
12:33:28.243 ZIG: Stopping (99)

If I go to Zigbee Map I get Zigbee not started

barbudor commented 2 years ago

Is it stable without WebQuery?

mantielero commented 2 years ago

Yes it is. It has been running for about 4 hours without failure (normally it fails within an hour). Zigbee is working fine too.

Something that puzzles me is that the issue appears sometimes after calling WebQuery. But the WebQuery command itself seems to work fine (given that I get the notification in my Google Sheet).

@arendst, just out of curiosity. What in the information provided makes you think I was running a solo1image? I checked how I compiled my custom image before and I run:

$ pio run -e tasmota32

not pio run -e tasmota32solo1.

arendst commented 2 years ago

As noted in the comment your OtaUrl is set to "OtaUrl":"http://ota.tasmota.com/tasmota32/release/tasmota32solo1.bin"

mantielero commented 2 years ago

That is funny too. I don't know from where that is coming from. I was checking my my_user_config.h and it shows:

// -- Ota -----------------------------------------
#ifdef ESP8266
#define OTA_URL                "http://ota.tasmota.com/tasmota/release/tasmota.bin.gz"  // [OtaUrl]
#endif  // ESP8266
#ifdef ESP32
#ifdef CONFIG_IDF_TARGET_ESP32C3
#define OTA_URL                "http://ota.tasmota.com/tasmota32/release/tasmota32c3.bin"  // [OtaUrl]
#elif defined(CONFIG_IDF_TARGET_ESP32S2)
#define OTA_URL                "no official version (yet)"  // [OtaUrl]
#elif defined(CONFIG_IDF_TARGET_ESP32S3)
#define OTA_URL                "no official version (yet)"  // [OtaUrl]
#elif defined(CORE32SOLO1)
#define OTA_URL                "http://ota.tasmota.com/tasmota32/release/tasmota32solo1.bin"  // [OtaUrl]
#else
#define OTA_URL                "http://ota.tasmota.com/tasmota32/release/tasmota32.bin"  // [OtaUrl]
#endif  //  CONFIG_IDF_TARGET_ESP32C3
#endif  // ESP32

So somewhere should be defining CORE32SOLO1. Using grep, I see boards/esp32_solo1_4M.json uses: -DCORE32SOLO1. I didn't set that in: user_config_override.h.

When I run pio run -e tasmota32, it looks like the board is correctly identified as esp32_4M:

rocessing tasmota32 (framework: arduino; platform: https://github.com/tasmota/platform-espressif32/releases/download/v.2.0.3/platform-espressif32-v.2.0.3.zip; board: esp32_4M)
--------------------------------------------------------------------------------
Verbose mode can be enabled via `-v, --verbose` option
CONFIGURATION: https://docs.platformio.org/page/boards/espressif32/esp32_4M.html
PLATFORM: Espressif 32 (2.0.3) > Espressif Generic ESP32 4M Flash, Tasmota 2880k Code/OTA, 320k FS
HARDWARE: ESP32 80MHz, 320KB RAM, 4MB Flash
DEBUG: Current (cmsis-dap) External (cmsis-dap, esp-prog, iot-bus-jtag, jlink, minimodule, olimex-arm-usb-ocd, olimex-arm-usb-ocd-h, olimex-arm-usb-tiny-h, olimex-jtag-tiny, tumpa)
PACKAGES: 
 - framework-arduinoespressif32 2.0.3 
 - tool-esptoolpy 1.30300.0 (3.3.0) 
 - tool-mklittlefs 1.203.210628 (2.3) 
 - tool-mkspiffs 2.230.0 (2.30) 
 - toolchain-esp32ulp 1.22851.191205 (2.28.51) 
 - toolchain-xtensa-esp32 8.4.0+2021r2-patch3
Converting tasmota.ino
LDF: Library Dependency Finder -> https://bit.ly/configure-pio-ldf
LDF Modes: Finder ~ chain, Compatibility ~ strict
Found 112 compatible libraries
Scanning dependencies...
...
Jason2866 commented 2 years ago

Probably, you "flashed over" the build. The settings from previous version are left intact. It is general a good idea to erase whole flash when flashing a different build and avoiding unwanted side effects from previous versions. This can be the problem you encounter!

mantielero commented 2 years ago

How can I do that?

mantielero commented 2 years ago

On the other hand, I am testing with Berry rather than WebQuery. It looks like it is behaving much more stable with what I am trying. No crash so far.

Jason2866 commented 2 years ago

See https://tasmota.github.io/docs/Esptool/

Jason2866 commented 2 years ago

Avoiding WebQuery is always a good idea!

sfromis commented 2 years ago

And using Ping is better than WebQuery too. Part of the story here is that http is not well-supported by Tasmota, it is only at a rudimentary level.

mantielero commented 2 years ago

I have replaced Rules+WebQuery with Berry language. So far it looks much better (I am really amazed). I will come back to you and let you know if the issues have gone finally.

mantielero commented 2 years ago

Just to let you know that it has been working for more than 24h without any issues. I think the issue is in WebQuery. I don't know if you want to keep the issue open or to close it. In any case, I would put a note in WebQuery's documentation warning about this behaviour.

klsc18 commented 1 year ago

@mantielero Hello mantielero, I have the same problems as you had at the beginning. Can you post your berry scripts for https so others like me can see working examples? Would be great. thanks

mantielero commented 1 year ago

I am sorry I missed this post.

What I do in order to read from Google Sheet, looks like:

cl = webclient()
cl.begin("https://sheets.googleapis.com/v4/spreadsheets/xxxxx")
cl.add_header("Authorization","Bearer xxxxxx")
#cl.add_header("Content-Type","application/json")
cl.GET()
print(cl.get_string())

and in order to POST:

cl = webclient()
cl.begin("https://sheets.googleapis.com/v4/spreadsheets/xxxxxx")
cl.add_header("Authorization","Bearer xxxxxxxx)
cl.add_header("Content-Type","application/json")
cl.POST("{\"values\": [[12345,6789]]}")