arendst / Tasmota

Alternative firmware for ESP8266 and ESP32 based devices with easy configuration using webUI, OTA updates, automation using timers or rules, expandability and entirely local control over MQTT, HTTP, Serial or KNX. Full documentation at
https://tasmota.github.io/docs
GNU General Public License v3.0
21.71k stars 4.72k forks source link

SD Card access causes crash in development branch, but not in release branch (CAUSED BY WATCHDOG - see below) #21473

Closed btsimonh closed 1 month ago

btsimonh commented 1 month ago

PROBLEM DESCRIPTION

I was experiencing random crashes on esp32cam writing and reading from SD card.

I found a way to reliably reproduce the crash on esp32cam and esp32s3cam boards.

Further testing on the S3 board only indicates that I did not see a crash with the release version (tasmota32s3.bin | http://ota.tasmota.com/tasmota32/release/tasmota32s3.bin | 1956k | 20240514 16:12)

but does crash reliably with the development version: (tasmota32s3.bin | http://ota.tasmota.com/tasmota32/tasmota32s3.bin | 1973k | 20240522 17:19

Crashes happen on writing files, listing folders and reading files, in unmodified firmware.

I have not done equivalent testing of internal flash.

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

- [x] Provide the output of this command: `Status 0`:
```lua
  STATUS 0 output here:
12:49:36.554 CMD: status 0
12:49:36.555 SRC: WebConsole from 192.168.1.174
12:49:36.556 CMD: Grp 0, Cmd 'STATUS', Idx 1, Len 1, Pld 0, Data '0'
12:49:36.559 RSL: STATUS = {"Status":{"Module":0,"DeviceName":"Tasmota","FriendlyName":["Tasmota"],"Topic":"tasmota_9CE64C","ButtonTopic":"0","Power":"0","PowerLock":"0","PowerOnState":3,"LedState":1,"LedMask":"FFFF","SaveData":1,"SaveState":1,"SwitchTopic":"0","SwitchMode":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0,"StatusRetain":0}}
12:49:36.565 RSL: STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"http://ota.tasmota.com/tasmota32/tasmota32s3.bin","RestartReason":"Software reset CPU","Uptime":"0T00:12:55","StartupUTC":"2024-05-23T11:36:41","Sleep":1,"CfgHolder":4617,"BootCount":158,"BCResetTime":"2023-08-14T12:16:33","SaveCount":400}}
12:49:36.572 RSL: STATUS2 = {"StatusFWR":{"Version":"14.0.0.1(522f6c5-tasmota32)","BuildDateTime":"2024-05-22T14:31:07","Core":"3_0_0","SDK":"5.1.3.240430","CpuFrequency":240,"Hardware":"ESP32-S3 v0.1","CR":"379/699"}}
12:49:36.577 RSL: STATUS3 = {"StatusLOG":{"SerialLog":4,"WebLog":4,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["MYSSID",""],"TelePeriod":90,"Resolution":"558180C0","SetOption":["00008009","2805C80001000600003C5A0A192800000000","00000080","00006000","00004002","00000000"]}}
12:49:36.587 RSL: STATUS4 = {"StatusMEM":{"ProgramSize":1972,"Free":907,"Heap":198,"StackLowMark":3,"PsrMax":0,"PsrFree":0,"ProgramFlashSize":8192,"FlashSize":8192,"FlashChipId":"1740C8","FlashFrequency":80,"FlashMode":"QIO","Features":["0809","9F9AD7DF","0015A001","B7F7BFCF","05DA9BC4","E0360DC7","480840D2","20200000","D4BC482D","810A80B1","00000014"],"Drivers":"1,2,!3,4,!5,7,!8,9,10,11,12,!14,!16,!17,!20,!21,!24,26,!27,29,!34,!35,38,50,52,!59,!60,62,!63,!66,!67,!68,!73,82,!86,!87,!88,!121","Sensors":"1,2,3,5,6,7,8,9,10,11,12,13,14,15,17,18,19,20,21,22,26,31,34,37,39,40,42,43,45,51,52,55,56,58,59,64,66,67,74,85,92,95,98,103,105,109,127","I2CDriver":"7,8,9,10,11,12,13,14,15,17,18,20,24,29,31,36,41,42,44,46,48,58,62,65,69,76,77,82"}}
12:49:36.594 RSL: STATUS5 = {"StatusNET":{"Hostname":"tasmota-9CE64C-1612","IPAddress":"192.168.1.198","Gateway":"192.168.1.254","Subnetmask":"255.255.255.0","DNSServer1":"192.168.1.254","DNSServer2":"0.0.0.0","Mac":"34:85:18:9C:E6:4C","IP6Global":"2a00:23c6:661f:c801:3685:18ff:fe9c:e64c","IP6Local":"fe80::3685:18ff:fe9c:e64c%st1","Ethernet":{"Hostname":"","IPAddress":"0.0.0.0","Gateway":"0.0.0.0","Subnetmask":"0.0.0.0","DNSServer1":"192.168.1.254","DNSServer2":"0.0.0.0","Mac":"00:00:00:00:00:00","IP6Global":"","IP6Local":""},"Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":19.0}}
12:49:36.600 RSL: STATUS6 = {"StatusMQT":{"MqttHost":"","MqttPort":1883,"MqttClientMask":"DVES_%06X","MqttClient":"DVES_9CE64C","MqttUser":"DVES_USER","MqttCount":0,"MAX_PACKET_SIZE":1200,"KEEPALIVE":30,"SOCKET_TIMEOUT":4}}
12:49:36.604 RSL: STATUS7 = {"StatusTIM":{"UTC":"2024-05-23T11:49:36Z","Local":"2024-05-23T12:49:36","StartDST":"2024-03-31T02:00:00","EndDST":"2024-10-27T03:00:00","Timezone":"+01:00","Sunrise":"04:58","Sunset":"20:35"}}
12:49:36.612 RSL: STATUS10 = {"StatusSNS":{"Time":"2024-05-23T12:49:36"}}
12:49:36.615 BRY: GC from 7858 to 4292 bytes, objects freed 9/48 (in 0 ms) - slots from 53/91 to 52/91
12:49:36.620 RSL: STATUS11 = {"StatusSTS":{"Time":"2024-05-23T12:49:36","Uptime":"0T00:12:55","UptimeSec":775,"Heap":197,"SleepMode":"Dynamic","Sleep":1,"LoadAvg":1002,"MqttCount":0,"Berry":{"HeapUsed":4,"Objects":48},"POWER":"OFF","Dimmer":15,"Color":"262626","HSBColor":"0,0,15","Channel":[15,15,15],"Scheme":0,"Width":1,"Fade":"OFF","Speed":1,"LedTable":"ON","Wifi":{"AP":1,"SSId":"MYSSID","BSSId":"AC:B6:87:C7:0E:B8","Channel":6,"Mode":"HT20","RSSI":100,"Signal":-48,"LinkCount":1,"Downtime":"0T00:00:04"}}}
12:49:36.628 RSL: STATUS12 = {"StatusSTK":{"Exception":1070552992,"Reason":"Unknown","EPC":"4037ca8a","EXCVADDR":"00000000","CallChain":["4037ca87","420afb15","421363bb"]}}
12:49:45.159 WIF: Checking connection...

Backtracx�x����x����������xx��������x����xx�xx����xx���x������xx�xx�x������x����xx�xx�����x���x�����x�����������xx���x����xx�xx����x�x�x�xx�xx�xx�xx�xxx�xx���x��x�xxx���x�xx��x���������x��x���xx�x���x�x�x��x���x��x��x�xx�x��xx�xx�x�x�x��x�x�xx�x�xx�x��� 00:00:00.252 CMD: Fall back to serial port, no SOF packet detected on USB port 00:00:00.253 HDW: ESP32-S3 v0.1 00:00:00.306 UFS: FlashFS mounted with 4376 kB free 00:00:00.324 CFG: Loaded from File, Count 399



### TO REPRODUCE

unzip the attached zip onto an SD card, insert into the device.

Monitor the serial of the device.

Using the menus, go to Tools/Manage File System

click on `timelapse`

click on `tl`  (contains ~1600 files)

files will begin to be listed in the browser window.  You may get as far as 45-75 files listed before the device crashes.

Note that this is a way to reproduce, but the issue affects general read/write of the SD card, and is not necessarily specifically related to listing files.  e.g. it was crashing writing files every 10s to the folder.  Backtraces indicated it was in different functions, e.g. sometimes fclose() on a file write.  But this reproduction does not involve writing...

### EXPECTED BEHAVIOUR
The device should not crash when accessing the SD card.

### SCREENSHOTS
n/a

### ADDITIONAL CONTEXT

use the contents of this zip file to easily reproduce:

[timelapse.zip](https://github.com/arendst/Tasmota/files/15416363/timelapse.zip)

**(Please, remember to close the issue when the problem has been addressed)**
arendst commented 1 month ago

FYI any exception with Reason unknown is likely caused by watchdogs enabled recently in dev branch.

To stop chasing windmills I suggest you disable the watchdog feature and continue with what you want to do.

EDIT: Add this #undef USE_ESP32_WDT to your user_config_override.h

btsimonh commented 1 month ago

you are of course correct. adding this to user_config_override.h, and rebuilding for my esp32cam, it's listed 430+ files so far (better than 45). We could probably do with having a configuration to turn it off. There are always going to be odd circumstances which can't be tested for and so be accounted for with calls to tickle the watchdog. At least then we could continue to use the 'current' version whilst a strange issue is mitigated in some other way without a rebuild.

btsimonh commented 1 month ago

it dies in a different way now.... no crash log on the serial, just hangs with no network access, and serial logs turned themselves off... but then I'm giving it hell, so I'm not surprised.

btsimonh commented 1 month ago

re-opening - the watchdog triggers in the reproduction above because of listing a very large folder, and so hanging the main loop - which we could solve with some yield(). But with further testing any SD access can cause a main loop hang. I will do some further investigation, and come up with another reproduction; maybe some Berry that will cause it. It seems fine on my S3 board, but not on a 'normal' esp32cam board. I will also re-test dev vs release.

Edit - a self-build release build stalls main loop - first time at first file download. 2nd time after 471 file downloads.

btsimonh commented 1 month ago

I think a bug in the UFSServe function. Possibly streaming direct from file to socket is not a good idea... After more testing, I'll PR, and include in the PR a fix for listing large folders with Watchdog on (the original reproduction code).

Jason2866 commented 1 month ago

The reason why the WDT is enabled is to find code parts which blocks Tasmota (way too long). Goal is to change long blocking code in "normal" working code.