LIFsCode / ELOC-3.0

Firmware for ELOC 3.0 Hardware
MIT License
3 stars 3 forks source link

Merge ai branch into master #42

Closed OOHehir closed 10 months ago

OOHehir commented 10 months ago

I've rebased the ai branch against master & it looks in reasonable shape. It compiles successfully. I've made a few minor modifications/ renames were conflicts existed, hopefully they're reasonable.

However there is currently (at least) one issue.

It seems that when the wav file & log file file are simultaneously open or simultaneously been written to a crash occurs, along the lines of 'a corrupted task':

***ERROR*** A stack overflow in task Wav file writer has been detected.

Backtrace: 0x400848a5:0x3ffdf420 0x40095765:0x3ffdf440 0x400997aa:0x3ffdf460 0x40097cb9:0x3ffdf4d0 0x400959cc:0x3ffdf4f0 0x4009597e:0xa5a5a5a5 |<-CORRUPTED

ELF file SHA256: 2a53431754edece7

I wonder if it's multiple threads accessing a shared resource(the SD card) that needs mutexes to resolve? Just a guess at this point without further investigation...

Disabling the logging in main.cpp currently resolves the crashes.

I'll hopefully get some time over the next few days to look at this but wanted to start things moving on the merge..

LIFsCode commented 10 months ago

I've rebased the ai branch against master & it looks in reasonable shape. It compiles successfully. I've made a few minor modifications/ renames were conflicts existed, hopefully they're reasonable.

However there is currently (at least) one issue.

It seems that when the wav file & log file file are simultaneously open or simultaneously been written to a crash occurs, along the lines of 'a corrupted task':

***ERROR*** A stack overflow in task Wav file writer has been detected.

Backtrace: 0x400848a5:0x3ffdf420 0x40095765:0x3ffdf440 0x400997aa:0x3ffdf460 0x40097cb9:0x3ffdf4d0 0x400959cc:0x3ffdf4f0 0x4009597e:0xa5a5a5a5 |<-CORRUPTED

ELF file SHA256: 2a53431754edece7

I wonder if it's multiple threads accessing a shared resource(the SD card) that needs mutexes to resolve? Just a guess at this point without further investigation...

Disabling the logging in main.cpp currently resolves the crashes.

I'll hopefully get some time over the next few days to look at this but wanted to start things moving on the merge..

I hope this should not be due to shared log file on the SD card. This should be fully protected by the mutex scope guard.

However it could be a read stack problem. Redirecting the ESP log requires use of vprintf which could use quiet a lot of stack memory, depending on . It could also be a problem with the number or size of the logfiles you could change the config via BT:

setLogPersitent#cfg={"logToSdCard":"true","filename":"/sdcard/log/eloc.log","maxFiles":6,"maxFileSize":1024}

or change it change it persistent via the setConfig command.

I'm using Bluetooth Serial Terminal for Windows or this for Android both work pretty good with the ELOC.

OOHehir commented 10 months ago

I've rebased the ai branch against master & it looks in reasonable shape. It compiles successfully. I've made a few minor modifications/ renames were conflicts existed, hopefully they're reasonable. However there is currently (at least) one issue. It seems that when the wav file & log file file are simultaneously open or simultaneously been written to a crash occurs, along the lines of 'a corrupted task':

***ERROR*** A stack overflow in task Wav file writer has been detected.

Backtrace: 0x400848a5:0x3ffdf420 0x40095765:0x3ffdf440 0x400997aa:0x3ffdf460 0x40097cb9:0x3ffdf4d0 0x400959cc:0x3ffdf4f0 0x4009597e:0xa5a5a5a5 |<-CORRUPTED

ELF file SHA256: 2a53431754edece7

I wonder if it's multiple threads accessing a shared resource(the SD card) that needs mutexes to resolve? Just a guess at this point without further investigation... Disabling the logging in main.cpp currently resolves the crashes. I'll hopefully get some time over the next few days to look at this but wanted to start things moving on the merge..

I hope this should not be due to shared log file on the SD card. This should be fully protected by the mutex scope guard.

However it could be a read stack problem. Redirecting the ESP log requires use of vprintf which could use quiet a lot of stack memory, depending on . It could also be a problem with the number or size of the logfiles you could change the config via BT:

setLogPersitent#cfg={"logToSdCard":"true","filename":"/sdcard/log/eloc.log","maxFiles":6,"maxFileSize":1024}

or change it change it persistent via the setConfig command.

I'm using Bluetooth Serial Terminal for Windows or this for Android both work pretty good with the ELOC.

Does the mutex protect against multiple process writing to the one log file or does it serve some other purpose? I haven't had a chance to look through the code in detail yet.

I'll try reducing the logfiles number & size as a first step, the EI code uses a lot of stack memory so perhaps could be the issue. If that doesn't work I'll try creating a semaphore in main.cpp to pass to any tasks accessing the SD card.

I'm on a linux machine but I'm sure there an application to monitor the BT serial output.

LIFsCode commented 10 months ago

The mutex in the RotateFile protects against simultaneous writes to the same file by multiple processes. Still the log of all processes are stored in the same file. I've tested this with a separate project with excessive log messages from multiple tasks which run asynchronously and had no problems with it. But there could still be some hidden bugs in it, so best if you have a look at it

I don't think multiple process accessing the SD card with different files must be protected, as the Esp IDF FS is threadsafe protected.

Maybe you can increase the stack size of the wave writer task.

OOHehir commented 10 months ago

@LIFsCode

Maybe you can increase the stack size of the wave writer task.

That's solved the issue! The ai branch seems to be running now without any crashes for the last 10 min or so..

I had assumed that the thread stack was only for local variables created in the subsequent thread/ task but from this result it would suggest that this understanding is incorrect.

@EDsteve It seems that this PR can now be moved from a draft to an actual PR unless anyone has further changes?

EDsteve commented 10 months ago

@OOHehir Nothing to add for now. Still on the road. But will be home tomorrow and then do testing testing testing :)

OOHehir commented 10 months ago

@LIFsCode Not sure how you'd like to proceed? Do you want to review this or would you like me to resolve the conflicts & complete the merge?

LIFsCode commented 10 months ago

@OOHehir Feel free to resolve the conflicts and complete the merge. Thanks a lot :)

OOHehir commented 10 months ago

@LIFsCode No problem. Branches merged. I had to make some subsequent changes to fix crashes on the 'non-EI' or 'esp32dev' build. Amongst others this involved modifying 'sdkconfig.esp32dev' to enable SPIRAM / PSRAM & changing the flash size to 16MB to match the chip model on the WROVER.

@EDsteve Just a quick test but both the EI & non-EI versions of the code build & run (at least in the short term) successfully.