Error: Core 1 panic'ed (Unhandled debug exception)

dsyleixa commented 2 years ago

Arduino IDE 1.8.9 ESP32 board 1.0.6 (edit; meanwhile updated to 2.0.1) default settings

generally my program runs fine, but sometimes, unexpectedly, I get this error - but why and what does that mean....?

Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask)
Core 1 register dump:
PC      : 0x400d3254  PS      : 0x00060636  A0      : 0x800d3824  A1      : 0x3ffb0030
A2      : 0x0000000e  A3      : 0x3ffcc4cc  A4      : 0xfffffe77  A5      : 0x00000080
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffc1c24  A9      : 0x00000008
A10     : 0xfffffce1  A11     : 0x00000002  A12     : 0x00000002  A13     : 0x00000353
A14     : 0x0000000a  A15     : 0x3ffb08d0  SAR     : 0x00000011  EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffff8

ELF file SHA256: 0000000000000000

Backtrace: 0x400d3254:0x3ffb0030 0x400d3821:0x3ffb0170 0x400d3821:0x3ffb02b0 0x400d3821:0x3ffb03f0 0x400d3821:0x3ffb0530 0x400d3821:0x3ffb0670 0x400d3821:0x3ffb07b0 0x400d3821:0x3ffb08f0 0x400d3821:0x3ffb0a30 0x400d3821:0x3ffb0b70 0x400d3821:0x3ffb0cb0 0x400d3821:0x3ffb0df0 0x400d3821:0x3ffb0f30 0x400d3821:0x3ffb1070 0x400d3821:0x3ffb11b0 0x400d3821:0x3ffb12f0 0x400d3821:0x3ffb1430 0x400d3821:0x3ffb1570 0x400d3821:0x3ffb16b0 0x400d3821:0x3ffb17f0 0x400d3821:0x3ffb1930 0x400d3821:0x3ffb1a70 0x400d3821:0x3ffb1bb0 0x400d493c:0x3ffb1cf0 0x400d683b:0x3ffb1e30 0x400d71a4:0x3ffb1ef0 0x400e352d:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

Rebooting...

the program is this one: https://github.com/dsyleixa/Arduino/tree/master/ESP32_GBox/ESP32_Box023
the error happens sometimes when running the "chess" subroutine.

just to mention: the chess program (i.e., the move generator) is the same as for my Arduino Due and for my Raspberry Pi, and there it works absolutely fine without any problem ever. So IMO the issue here on my ESP32 is probably not related to the chess algorithm itself as far as I can see.

PS, to clarify: sometimes the move generator already crashes at the 2nd or 5th recursive ply after ~50000 move computations or even fewer, sometimes it runs fine through the 7th recursive ply with more than 1 or 2 million move computations and returns a valid and smart move, e.g.:

2 ply, searched:         9 
 3 ply, searched:       164 
 4 ply, searched:      1018 .
 5 ply, searched:     10045 ..........
 6 ply, searched:    138116 .........................................................................................................
 7 ply, searched:   1099725 
n b8-c6

0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:        29 
 3 ply, searched:       466 
 4 ply, searched:      3262 ..
 5 ply, searched:     32422 ................
 6 ply, searched:    251225 
B c1-g5  

0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:        63 
 3 ply, searched:       741 
 4 ply, searched:      2559 ....
 5 ply, searched:     31058 ..Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask)

dsyleixa commented 2 years ago

update: different game by different moves, but similar error after a while - still no clue what's happening here:

Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask)
Core 1 register dump:
PC      : 0x400d3254  PS      : 0x00060836  A0      : 0x800d3824  A1      : 0x3ffb0030
A2      : 0x00000007  A3      : 0x3ffcc4cc  A4      : 0xfffffcec  A5      : 0x00000080
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffc1c8f  A9      : 0x00000008
A10     : 0x00000008  A11     : 0x00000001  A12     : 0x00000005  A13     : 0x00000020
A14     : 0x00000020  A15     : 0x3ffb08d0  SAR     : 0x00000011  EXCCAUSE: 0x00000001
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffff8

ELF file SHA256: 0000000000000000

Backtrace: 0x400d3254:0x3ffb0030 0x400d3821:0x3ffb0170 0x400d3821:0x3ffb02b0 0x400d3821:0x3ffb03f0 0x400d3821:0x3ffb0530 0x400d3821:0x3ffb0670 0x400d3821:0x3ffb07b0 0x400d3821:0x3ffb08f0 0x400d3821:0x3ffb0a30 0x400d3821:0x3ffb0b70 0x400d3821:0x3ffb0cb0 0x400d3821:0x3ffb0df0 0x400d3821:0x3ffb0f30 0x400d3821:0x3ffb1070 0x400d3821:0x3ffb11b0 0x400d3821:0x3ffb12f0 0x400d3821:0x3ffb1430 0x400d3821:0x3ffb1570 0x400d3821:0x3ffb16b0 0x400d3821:0x3ffb17f0 0x400d3821:0x3ffb1930 0x400d3821:0x3ffb1a70 0x400d3821:0x3ffb1bb0 0x400d4968:0x3ffb1cf0 0x400d6867:0x3ffb1e30 0x400d71d0:0x3ffb1ef0 0x400e3559:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

Rebooting...
ets Jun 8 2016 00:22:57

SuGlider commented 2 years ago

@dsyleixa

> Debug exception reason: Stack canary watchpoint triggered (loopTask)

This message points to stack-related issues in loop() or in any other function that is called from it.

Some nice explanation about possible reasons for this error can be found at this link: https://arduino.stackexchange.com/questions/80729/esp32-stack-canary-watchpoint-triggered

Good Luck!

SuGlider commented 2 years ago

This tool will help you in debugging this issue: https://github.com/me-no-dev/EspExceptionDecoder

You can decode the backtrace message and find out where the exception was thrown.

dsyleixa commented 2 years ago

I did this, but I don't understand what to c+p into the field and how to proceed if I paste the Backtrace

0x400d3254:0x3ffb0030 0x400d3821:0x3ffb0170 0x400d3821:0x3ffb02b0 0x400d3821:0x3ffb03f0 0x400d3821:0x3ffb0530 0x400d3821:0x3ffb0670 0x400d3821:0x3ffb07b0 0x400d3821:0x3ffb08f0 0x400d3821:0x3ffb0a30 0x400d3821:0x3ffb0b70 0x400d3821:0x3ffb0cb0 0x400d3821:0x3ffb0df0 0x400d3821:0x3ffb0f30 0x400d3821:0x3ffb1070 0x400d3821:0x3ffb11b0 0x400d3821:0x3ffb12f0 0x400d3821:0x3ffb1430 0x400d3821:0x3ffb1570 0x400d3821:0x3ffb16b0 0x400d3821:0x3ffb17f0 0x400d3821:0x3ffb1930 0x400d3821:0x3ffb1a70 0x400d3821:0x3ffb1bb0 0x400d4968:0x3ffb1cf0 0x400d6867:0x3ffb1e30 0x400d71d0:0x3ffb1ef0 0x400e3559:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

then nothing happens. no button to press and no action follows

dsyleixa commented 2 years ago

PS, just to mention, I also had added lots of delay(1) in between my for() and while() loops not to block the scheduler. I also added Serial.print('.') debug outputs to indicate incidental blocking, but when the Panic Error happens then it's in a <500ms interval since the last Serial.print('.')
so it clearly does not block the scheduler watchdog.

atanisoft commented 2 years ago

@dsyleixa This has nothing to do with the task scheduler, this is entirely on using more than 8kb of stack space in setup()/loop(). If you have large arrays (such as: uint8_t something[5000]) it may crash upon entry to the method, this is the most common reason for stack canary crashes like you post.

dsyleixa commented 2 years ago

Hmmmh... let me elaborate on it...:

Apart from Chess(), the entire program runs fine on the ESP32 through setup() and loop() and also may call other subprograms such as Paint() or Pong() without any issues. The error never occurs with different sub-programs, just with the subprogram "chess". All subprograms are using the same global variables which are used by setup() and loop().

So all runs fine >>>>>>>> until Chess() is run.

But the first couple of chess moves are always fine, so Chess() does not violate the RAM size limit when starting. Manual moves are also applied correctly (calling the auto move generator just for move legality checks), and so are the first couple of auto moves.

Furthermore for Chess(), the error happens not always and not reproducibly, sometimes after the 7th auto move generation ply, sometimes even after the 3rd, or perhaps after the 8th or 9th, always with identical board settings and identical move series. That actually makes me doubt that it's a RAM size issue.

OTOH, I meanwhile tested the Chess subprogram also on my Mega2560 too (because of smaller RAM than on Due or Raspi) , and also over there it always runs fine, so IMO it probably cannot happen because of the Chess() variables on the stack (CMIIW).

when compiling, the IDE says:

The sketch uses 854266 bytes (65%) of the program memory. The maximum is 1310720 bytes.
Global variables use 48692 bytes (14%) of dynamic memory, leaving 278988 bytes for local variables. The maximum is 327680 bytes.

I am completely at a loss, tbh...

atanisoft commented 2 years ago

IDE output of memory usage is not applicable to task stack sizing. It is only applicable to global variable allocations (one that you don't create via new/malloc/etc) and for overall size of the program with respect to the partition size.

Comparing ESP32 to an AVR Mega2560 is not a good argument for "it works", they are entirely different architectures AND the Mega2560 does not use task stacks but instead allocates on heap directly which is not applicable in an RTOS environment.

Since you have not shared much in the way of code nobody will be able to point out where your program is going awry other than general ideas like @SuGlider and I've posted.

dsyleixa commented 2 years ago

I actually already shared the code above, in the TOP: https://github.com/dsyleixa/Arduino/tree/master/ESP32_GBox/ESP32_Box023

SuGlider commented 2 years ago

My general guess is Stack Overflow because of potential Chess recursion.

The main difference from ESP32 Arduino to other Chips Arduino is that in ESP32 everything is running under FreeRTOS, thus, as @atanisoft said, loop() and setup() run in a task with a limit of 8K Stack. You can create a separate task for specific routines (such as Chess), with a larger Stack size for that task.

For the other Chips, Arduino is built as a pure Bare Metal application and Stack can possibly reach higher limits in available RAM, depending on the way it was built and configured. So it could explain why you don't see any errors with other "Chip Arduinos".

From the link I posted there is a general explanation: https://arduino.stackexchange.com/questions/80729/esp32-stack-canary-watchpoint-triggered

recursive functions - each time a function recurses it uses stack space. If it recurses deeply enough then it will trample the stack guard and cause this exception. For instance:

int count(int i) {
  i--;

  if(i > 0) {
    Serial.println(count(i));
  }

  return i;
}

void loop() {
  count(8000);
}

Each time a function recurses, its return address and its arguments and local variables are all stored on the stack. If it recurses too many times it will use more storage than is allocated to the stack.
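
To make the quoted explanation concrete, here is a minimal host-side C++ sketch of budgeting recursion depth against a fixed task stack. The helper names (maxSafeDepth, countGuarded) and the numbers in the usage note (8192-byte stack, 320 bytes per frame, 1024 bytes of reserve) are illustrative assumptions, not taken from the linked answer; real frame sizes depend on the compiler and on the function.

```cpp
#include <cstddef>

// Largest recursion depth that fits into a stack of stackBytes when each
// call needs frameBytes and reserveBytes are kept free for everything else.
// Hypothetical helper for illustration only.
constexpr std::size_t maxSafeDepth(std::size_t stackBytes,
                                   std::size_t frameBytes,
                                   std::size_t reserveBytes) {
    return (frameBytes == 0 || stackBytes <= reserveBytes)
               ? 0
               : (stackBytes - reserveBytes) / frameBytes;
}

// Countdown in the spirit of count() above, but it refuses to descend past
// `limit` nested calls. Returns the depth at which it stopped.
int countGuarded(int i, std::size_t depth, std::size_t limit) {
    if (i <= 1 || depth >= limit) {
        return static_cast<int>(depth);  // finished, or budget exhausted
    }
    return countGuarded(i - 1, depth + 1, limit);
}
```

With the assumed numbers, maxSafeDepth(8192, 320, 1024) gives 22, so countGuarded(8000, 0, 22) stops after 22 nested calls instead of running into the stack guard.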

dsyleixa commented 2 years ago

well, as already stated, if it was a RAM issue then it's supposed to happen always reproducibly at the same point, but it does not! See here: https://github.com/espressif/arduino-esp32/issues/6010#issuecomment-991907364


dsyleixa commented 2 years ago

update: I meanwhile even decreased the max deepening and the HashTable size and it sometimes already crashes in the 4th deepening ply whilst earlier it had successfully calculated up to the 7th ply. So IMO it really can't be a recursive RAM capture thing actually.


  A B C D E F G H 
  --------------- 
8 r . . q k b . r 8 
7 + + + b n + + + 7 
6 . . . . + . . . 6 
5 . . . + . . . . 5 
4 . n . * . . . . 4 
3 . . N . * N . . 3 
2 * * * . B * * * 2 
1 R . B Q K . . R 1 
  --------------- 
  A B C D E F G H 
> WHITE: 
 DEBUG cstring : 
 DEBUG K: 8000  
 DEBUG L: 19 

 0 ply, searched:         1 
 1 ply, searched:         2 
 2 ply, searched:       193 
 3 ply, searched:       542 
 4 ply, searched:      5316 ..........Guru Meditation Error: Core  1 panic'ed (Unhandled debug exception)
Debug exception reason: Stack canary watchpoint triggered (loopTask) 
Core 1 register dump:
PC      : 0x400d3310  PS      : 0x00060036  A0      : 0x800d38da  A1      : 0x3ffb0010  
A2      : 0xffffffe9  A3      : 0x3ffc1d58  A4      : 0x3ffbdc08  A5      : 0x00000080  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x3ffc1bdc  A9      : 0x00000008  
A10     : 0x00000008  A11     : 0x00000001  A12     : 0x00000005  A13     : 0x00000020  
A14     : 0x00000020  A15     : 0x3ffb08b0  SAR     : 0x00000011  EXCCAUSE: 0x00000001  
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffff8  

ELF file SHA256: 0000000000000000

Backtrace: 0x400d3310:0x3ffb0010 0x400d38d7:0x3ffb0150 0x400d38d7:0x3ffb0290 0x400d38d7:0x3ffb03d0 0x400d38d7:0x3ffb0510 0x400d38d7:0x3ffb0650 0x400d38d7:0x3ffb0790 0x400d38d7:0x3ffb08d0 0x400d38d7:0x3ffb0a10 0x400d38d7:0x3ffb0b50 0x400d38d7:0x3ffb0c90 0x400d38d7:0x3ffb0dd0 0x400d38d7:0x3ffb0f10 0x400d38d7:0x3ffb1050 0x400d38d7:0x3ffb1190 0x400d38d7:0x3ffb12d0 0x400d38d7:0x3ffb1410 0x400d38d7:0x3ffb1550 0x400d38d7:0x3ffb1690 0x400d38d7:0x3ffb17d0 0x400d38d7:0x3ffb1910 0x400d38d7:0x3ffb1a50 0x400d38d7:0x3ffb1b90 0x400d4a65:0x3ffb1cd0 0x400d6993:0x3ffb1e30 0x400d72f4:0x3ffb1ef0 0x400e365d:0x3ffb1fb0 0x4008a1fe:0x3ffb1fd0

Rebooting...

Nonetheless, after pasting the Backtrace into the Exception decoder then still nothing happens at all...

you may check a stripped-down standalone version here (no TFT hardware etc): https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino

BTW, is it possible to disable this eff*** "Guru"...?

atanisoft commented 2 years ago

Suffice to say, recursion within tasks is not an easy problem to solve. It's not really a task that is designed to run on an embedded RTOS platform entirely.

However, as noted previously, you can create a task with a larger stack size to run your recursion process.

> BTW, is it possible to disable this eff*** "Guru"...?

This is coming from the pre-built ESP-IDF code with the default setting of CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT. You would need to rebuild ESP-IDF code with CONFIG_ESP_SYSTEM_PANIC_SILENT_REBOOT to not have it print the register dump and backtrace, though it may print other details.

dsyleixa commented 2 years ago

I do not run a recursion in a task. Neither in my GBox program nor in the downstripped demo version https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino

But that eff*** Guru error happens at either program - nonetheless, never any runtime errors e.g. on my MEGA or my DUE.

atanisoft commented 2 years ago

> I do not run a recursion in a task.

Both setup() and loop() run in a single RTOS task with 8kb stack.

https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino#L9 describes using recursion as part of its algorithm.

Recursion points:

Initial entry point: https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino#L462 (calls https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino#L107)
First point of Recursion: https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino#L161
Second point of Recursion: https://github.com/dsyleixa/Arduino/blob/master/Chess/chess0048e32/chess0048e32.ino#L224

Each level of recursion depth will use at least 175b of stack plus any additional required for making function calls. At some point in the recursion depth it will fail as you have found.
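
The backtraces posted earlier in this thread allow a rough cross-check of that estimate. Each entry is a PC:SP pair, so the distance between consecutive SP values approximates one frame. In the first backtrace the repeated 0x400d3821 entries are 0x140 (320) bytes apart, and the span from the first SP (0x3ffb0030) to the last (0x3ffb1fd0) is 0x1fa0 = 8096 bytes, which is nearly all of the 8192-byte loopTask stack. A minimal host-side sketch of that arithmetic follows; the function names are hypothetical and the parser assumes well-formed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Collect the stack-pointer half of every "PC:SP" pair in a backtrace line.
// Tokens without a value after the colon (e.g. "Backtrace:") are skipped.
std::vector<std::uint32_t> stackPointers(const std::string& backtrace) {
    std::vector<std::uint32_t> sps;
    std::istringstream in(backtrace);
    std::string pair;
    while (in >> pair) {
        const std::size_t colon = pair.find(':');
        if (colon == std::string::npos || colon + 1 >= pair.size()) {
            continue;
        }
        // Base 16 accepts the leading "0x" of the address.
        sps.push_back(static_cast<std::uint32_t>(
            std::stoul(pair.substr(colon + 1), nullptr, 16)));
    }
    return sps;
}

// Distance between entry i and the next outer entry: roughly one frame.
std::uint32_t frameBytes(const std::vector<std::uint32_t>& sps, std::size_t i) {
    return sps[i + 1] - sps[i];
}

// Bytes between the innermost and the outermost listed stack pointer.
std::uint32_t stackSpan(const std::vector<std::uint32_t>& sps) {
    return sps.empty() ? 0 : sps.back() - sps.front();
}
```

Feeding it the first three pairs of the posted backtrace yields a frame distance of 320 bytes, noticeably more than the 175-byte lower bound, so roughly 23 nested calls are enough to exhaust the default stack.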

> never any errors e.g. on my MEGA or my DUE.

AVR doesn't use RTOS and doesn't have the same concept of task stack. It uses all free heap/SRAM for the recursion usage, very likely at a certain depth of recursion it will start randomly overwriting areas of SRAM or perhaps simply crash.

dsyleixa commented 2 years ago

oh, I expected both setup() and loop() are just parts of main(), just like in Arduino, for RTOS then running in the same main() task, with access to the entire RAM :

int main() {
  setup();
  while (true) {
     loop();
  }
}

anyway, recursion is mandatory. In the big GBox program it runs in loop(), in the small demo it runs in setup(). How shall I enlarge the setup()/loop() task stack size to the maximum? In GBox I have only 1 extra small parallel thread, and none in the demo, but the demo crashes as well.

atanisoft commented 2 years ago

https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/main.cpp#L51 is the entrypoint from ESP-IDF (which boots up prior to the app starting).

https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/main.cpp#L67 shows where loopTask is created, you could do something similar in your setup() function where it creates a task for your recursion work with a high stack size. You will need to adjust the stack size for your recursion task until you have the depth of recursion you are after.

You can also call vTaskDelete(NULL) from the end of setup() (or loop()) to dispose of loopTask and reclaim the 8kb of heap (from the task stack).

dsyleixa commented 2 years ago

I would need to code the available memory size in my program, not by patching the ESP API. Any suggestions? Most important for the loop(). I will accordingly then apply that to the reworked demo. I know the size of the common variables from the compile message, so the rest should be assigned to the loop RAM.

atanisoft commented 2 years ago

You can call the same APIs used in the links above from your code without altering the arduino-esp32 code.
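
A minimal sketch of what that could look like in a sketch file, under stated assumptions: runSearch() is a placeholder for the sketch's own recursive entry point, the task name "chessTask" is made up, and the sizing inputs (60 levels, 320 bytes per level, 4096 bytes margin) are guesses that would need tuning on the real program. xTaskCreatePinnedToCore, uxTaskGetStackHighWaterMark and vTaskDelete are the FreeRTOS calls shipped with arduino-esp32. The ESP32 part is guarded so the sizing helper can also be compiled on a PC.

```cpp
#include <cstdint>

// Stack size to request for a recursion of maxDepth levels.
// Hypothetical helper; all three inputs are estimates supplied by the caller.
constexpr std::uint32_t stackBytesFor(std::uint32_t maxDepth,
                                      std::uint32_t frameBytes,
                                      std::uint32_t marginBytes) {
    return maxDepth * frameBytes + marginBytes;
}

#if defined(ARDUINO_ARCH_ESP32)
#include <Arduino.h>

// Placeholder: call the sketch's recursive search (e.g. its chess routine) here.
static void runSearch() {}

static void chessTask(void* /*unused*/) {
    runSearch();
    // Smallest amount of stack that was ever left free in this task.
    Serial.printf("chessTask stack high-water mark: %u\n",
                  static_cast<unsigned>(uxTaskGetStackHighWaterMark(NULL)));
    vTaskDelete(NULL);  // a FreeRTOS task function must not return
}

void setup() {
    Serial.begin(115200);
    // The stack size can only be chosen here, at task creation.
    xTaskCreatePinnedToCore(chessTask, "chessTask",
                            stackBytesFor(60, 320, 4096),
                            NULL, 1, NULL, 1 /* core 1, where loopTask runs by default */);
}

void loop() {
    delay(1000);  // loopTask stays alive, so Serial output keeps working
}
#endif
```

Keeping loop() alive instead of deleting loopTask costs its 8kb stack but avoids the side effects of calling vTaskDelete(NULL) from setup().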

dsyleixa commented 2 years ago

I have no clue how to do that, I am just used to programming by the common original Arduino API methods.

updated demo code, chess() running also in setup() now:

https://github.com/dsyleixa/Arduino/tree/master/Chess/chess0049e32

dsyleixa commented 2 years ago

if I put at the end of setup(): vTaskDelete(NULL); then I get no Serial output anymore. So how to get the entire RAM for loop()?

or is it better to put all code into setup() and clear all in loop()? And how to get all entire RAM then for loop()?

igrr commented 2 years ago

@me-no-dev @SuGlider It seems like overriding the default value of stack size for the main task (without having to fall back to arduino-as-IDF-component) could be useful.

What do you think about adding a simple way for the user to adjust the main task stack size, something along these lines:

/* in arduino-esp32 main.cpp: */
__attribute__((weak)) size_t getArduinoLoopTaskStackSize(void) {
    return ARDUINO_LOOP_STACK_SIZE;
}

/* later... */
xTaskCreateUniversal(loopTask, "loopTask", getArduinoLoopTaskStackSize(), NULL, 1, &loopTaskHandle, ARDUINO_RUNNING_CORE);

/* in Arduino.h */

#define ESP_LOOP_TASK_STACK_SIZE(sz) \
    size_t getArduinoLoopTaskStackSize(void) { \
        return sz; \
    }

/* in sketch code */

#include <Arduino.h>

ESP_LOOP_TASK_STACK_SIZE(16384);

void setup() { }

void loop() { }

Edit: alternatively, as a more general solution, we could consider a user-provided "build options" header file https://github.com/esp8266/Arduino/pull/8095 https://github.com/stm32duino/Arduino_Core_STM32/pull/1442

me-no-dev commented 2 years ago

@SuGlider let's align this with @pedrominatel and have it documented as well

dsyleixa commented 2 years ago

thanks guys for your interest in this topic and for finding a fix. As a 1st step I made some local variables in my recursive functions global and further decreased the max-deepening count, so for now it admittedly plays with poor skill but at least doesn't crash anymore.

Perhaps allow me to propose a solution: I would tend to define the stack size within the threads myself at the beginning, similar to setting a thread priority, e.g. via vTaskRamSet (NULL, 4000); // sets RAM size to 4000bytes (my2ct)

thanks @all for your contributions!

We now may close it or keep it open until there is a fix, as you wish.

atanisoft commented 2 years ago

> Perhaps allow me to propose a solution: I would tend to define the stack size within the threads myself at the beginning, similar to setting a thread priority, e.g. via vTaskRamSet (NULL, 4000); // sets RAM size to 4000bytes (my2ct)

Unfortunately there is no such function in FreeRTOS, the only time the stack size can be set is during creation. The solution which @igrr has proposed (weak function you can override) is likely the best option as it will work with both IDF+Arduino and Arduino (standalone).

dsyleixa commented 2 years ago

reworked my basic chess code, both by the identical algorithm, 2 UI versions:
a) 1 for my Raspi to control what happens: https://github.com/dsyleixa/RaspberryPi/blob/master/chess/micromax48005.c
b) 1 for my esp32 (2.0.1): https://github.com/dsyleixa/Arduino/blob/master/Chess/chessesp48005/chessesp48005.ino

first observations: the number of recursive computations on my ESP32 are far larger than on the Pi (8 ply/5.1 Mio vs 7 ply/1.3 Mio), by identical settings and boundary conditions (and besides, then a different resulting generated move).

the Raspi Xterminal console output for the 1st move, after WHITE manual move d2d4, and then BLACK auto reply (just press ENTER), is:

>   BLACK:  

 DEBUG cstring : 
 DEBUG K: 8000  
 DEBUG L: 67 

 2 ply, searched:         9 
 3 ply, searched:       172 
 4 ply, searched:       966 
 5 ply, searched:      8804 .................
 6 ply, searched:    130720 ...............................................................................................................
 7 ply, searched:   1316247 
score=8000

  1.5: n g8-f6

whilst the Serial console output of the ESP32 is:

>   BLACK:  
 DEBUG cstring : 
 DEBUG K: 8000  
 DEBUG L: 67 

 2 ply, searched:         9 
 3 ply, searched:       172 
 4 ply, searched:       976 
 5 ply, searched:      9364 ..............
 6 ply, searched:    135948 ....................................
                            ....................................
                            .........................
 7 ply, searched:    975694 ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            ....................................
                            .........................
 8 ply, searched:   5173849 
score=8000

  1.5: n b8-c6

...that is really puzzling and IMO that might be a reason for massive unexpected RAM consumptions.... :?:

dsyleixa commented 2 years ago

I actually doubt that the TOP issue is only caused by too little STACK (edited). If it was, then the program wouldn't behave so extremely different from the same program running on a RaspberryPi.

atanisoft commented 2 years ago

> I actually doubt that the TOP issue is only caused by too little RAM.

It's caused by stack exhaustion in the loopTask which you can now configure higher than the default 8kb. This is clearly evident from the backtrace you have provided here as:

Debug exception reason: Stack canary watchpoint triggered (loopTask)

It has nothing to do with free RAM and everything to do with stack.

Comparing to a linux host (rPi) is not a fair comparison since they don't operate in the same fashion.

dsyleixa commented 2 years ago

I have to disagree as the programs (move generator, Negamax) are totally identical now: As stated, I have meanwhile reworked the code and get no Core Panics anymore. Nonetheless the ESP runs totally differently from the Pi, e.g. the ESP calculates 4x as many recursions as the Pi plus 1 extra deepening ply, which must not happen.
a) Raspi: https://github.com/dsyleixa/RaspberryPi/blob/master/chess/micromax48005.c
b) esp32 (2.0.1): https://github.com/dsyleixa/Arduino/blob/master/Chess/chessesp48005/chessesp48005.ino

atanisoft commented 2 years ago

> I have to disagree as the programs (move generator, Negamax) are totally identical now:

you are free to disagree but the backtrace does not agree with you.

dsyleixa commented 2 years ago

I don't have a backtrace anymore.

igrr commented 2 years ago

@dsyleixa I'd recommend checking the code again, it seems that it relies on some undefined behaviors, so its execution is not very predictable.

On Linux (compiled with -fsanitize=undefined option):

(output)

```
/tmp/chess.c:160:9: runtime error: signed integer overflow: 1688299438 + 2038795618 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:160:9 in
/tmp/chess.c:161:9: runtime error: signed integer overflow: 984903368 + 1245823260 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:161:9 in
 2 ply, searched:         9
/tmp/chess.c:161:35: runtime error: signed integer overflow: 1532164499 - -635096640 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:161:35 in
/tmp/chess.c:160:35: runtime error: signed integer overflow: -1827237850 - 718599499 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:160:35 in
 3 ply, searched:       172
/tmp/chess.c:114:9: runtime error: signed integer overflow: 1273382264 - -991284065 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tmp/chess.c:114:9 in
```

Integer overflow is also reported on ESP32, if we add -fsanitize=undefined compiler option:

(output)

```
> WHITE:
 DEBUG cstring : d2d4
 DEBUG K: 99
 DEBUG L: 67
Undefined behavior of type sub_overflow

Backtrace:0x40081dd9:0x3ffb24d00x40087d6d:0x3ffb24f0 0x40087db7:0x3ffb2510 0x40087ddd:0x3ffb2570 0x400d6aa3:0x3ffb2590 0x400d79d3:0x3ffb26f0 0x400d7cf6:0x3ffb27f0 0x400d8812:0x3ffb2820

0x40081dd9: panic_abort at /Users/ivan/e/esp-idf/components/esp_system/panic.c:402
0x40087d6d: esp_system_abort at /Users/ivan/e/esp-idf/components/esp_system/esp_system.c:121
0x40087db7: __ubsan_default_handler at /Users/ivan/e/esp-idf/components/esp_system/ubsan.c:166
0x40087ddd: __ubsan_handle_sub_overflow at /Users/ivan/e/esp-idf/components/esp_system/ubsan.c:196
0x400d6aa3: Minimax(int, int, int, int, int, int) at /Users/ivan/e/arduino-esp32/test/build_as_component/build/../main/main.cpp:159
0x400d79d3: chess() at /Users/ivan/e/arduino-esp32/test/build_as_component/build/../main/main.cpp:316
0x400d7cf6: setup() at /Users/ivan/e/arduino-esp32/test/build_as_component/build/../main/main.cpp:381
0x400d8812: loopTask(void*) at /Users/ivan/e/arduino-esp32/cores/esp32/main.cpp:38
```
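
For reference, the operand pairs from the sanitizer report above can be checked without ever executing the overflowing operation. This is a minimal sketch, assuming a 32-bit int as on ESP32 and on the Linux host; the function names addOverflows and subOverflows are made up for illustration and are not part of the chess code.

```cpp
#include <cstdint>
#include <limits>

// True if a + b would not fit into a 32-bit signed integer.
// The comparison is arranged so that it cannot overflow itself.
constexpr bool addOverflows(std::int32_t a, std::int32_t b) {
    return (b > 0) ? (a > std::numeric_limits<std::int32_t>::max() - b)
                   : (a < std::numeric_limits<std::int32_t>::min() - b);
}

// True if a - b would not fit into a 32-bit signed integer.
constexpr bool subOverflows(std::int32_t a, std::int32_t b) {
    return (b < 0) ? (a > std::numeric_limits<std::int32_t>::max() + b)
                   : (a < std::numeric_limits<std::int32_t>::min() + b);
}
```

All five pairs from the Linux report, e.g. addOverflows(1688299438, 2038795618) and subOverflows(1532164499, -635096640), come out true. Since signed overflow is undefined behavior in C and C++, different compilers and targets are allowed to produce different results for the same source.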

Besides, compiling this code on Linux with -Wall -Werror flags reveals a bunch of possible issues related to operator precedence, please check them as well:

(compiler output)

``` /tmp/chess.c:115:11: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses] !(m<=q|X&8&&m>=l|X&S)) // or window incompatible ~~^~ /tmp/chess.c:115:11: note: place parentheses around the '&' expression to silence this warning !(m<=q|X&8&&m>=l|X&S)) // or window incompatible ^ ( ) /tmp/chess.c:115:21: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses] !(m<=q|X&8&&m>=l|X&S)) // or window incompatible ~~^~ /tmp/chess.c:115:21: note: place parentheses around the '&' expression to silence this warning !(m<=q|X&8&&m>=l|X&S)) // or window incompatible ^ ( ) /tmp/chess.c:120:5: error: & has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses] z&K==I&&(N<1e6&d<98|| // root: deepen upto time ^~~~~ /tmp/chess.c:120:5: note: place parentheses around the '==' expression to silence this warning z&K==I&&(N<1e6&d<98|| // root: deepen upto time ^ ( ) /tmp/chess.c:120:5: note: place parentheses around the & expression to evaluate it first z&K==I&&(N<1e6&d<98|| // root: deepen upto time ^ ( ) /tmp/chess.c:120:10: error: '&&' within '||' [-Werror,-Wlogical-op-parentheses] z&K==I&&(N<1e6&d<98|| // root: deepen upto time ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/chess.c:120:10: note: place parentheses around the '&&' expression to silence this warning z&K==I&&(N<1e6&d<98|| // root: deepen upto time ^ ( /tmp/chess.c:125:14: error: operator '?:' has lower precedence than '|'; '|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses] m=-P35?d>2?-I:e:-P; // Prune or stand-pat ~~~~~~~~~^ /tmp/chess.c:125:14: note: place parentheses around the '|' expression to silence this warning m=-P35?d>2?-I:e:-P; // Prune or stand-pat ^ ( ) /tmp/chess.c:125:14: note: place parentheses around the '?:' expression to evaluate it first m=-P35?d>2?-I:e:-P; // Prune or stand-pat ^ ( ) /tmp/chess.c:131:20: error: operator '?:' has lower precedence than '&'; '&' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses] 
while(r=p>2&r<0?-r:-o[++j]) // loop over directions o[] ~~~~~~~^ /tmp/chess.c:131:20: note: place parentheses around the '&' expression to silence this warning while(r=p>2&r<0?-r:-o[++j]) // loop over directions o[] ^ ( ) /tmp/chess.c:131:20: note: place parentheses around the '?:' expression to evaluate it first while(r=p>2&r<0?-r:-o[++j]) // loop over directions o[] ^ ( ) /tmp/chess.c:131:12: error: using the result of an assignment as a condition without parentheses [-Werror,-Wparentheses] while(r=p>2&r<0?-r:-o[++j]) // loop over directions o[] ~^~~~~~~~~~~~~~~~~~~ /tmp/chess.c:131:12: note: place parentheses around the assignment to silence this warning while(r=p>2&r<0?-r:-o[++j]) // loop over directions o[] ^ ( ) /tmp/chess.c:131:12: note: use '==' to turn this assignment into an equality comparison while(r=p>2&r<0?-r:-o[++j]) // loop over directions o[] ^ == /tmp/chess.c:140:31: error: & has lower precedence than <; < will be evaluated first [-Werror,-Wparentheses] t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break; // capt. own, bad pawn mode ~~~^ /tmp/chess.c:140:31: note: place parentheses around the '<' expression to silence this warning t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break; // capt. own, bad pawn mode ^ ( ) /tmp/chess.c:140:31: note: place parentheses around the & expression to evaluate it first t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break; // capt. own, bad pawn mode ^ ( ) /tmp/chess.c:140:22: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses] t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break; // capt. own, bad pawn mode ~^~~~~~ /tmp/chess.c:140:22: note: place parentheses around the '&' expression to silence this warning t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break; // capt. own, bad pawn mode ^ ( ) /tmp/chess.c:140:31: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses] t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break; // capt. 
own, bad pawn mode ~~~~^~~~~~~~~~~~ /tmp/chess.c:140:31: note: place parentheses around the '&' expression to silence this warning t=board[H];if(t&turn|p<3&!(y-x&7)-!t)break; // capt. own, bad pawn mode ^ ( ) /tmp/chess.c:150:14: error: | has lower precedence than >; > will be evaluated first [-Werror,-Wparentheses] v-=p-4|R>29?0:20; // penalize mid-game K move ^~~~~ /tmp/chess.c:150:14: note: place parentheses around the '>' expression to silence this warning v-=p-4|R>29?0:20; // penalize mid-game K move ^ ( ) /tmp/chess.c:150:14: note: place parentheses around the | expression to evaluate it first v-=p-4|R>29?0:20; // penalize mid-game K move ^ ( ) /tmp/chess.c:150:19: error: operator '?:' has lower precedence than '|'; '|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses] v-=p-4|R>29?0:20; // penalize mid-game K move ~~~~~~~~^ /tmp/chess.c:150:19: note: place parentheses around the '|' expression to silence this warning v-=p-4|R>29?0:20; // penalize mid-game K move ^ ( ) /tmp/chess.c:150:19: note: place parentheses around the '?:' expression to evaluate it first v-=p-4|R>29?0:20; // penalize mid-game K move ^ ( ) /tmp/chess.c:165:18: error: operator '?:' has lower precedence than '|'; '|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses] s=C>2|v>V?-Minimax(-l,-V,-v, // recursive eval. of reply ~~~~~~~^ /tmp/chess.c:165:18: note: place parentheses around the '|' expression to silence this warning s=C>2|v>V?-Minimax(-l,-V,-v, // recursive eval. of reply ^ ( ) /tmp/chess.c:165:18: note: place parentheses around the '?:' expression to evaluate it first s=C>2|v>V?-Minimax(-l,-V,-v, // recursive eval. 
of reply ^ ( /tmp/chess.c:177:21: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses] m=v,X=x,Y=y|S&F; // mark double move with S ~~^~ /tmp/chess.c:177:21: note: place parentheses around the '&' expression to silence this warning m=v,X=x,Y=y|S&F; // mark double move with S ^ ( ) /tmp/chess.c:179:17: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses] if(x+r-y|u&32| // not 1st step,moved before ~~^~~ /tmp/chess.c:179:17: note: place parentheses around the '&' expression to silence this warning if(x+r-y|u&32| // not 1st step,moved before ^ ( ) /tmp/chess.c:181:26: error: '&' within '^' [-Werror,-Wbitwise-op-parentheses] board[G=x+3^r>>1&7]-turn-6 // no virgin R in corner G, ~~~~~^~ /tmp/chess.c:181:26: note: place parentheses around the '&' expression to silence this warning board[G=x+3^r>>1&7]-turn-6 // no virgin R in corner G, ^ ( ) /tmp/chess.c:180:13: error: & has lower precedence than >; > will be evaluated first [-Werror,-Wparentheses] p>2&(p-4|j-7|| // no P & no lateral K move, ~~~^ /tmp/chess.c:180:13: note: place parentheses around the '>' expression to silence this warning p>2&(p-4|j-7|| // no P & no lateral K move, ^ ( ) /tmp/chess.c:180:13: note: place parentheses around the & expression to evaluate it first p>2&(p-4|j-7|| // no P & no lateral K move, ^ ( /tmp/chess.c:180:13: error: '&' within '|' [-Werror,-Wbitwise-op-parentheses] p>2&(p-4|j-7|| // no P & no lateral K move, ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/chess.c:180:13: note: place parentheses around the '&' expression to silence this warning p>2&(p-4|j-7|| // no P & no lateral K move, ^ ( /tmp/chess.c:191:8: error: | has lower precedence than ==; == will be evaluated first [-Werror,-Wparentheses] m=m+I|P==I?m:0; // best loses K: (stale)mate ^~~~~ /tmp/chess.c:191:8: note: place parentheses around the '==' expression to silence this warning m=m+I|P==I?m:0; // best loses K: (stale)mate ^ ( ) /tmp/chess.c:191:8: note: place parentheses around the | 
expression to evaluate it first
m=m+I|P==I?m:0; // best loses K: (stale)mate
^ ( )
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
```

In general, if you see that a certain non-platform-specific piece of code works differently on Linux and on a microcontroller, first make sure it compiles and works correctly on Linux with the `-Wall -Werror -fsanitize=address -fsanitize=undefined` compiler flags. Then, if the difference is still present, apply a "divide and conquer" approach: bisect the application to narrow down the place where the behavior differs between the two platforms.

If you narrow the issue down to a small fragment of code (MCVE) which still works differently on Linux and ESP while passing compiler and sanitizer checks, please post that fragment here and we will try to help you figure out the issue.

dsyleixa commented 2 years ago

On Raspi, ESP32 and original Arduino it's always compiled by gcc, and operator precedence for C/C++ hasn't changed since C99 or even before. As to the Linux warnings (-Wall): I asked about this in the Raspi forum and it was discussed there, but they said e.g., "The suggestions about using brackets are just that, suggestions. Gcc is just saying that adding brackets will make it easier for anybody reading the code to see what the expressions are and less likely for the programmer (or others trying to modify the code at a later date) to make a mistake." (ref.: https://forums.raspberrypi.com/viewtopic.php?t=325912&p=1951822&sid=b280b91b565b3f9373cfe7556bbf3875#p1950861) But I agree that some undefined behaviour must be happening, because even the 1st code from the top post always worked fine on AVR and ARM Cortex whilst it crashed on ESP32. And BTW, the same code, ported by another author, also runs even on an UNO, and it runs like a charm: https://create.arduino.cc/projecthub/rom3/arduino-uno-micromax-chess-030d7c
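
The precedence point can be checked directly. The following is a minimal sketch comparing the right-hand side of the loop condition from chess.c line 131 with a fully parenthesized version. The function names `step_terse` and `step_explicit` are made up for illustration, and `o` and `j` are passed as parameters here, which is an assumption about how they are declared in the original program.

```c
/* The terse form intentionally triggers -Wparentheses; silence it for this sketch. */
#pragma GCC diagnostic ignored "-Wparentheses"

/* Terse form, written like the expression in chess.c. */
int step_terse(int p, int r, const int *o, int *j)
{
    return p>2&r<0?-r:-o[++*j];
}

/* Same expression with the grouping that the C grammar already implies:
   relational operators bind tighter than '&', which binds tighter than '?:'. */
int step_explicit(int p, int r, const int *o, int *j)
{
    return ((p > 2) & (r < 0)) ? -r : -o[++*j];
}
```

Both functions return the same value and advance `j` in the same cases, so the parentheses change readability only, not behaviour.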

dsyleixa commented 2 years ago

as to signed int overflow: I don't see any signed ints in my code, just int and int32_t (CMIIW)
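
For reference when auditing for overflow: in C, plain `int` and `int32_t` are both signed types, so overflowing them is undefined behaviour, which is the kind of problem `-fsanitize=undefined` reports. A minimal sketch of a check that avoids the overflow itself follows; the helper names `int32_is_signed` and `add_would_overflow` are hypothetical and do not come from the chess code.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 because int32_t is a signed type: -1 converted to it stays negative. */
int int32_is_signed(void)
{
    return (int32_t)-1 < 0;
}

/* Returns 1 if a + b would overflow an int, without performing the addition. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* sum would exceed INT_MAX */
    return a < INT_MIN - b;       /* sum would fall below INT_MIN */
}
```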

dsyleixa commented 2 years ago

as the error happens in code with multiple recursions (which are always computed in the same sequence, though) and, admittedly, multiple recursive stack allocations (correct idiom?), this code unfortunately cannot be shrunk down. Nonetheless, it works on Raspi, Arduino AVR and Arduino ARM Cortex, but it works erroneously, fails, or even crashes on ESP32.
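
The recursion depth can be related to stack usage using the backtrace from the first post, where each `PC:SP` pair is one call frame. Below is a rough sketch of that arithmetic. The 8192-byte figure for the Arduino `loopTask` stack is an assumption about the default configuration, and the function and constant names are illustrative only.

```c
#include <stdint.h>

/* Assumed default loopTask stack size in arduino-esp32; check your core's configuration. */
enum { ASSUMED_LOOP_STACK_BYTES = 8192 };

/* Stack pointer values copied from the backtrace in the first post. */
static const uint32_t SP_INNERMOST = 0x3ffb0030u; /* frame that hit the canary */
static const uint32_t SP_RECUR_A   = 0x3ffb0170u; /* first repeated 0x400d3821 frame */
static const uint32_t SP_RECUR_B   = 0x3ffb02b0u; /* second repeated 0x400d3821 frame */
static const uint32_t SP_OUTERMOST = 0x3ffb1fd0u; /* last backtrace entry */

/* Bytes between two stack pointer values (the stack grows towards lower addresses). */
uint32_t stack_delta(uint32_t sp_inner, uint32_t sp_outer)
{
    return sp_outer - sp_inner;
}

/* Stack consumed by one level of the repeated recursive call. */
uint32_t bytes_per_recursion(void)
{
    return stack_delta(SP_RECUR_A, SP_RECUR_B);
}

/* Stack spanned by the whole backtrace. */
uint32_t bytes_in_backtrace(void)
{
    return stack_delta(SP_INNERMOST, SP_OUTERMOST);
}
```

With these values each recursion level costs 320 bytes and the whole backtrace spans 8096 bytes. If the 8192-byte assumption holds, that leaves almost no headroom, which would be consistent with the "Stack canary watchpoint triggered (loopTask)" message.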

igrr commented 2 years ago

FWIW, your Raspberry Pi version of the code produces the same result for me on the ESP32 as it does on Linux. I only had to replace the platform-dependent rand() call with a simple LCG (chess_rand). See the updated code and the output I get here. To make testing simpler, I've hardcoded the 3 commands ("d2d4\n", "\n", "Q") into the app; see the moves array. The code should run with IDF and Arduino on the ESP32, as well as on Linux.
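
The replacement itself is not shown in this thread, so the following is only a minimal sketch of what a portable LCG could look like. Only the name `chess_rand` comes from the comment above. The constants (the widely used Numerical Recipes multiplier and increment), the seed helper `chess_srand`, and the 15-bit output range are assumptions made for illustration.

```c
#include <stdint.h>

/* Generator state; unsigned 32-bit arithmetic wraps modulo 2^32 by definition. */
static uint32_t chess_rand_state = 1u;

/* Hypothetical seeding helper, analogous to srand(). */
void chess_srand(uint32_t seed)
{
    chess_rand_state = seed;
}

/* Linear congruential generator: state = state * a + c (mod 2^32).
   The upper bits are returned because the low bits of an LCG have short periods. */
int chess_rand(void)
{
    chess_rand_state = chess_rand_state * 1664525u + 1013904223u;
    return (int)((chess_rand_state >> 16) & 0x7fffu);
}
```

Because the sequence depends only on the seed, any table filled from it has identical contents on every platform, which makes the search results directly comparable between Linux and the ESP32.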

dsyleixa commented 2 years ago

that is amazing, thank you very much for your contributions! These results have now finally dispelled all of my concerns. I would never have considered that randomly initializing a hashtable for known positions could lead to these different results. Again, thank you a lot!

espressif / arduino-esp32

Error: Core 1 panic'ed (Unhandled debug exception) #6010