arduino / ArduinoCore-renesas

MIT License
110 stars 74 forks source link

Memory corruption in simple sketch #76

Closed tttapa closed 1 year ago

tttapa commented 1 year ago

Reported at https://forum.arduino.cc/t/why-does-the-compiler-add-4x-00-to-a-string-array-element/1152890

UNO R4 Wifi, version 1.0.2 of this core.

String patternNames[] = {
    "@@@@@",
    "Light-weight spaceship",
    "R-Pentomino",
    "Diehard",
    ""
  };

void setup() {
  Serial.begin(9600);
  delay(1000);

  Serial.println("Looking at the first characters of the first element i the String-Arrazy");
  Serial.println(String(patternNames[0].charAt(0), HEX));
  Serial.println(String(patternNames[0].charAt(1), HEX));
  Serial.println(String(patternNames[0].charAt(2), HEX));
  Serial.println(String(patternNames[0].charAt(3), HEX));
  Serial.println(String(patternNames[0].charAt(4), HEX));

  Serial.end();
}
void loop() {}

Output:

Looking at the first five characters of the first element in the String-Array
0
0
0
0
40

Should be 5× 40.
Removing the last empty string from the array causes the right characters to be printed.

I've done some reverse engineering of the binary already (https://forum.arduino.cc/t/why-does-the-compiler-add-4x-00-to-a-string-array-element/1152890/44), but could not find anything, so I suspect memory corruption by some Core function (probably a 32-bit write to the beginning of the heap, presumably somewhere in a global constructor, initVariant, analogReference or startAgt).
Ideally, someone with access to an UNO R4 and a hardware debugger would debug it with the necessary watchpoints to see what causes the memory to be overwritten.

aentinger commented 1 year ago

I suspect this is rather a problem on the usage of the various ArduinoCore-API functions (String, etc.), than a heap limitation. How about if you remove one char from a previous string? How about removing multiple?

You can just wip up some test code and execute the same test on your PC, see here. All the fancy debuggers you need run on your PC 😉 .

tttapa commented 1 year ago

Hi, thanks for the quick response.

As discussed in the forum thread, the user's code is perfectly fine.
The problem exists elsewhere.

For good measure, I ran the following test against the ArduinoCore-API on my PC, and it passes as expected (with the sanitizers enabled and under Valgrind):

TEST_CASE ("Testing https://github.com/arduino/ArduinoCore-renesas/issues/76", "[String-issue-76]")
{
  arduino::String patternNames[] = {
    "@@@@@",
    "Light-weight spaceship",
    "R-Pentomino",
    "Diehard",
    "",
  };

  REQUIRE( arduino::String(patternNames[0].charAt(0), HEX) == "40" );
  REQUIRE( arduino::String(patternNames[0].charAt(1), HEX) == "40" );
  REQUIRE( arduino::String(patternNames[0].charAt(2), HEX) == "40" );
  REQUIRE( arduino::String(patternNames[0].charAt(3), HEX) == "40" );
  REQUIRE( arduino::String(patternNames[0].charAt(4), HEX) == "40" );
}
mjs513 commented 1 year ago

Just to add a bit more info - may or may not be pertinent but with the existing core opt at -0s a memory dump of starting at the heap base shows:

HEAP START: 20000B68
HEAP LIMIT: 20007B00
SBRK: 20000BF4
0: 20000b70
1: 20000b80
2: 20000ba0
3: 20000bb0
4: 20000bc0
SBRK: 20000C08
Looking at the first characters of the first element i the String-Arrazy
0
0
0
0
40
Memory Dump (starts at _Heap_Begin
20000B68 - 10 00 00 00 FC FF FF FF  00 00 00 00 40 00 00 00  : ........ ....@...
20000B78 - 20 00 00 00 FC FF FF FF  4C 69 67 68 74 2D 77 65  :  ....... Light-we
20000B88 - 69 67 68 74 20 73 70 61  63 65 73 68 69 70 00 00  : ight spa ceship..
20000B98 - 14 00 00 00 FC FF FF FF  52 2D 50 65 6E 74 6F 6D  : ........ R-Pentom
20000BA8 - 69 6E 6F 00 10 00 00 00  44 69 65 68 61 72 64 00  : ino..... Diehard.
20000BB8 - 00 00 00 00 0C 00 00 00  00 00 00 00 00 00 00 00  : ........ ........
20000BC8 - 2C 00 00 00 FC FF FF FF  00 00 00 01 00 00 00 00  : ,....... ........
20000BD8 - 54 47 41 00 14 09 00 20  00 40 08 40 C0 5D 00 00  : TGA....  .@.@.]..

You can kind of see from the memory dump that something is off with the addresses for the start of each element. Also the first element is only showing 1 @ sign vs 4, etc.

From some earlier experimentation it would work with -O2 opt level so for comparison on what it should be:

HEAP START: 20000B70
HEAP LIMIT: 20007B00
SBRK: 20000BFC
0: 20000b78
1: 20000b88
2: 20000ba8
3: 20000bb8
4: 20000bc8
SBRK: 20000C10
Looking at the first characters of the first element i the String-Arrazy
40
40
40
40
40
Memory Dump (starts at _Heap_Begin
20000B70 - 10 00 00 00 00 00 00 00  40 40 40 40 40 00 00 00  : ........ @@@@@...
20000B80 - 20 00 00 00 FC FF FF FF  4C 69 67 68 74 2D 77 65  :  ....... Light-we
20000B90 - 69 67 68 74 20 73 70 61  63 65 73 68 69 70 00 00  : ight spa ceship..
20000BA0 - 14 00 00 00 FC FF FF FF  52 2D 50 65 6E 74 6F 6D  : ........ R-Pentom
20000BB0 - 69 6E 6F 00 10 00 00 00  44 69 65 68 61 72 64 00  : ino..... Diehard.
20000BC0 - 00 00 00 00 0C 00 00 00  00 00 00 00 00 00 00 00  : ........ ........
20000BD0 - 2C 00 00 00 FC FF FF FF  00 00 00 01 00 00 00 00  : ,....... ........
20000BE0 - 54 47 41 00 18 09 00 20  00 40 08 40 C0 5D 00 00  : TGA....  .@.@.]..

And in this case its as expected. So something is off someplace. And as @tttapa showed it does work from the core-api

KurtE commented 1 year ago

Personally, I like easily reproducible bugs like this. So, I would resist changing the sketch in ways that cause the issue to not reproduce. As there is likely a real underlying problem.

As I asked on the forum thread earlier, has it been tried on other Arduino Boards. @per1234 verified that it works on Mega 2560. Does it fail the same way on WIFI and MINIMA?

Wifi:

Looking at the first characters of the first element i the String-Arrazy
0
0
0
0
40

Running on a Minima, it appears to work correctly:

Looking at the first characters of the first element i the String-Arrazy
40
40
40
40
40

So what is the difference?

Serial on Minima uses the processors built-in USB support, WIFI - talks through ESP32. Could something here be screwing it up?

Memory layout and timing again mostly do to Serial stuff, I would think, but there are different pins, different startup stuff...

ESP32 SWD? As was mentioned in the forum thread, about trying to debug. Pointed to the thread: https://forum.arduino.cc/t/arduino-r4-wifi-debug/1148053 And mentioned that the (esp32-s3 pins are wired to the SWD pins of the main MCU). So does the ESP32 ever update the memory as part of startup? Also is there a setup for hardware debugging?

Other things we might try, is to look through all of the compiler warning generated on each build, to see if there are any hints. My first pass through them, I did not see anything obvious. But ideally, there should not be any compiler warnings on the core stuff...

Sorry, I know not much help.

Edit: I edited the sketch to copy out the data, could have used memcpy but avoided...

String patternNames[] = {
    "@@@@@",
    "Light-weight spaceship",
    "R-Pentomino",
    "Diehard",
    ""
  };

void setup() {
  const char *psz = patternNames[0].c_str();
  char before_serial[5];
  for (uint8_t i=0; i < 5; i++) before_serial[i] = psz[i];
  Serial.begin(9600);
  delay(1000);

  Serial.println("Looking at the first characters of the first element i the String-Arrazy");
  Serial.println(String(patternNames[0].charAt(0), HEX));
  Serial.println(String(patternNames[0].charAt(1), HEX));
  Serial.println(String(patternNames[0].charAt(2), HEX));
  Serial.println(String(patternNames[0].charAt(3), HEX));
  Serial.println(String(patternNames[0].charAt(4), HEX));

  for (uint8_t i=0; i < 5; i++) {Serial.print(before_serial[i], HEX); Serial.print(" ");}
  Serial.println();

  Serial.end();
}
void loop() {}

And output:

Looking at the first characters of the first element i the String-Arrazy
0
0
0
0
40
0 0 0 0 40 

So was corrupted before the Serial start code

KurtE commented 1 year ago

Not sure if best to put data here or on forum... My last post on forum thread.

I know the data is valid in the arduino_main() code, so far before _init() is called. And I know it is invalid by the time it calls setup()

EDIT: I know it is valid after _init() is called. And it is corrupted before the startAgt() is called

EDIT2: It looks like it is corrupted in the call initVariant()

aentinger commented 1 year ago

Hi @KurtE ☕ 👋

It's best to keep all issue related information within this issue report, so no forum.

We ❤️ that you are so engaged with our product but can you try and keep your conversation more focussed? All your entries add an incredible amount of noise to the conversation, we are literally getting drowned in emails and writing a new issue entry because you tought of something else is adding just so much more noise. Sorry, but that's the way it is.

Henceforth, if you create a PR, then please develop in your fork and only create the PR when you're done with it. Once you've created a PR, do not use it as a living thing to keep developing on it and document your findings there. A PR is meant to be a "done" contribution to be evaluated by the maintainers.

If you chime in on other people's issue, please make sure that your contribution adds something new and relevant to the discussion. Or best you directly provide a PR that solves this issue.

Thank you, 🙇 .

facchinm commented 1 year ago

@tttapa @mjs513 @KurtE I triaged the bug and proposed a fix ( #86 ) , it would be great if you could check if the fix works for you and report back!

KurtE commented 1 year ago

Looks like it took care of it. Thanks

mjs513 commented 1 year ago

I will second what @KurtE said. Looks like that fixed the issue.

aentinger commented 1 year ago

Thanks for testing @KurtE and @mjs513 :coffee: :wave: . I've merged the PR of @facchinm .