Open mcspr opened 3 years ago
Just as a test, I tried it on C++ shell, which does give the expected results. What is striking is that the reported start address of the 2nd C-string is at the "one past the last element" address of the first string. I did a few tests on the C++ shell and even when I swap the empty and the 1 char string, they keep the order in which they are defined in the code. So I wonder why in your run the latter string is placed before. (based on your comment and end address)
Is this run from a completely clean build?
Edit: I can also get the order of the strings to be swapped in memory by toggling the optimization flag (-O1 or higher) on C++shell. So apparently that is expected behavior. This means we're back at what you expressed as a hypothesis a while ago that we might be looking at some old stored address when linking.
Could not reproduce with 10.2 & 11.1 x86_64. Try on current xtensa-toolchain gcc locally and run / disassemble test
in gdb?
And it seems only string literals are affected. Can't really trigger the issue when using either const char buf[] = "...";
or static const char buf[] = "...";
are passed to the function.
Small update based on the gitter discussion and a compressed example. Using and not using -fno-constants-merge
as a build flag has different results for the test case.
For the function body such as this
35 void test() {
36 addresses(EMPTY_STRING);
0x40201048 <+0>: l32r a3, 0x4020103c
0x4020104b <+3>: l32r a2, 0x40201040
0x4020104e <+6>: addi a1, a1, -16
0x40201051 <+9>: s32i a0, a1, 12
0x40201054 <+12>: call0 0x40201020 <addresses(char const*, char const*)>
37 addresses(COMMA_STRING);
0x40201057 <+15>: l32r a3, 0x40201044
0x4020105a <+18>: l32r a2, 0x4020103c
0x4020105d <+21>: call0 0x40201020 <addresses(char const*, char const*)>
0x40201060 <+24>: l32i a0, a1, 12
0x40201063 <+27>: addi a1, a1, 16
0x40201066 <+30>: ret.n
The l32r address loads order & values do not change, string addresses inside of them do
0x4020103c: 0x3ffe8633
0x40201040: 0x3ffe8632
0x40201044: 0x3ffe8635
(and the 3ffe86* seems to be located in core_esp8266_phy.cpp.o for some reason)
This is by default, without the switch
0x4020103c: 0x3ffe87e1
0x40201040: 0x3ffe87e0
0x40201044: 0x3ffe87dd
(strings are other way around)
A step before linking, assembly has slight differences. Without the flag, strings have .rodata.$origin$.str#.#
section (name originates here), while with the flag they are assigned a generic .rodata
diff --git a/a-defaults.s b/a-fno-constants-merge.s
index 8ada173..c8fb2c2 100644
--- a/a-defaults.s
+++ b/a-fno-constants-merge.s
@@ -1,6 +1,6 @@
.file "a.cpp"
.text
- .section .rodata._Z9addressesPKcS0_.str1.1,"aMS",@progbits,1
+ .section .rodata
.LC0:
.string "%p:%p -> (%u)\n"
.section .text._Z9addressesPKcS0_,"ax",@progbits
@@ -21,7 +21,7 @@ _Z9addressesPKcS0_:
addi sp, sp, 16
ret.n
.size _Z9addressesPKcS0_, .-_Z9addressesPKcS0_
- .section .rodata._Z4testv.str1.1,"aMS",@progbits,1
+ .section .rodata
.LC2:
.string ""
.LC5:
@@ -48,7 +48,7 @@ _Z4testv:
addi sp, sp, 16
ret.n
.size _Z4testv, .-_Z4testv
- .section .rodata.setup.str1.1,"aMS",@progbits,1
+ .section .rodata
.LC9:
.string "\n\n\n"
.section .text.setup,"ax",@progbits
Another test, but now with some custom section that does not start with .rodata
, but is set through code to be 'mergeable' (notice that both strings must be in the same section or match into some section by the .ld script. here, nothing matches '.mystring' so it's just dumped into the bss b/c of the *(COMMON)
matcher there). Both ld and gcc are with the default build flags, both relaxation and gcc merging are enabled.
34 #define MERGE_SECTION(NAME) __attribute__((section( "\"" NAME "\", \"aSM\", @progbits, 1 #")))
35 void test() {
36 static const char EMPTY_STRING[] MERGE_SECTION(".mystrings") = "";
37 addresses(EMPTY_STRING);
0x40201048 <+0>: l32r a3, 0x4020103c
0x4020104b <+3>: l32r a2, 0x40201040
0x4020104e <+6>: addi a1, a1, -16
0x40201051 <+9>: s32i a0, a1, 12
0x40201054 <+12>: call0 0x40201020 <addresses(char const*, char const*)>
38 static const char COMMA_STRING[] MERGE_SECTION(".mystrings") = ",";
39 addresses(COMMA_STRING);
0x40201057 <+15>: l32r a3, 0x40201040
0x4020105a <+18>: l32r a2, 0x40201044
0x4020105d <+21>: call0 0x40201020 <addresses(char const*, char const*)>
0x40201060 <+24>: l32i a0, a1, 12
0x40201063 <+27>: addi a1, a1, 16
0x40201066 <+30>: ret.n
> xtensa-lx106-elf-objdump -s -j .mystrings .pio/build/d1_mini/firmware.elf
.pio/build/d1_mini/firmware.elf: file format elf32-xtensa-le
Contents of section .mystrings:
3ffe8998 2c00 ,.
And it does match the expected layout.
0x4020103c: 0x3ffe899a << end for both "," and "" (merged)
0x40201040: 0x3ffe8999 << '\0', start for ""
0x40201044: 0x3ffe8998 << ',', start for ","
edit: probably was to hasty to blame .rodata
, only pattern so far is placement in the section and in what order they are processed by the linker? for example, this also breaks when COMMA_STRING
and EMPTY_STRING
are swapped in the code. Here, ',' is used both as start of the "," and as the end for empty string. Pointers to the start of the "" and end of the "," are saved
static const char COMMA_STRING[] MERGE_SECTION(".rodata.bleh") = ",";
static const char EMPTY_STRING[] MERGE_SECTION(".rodata.bleh") = "";
> bash dump.sh
0x4020103c: 0x3ffe87df
0x40201040: 0x3ffe87de
0x40201044: 0x3ffe87e1
> xtensa-lx106-elf-objdump -s -j .rodata .pio/build/d1_mini/firmware.elf | grep -E '3ffe87[d,e]'
3ffe87d0 70202d3e 20282575 290a000a 0a0a002c p -> (%u)......,
3ffe87e0 00281a14 00000000 00000000 00000000 .(..............
> xtensa-lx106-elf-objdump -s -j .rodata.bleh .pio/build/d1_mini/src/a.cpp.o
.pio/build/d1_mini/src/a.cpp.o: file format elf32-xtensa-le
Contents of section .rodata.bleh:
0000 002c00 .,.
When EMPTY_STRING
is first, COMMA_STRING
second in the code
> bash dump.sh
0x4020103c: 0x3ffe87df
0x40201040: 0x3ffe87de
0x40201044: 0x3ffe87df
> xtensa-lx106-elf-objdump -s -j .rodata .pio/build/d1_mini/firmware.elf | grep -E '3ffe87[d,e]'
3ffe87d0 70202d3e 20282575 290a000a 0a0a002c p -> (%u)......,
3ffe87e0 00281a14 00000000 00000000 00000000 .(..............
> xtensa-lx106-elf-objdump -s -j .rodata.bleh .pio/build/d1_mini/src/a.cpp.o
.pio/build/d1_mini/src/a.cpp.o: file format elf32-xtensa-le
Contents of section .rodata.bleh:
0000 2c0000 ,..
(Unsurprisingly?), can also reproduce on the current arduino-esp32 2.0.1, just a matter of making strings appear in a certain order. But, can't reproduce just with the xtensa toolchain or godbolt to minimize this even more :/
This is arduino-cli.exe compile --clean -b esp32:esp32:esp32
aka basic esp32 dev board
Dump of assembler code for function test():
testme.ino:
11 void test() {
0x400d0f40 <+0>: entry a1, 32
12 addresses("");
0x400d0f43 <+3>: l32r a11, 0x400d0024 <_stext+4>
0x400d0f46 <+6>: l32r a10, 0x400d0028 <_stext+8>
0x400d0f49 <+9>: call8 0x400d0f2c <addresses(char const*, char const*)>
13 addresses(",");
0x400d0f4c <+12>: l32r a11, 0x400d002c <_stext+12>
0x400d0f4f <+15>: l32r a10, 0x400d0024 <_stext+4>
0x400d0f52 <+18>: call8 0x400d0f2c <addresses(char const*, char const*)>
0x400d0f55 <+21>: retw.n
End of assembler dump.
0x400d0024 <_stext+4>: 0x3f40264d
0x400d0028 <_stext+8>: 0x3f40264c
0x400d002c <_stext+12>: 0x3f400131
> xtensa-esp32-elf-objdump -s -j .flash.rodata .\testme.ino.elf | select-string 3f40264
3f402640 69746174 696f6e20 4572726f 723a2043 itation Error: C
> xtensa-esp32-elf-objdump -s -j .flash.rodata .\testme.ino.elf | select-string '3f4001[2,3]'
3f400120 25703a25 70202d3e 20282575 290a002c %p:%p -> (%u)..,
3f400130 002a0000 00000000 00000000 20b20e40 .*.......... ..@
Basic Infos
Platform
Settings in IDE
Defaults
Problem Description
Using
std::begin
andstd::end
or the equivalent&buf[0]
and&buf[Size]
, sometimes the resulting string pointer happens to be at a different location than expected. Usingaddresses(std::begin(...), std::end(...));
outside of the template produces the same results. Referring to&buf[Size-1]
aka'\0'
does not result in the incorrect pointers though.Originally noted in the https://github.com/earlephilhower/newlib-xtensa/issues/19#issuecomment-921884356, but it might be something different than string suffix merging / anything related to the toolchain or libc? (and not to continue an already long issue thread. plus, I hope I have not broken toolchain installation somehow)
MCVE Sketch
Debug Messages
Which is not the expected result. Inspecting the binary, "," is actually at the 0x3ffe87db as one would expect from the
end
(one past the last element of array of 2 elems)