Open mcspr opened 3 years ago
Hmmm...I'm not sure I get the nature of the problem, sorry.
The string coalescing will definitely generate non-32b aligned PSTRs, but all the PSTR code does support non-32b starting strings. I don't think this would affect const char []
s in the code because those would be copied to RAM anyway as part of .rodata
anyway (and PSTRs aren't in rodata section so can't be merged w/non PSTRs).
Yeah, sorry for too early of an issue as this is very likely not the case. Experimenting with suffix-merging some strings, even having a standalone one that would match the suffix, has everything working correctly and alignment also works itself out when strings are on a 4 byte boundary. And I think I missed the part that these are C strings with a '\0' as part of the blob.
While this is not technically a newlib problem... revisiting the broken .elf once more, it has some weird structure to the generated function right where the PSTR(...) happened to be. The missing piece does some arithmetic at the beginning which causes an exception where the address became 0x0f4e0000. The correctly working one simply loads 2 pointers to the lambda and the __pstr__, and also what I see with the current gcc10.3 toolchain.
void _mqttInitCommands() {
terminalRegisterCommand(F("MQTT.RESET"), [](const terminal::CommandContext&) { ... });
terminalRegisterCommand(F("MQTT.INFO"), [](const terminal::CommandContext&) { ... });
(gdb) disassemble 0x402083a4
Dump of assembler code for function _mqttInitCommands():
0x402083a4 <+0>: l32r a3, 0x40208398
0x402083a7 <+3>: movi a2, 0x7a7
0x402083aa <+6>: addi a1, a1, -16
0x402083ad <+9>: slli a2, a2, 17
0x402083b0 <+12>: s32i a0, a1, 12
0x402083b3 <+15>: call0 0x40213488 <terminalRegisterCommand(__FlashStringHelper const*, void (*)(terminal::CommandContext const&))>
0x402083b6 <+18>: l32r a3, 0x4020839c
0x402083b9 <+21>: l32r a2, 0x402083a0
0x402083bc <+24>: call0 0x40213488 <terminalRegisterCommand(__FlashStringHelper const*, void (*)(terminal::CommandContext const&))>
0x402083bf <+27>: l32i a0, a1, 12
0x402083c2 <+30>: addi a1, a1, 16
0x402083c5 <+33>: ret.n
End of assembler dump.
(gdb) x (*0x4020839c)
0x40208380 <_FUN(terminal::CommandContext const&)>: 0x02f0c112
(gdb) p (const char*)(*0x402083a0)
$1 = 0x4026fa70 <_mqttInitCommands()::__pstr__> "MQTT.INFO"
(gdb) x (*0x40208398)
0x40209c70 <_FUN(terminal::CommandContext const&)>: 0x02f0c112
Also, I've noticed this patch at the gcc repo which kind of looks like the resulting code where l32r is supposed to be replaced by the movi + shift: https://github.com/earlephilhower/esp-quick-toolchain/blob/master/patches/gcc10.2/gcc-xtensa-rearrange-DI-mode-constant-loading.patch
But, without a reproducible example, idk if I'd claim this a toolchain bug just yet.
FWIW that patch also exists in the 10.3 toolchain. IIRC it was changes to reduce generated code size for some subset of constant values.
There's also https://github.com/esp8266/Arduino/discussions/8189#discussioncomment-943709 which mentioned semi-random issues, but the report got lost in the other discussion thread. There's a full build directory, too, which I grabbed and tried to compare certain pieces of code. Like, there is similar excvaddr=0x13840000
which seems to come from the similar instruction:
Reading symbols from ESP_Easy_mega_20210624_normal_300_ESP8266_4M1M.elf...
(gdb) disassemble _Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String
...
0x4021d406 <+18>: bgeu a3, a2, 0x4021d40c <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+24>
0x4021d409 <+21>: j 0x4021d538 <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+324>
0x4021d40c <+24>: movi a3, 0x4e1
0x4021d40f <+27>: slli a3, a3, 18
0x4021d412 <+30>: addx4 a2, a2, a3
0x4021d415 <+33>: l32i.n a2, a2, 0
0x4021d417 <+35>: jx a2
...
But this piece does not look like this when I try building it with a clean directory:
(gdb) disassemble _Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String
...
0x40216336 <+18>: bgeu a3, a2, 0x4021633c <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+24>
0x40216339 <+21>: j 0x40216468 <_Z11CPlugin_003N7CPlugin8FunctionEP11EventStructR6String+324>
0x4021633c <+24>: l32r a3, 0x40216310
0x4021633f <+27>: addx4 a2, a2, a3
0x40216342 <+30>: l32i.n a2, a2, 0
0x40216344 <+32>: jx a2
...
(but idk how to continue inspecting, since 0x40216310 seems to point to some place inside of the function itself and it does not seem to do so in the calculation above with movi and resolves to some pre-linking address instead)
And the symptoms sound very similar:
cc @jjsuwa-sys3175 nonetheless, maybe the above makes some sense
Another interesting thing, which really points to the toolchain. Running the following script on Windows: https://gist.github.com/mcspr/a433474aa40ab0f6433f9240455fd624#file-trying-py That generates semi-random functions that accept pointers as arguments (NOTICE file is created in the same directory as the script)
With xtensa-lx106-elf-g++ binary in PATH:
PS > py -3 .\trying.py
xtensa-lx106-elf-g++ (GCC) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[7/1024] (766)
_Z11function171v:
1018: movi a4, 0x705
1019: slli a4, a4, 16
The same script running on a WSL Linux instance does nothing all of 1024 iterations.
edit: After failing once, script has an added --existing
option to retry the build on the same test.cpp file. It can sometimes succeed, but most of the time it loses the same exact function pointer and replaces it with something random (...and it also has a different number for each compilation failure)
edit2: Script also changed to only generate a single function, and removed almost everything but the function definitions + call sites.
@mcspr thanks for quick test script
first, fetched from https://github.com/earlephilhower/esp-quick-toolchain, and locally-built on debootstraped x86_64 Debian Buster:
pi@debootstrap:/tmp/artifact$ python trying.py --cxx bin/xtensa-lx106-elf-g++
Using built-in specs.
COLLECT_GCC=bin/xtensa-lx106-elf-g++
COLLECT_LTO_WRAPPER=/tmp/artifact/libexec/gcc/xtensa-lx106-elf/10.3.0/lto-wrapper
Target: xtensa-lx106-elf
Configured with: ../configure --prefix=/tmp/artifact --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.0 (GCC)
[1023/1024] (745))
ok, that seems no flaws.
next, downloaded 2 pre-built toolchains from https://github.com/esp8266/Arduino, "i686-mingw32" and "x86_64-mingw32": (download locations: https://github.com/earlephilhower/esp-quick-toolchain/releases/download/3.0.4-gcc10.3/i686-w64-mingw32.xtensa-lx106-elf-1757bed.210717.zip https://github.com/earlephilhower/esp-quick-toolchain/releases/download/3.0.4-gcc10.3/x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210717.zip)
"i686-mingw32"
C:\Users\Administrator>python Z:\trying.py --cxx "C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe"
Using built-in specs.
COLLECT_GCC=C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe
COLLECT_LTO_WRAPPER=c:/program\ files\ (x86)/arduino/hardware/esp8266com/esp8266/tools/xtensa-lx106-elf/bin/../libexec/gcc/xtensa-lx106-elf/10.3.0/lto-wrapper.exe
Target: xtensa-lx106-elf
Configured with: /workdir/repo/gcc-gnu/configure --prefix=/workdir/xtensa-lx106-elf.win32 --build=x86_64-linux-gnu --host=i686-w64-mingw32 --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.0 (GCC)
[1023/1024] (461)
"i686-mingw32" seems no problem.
"x86_64-mingw32"
C:\Users\Administrator>python Z:\trying.py --cxx "C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe"
Using built-in specs.
COLLECT_GCC=C:\Program Files (x86)\Arduino\hardware\esp8266com\esp8266\tools\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe
COLLECT_LTO_WRAPPER=c:/program\ files\ (x86)/arduino/hardware/esp8266com/esp8266/tools/xtensa-lx106-elf/bin/../libexec/gcc/xtensa-lx106-elf/10.3.0/lto-wrapper.exe
Target: xtensa-lx106-elf
Configured with: /workdir/repo/gcc-gnu/configure --prefix=/workdir/xtensa-lx106-elf.win64 --build=x86_64-linux-gnu --host=x86_64-w64-mingw32 --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.3.0 (GCC)
[2/1024] (439)
_Z8functionv:
test.S:2066: movi a4, 0x43b
test.S:2068: slli a4, a4, 16
whoa!
what is the differences between?
we have the big question: "what's happening?" ok, we do quick'n'dirty but effective probe insertion - "printf debugging" :)
place this as "gcc-xtensa-rearrange-DI-mode-constant-loading-debug-probe.patch" on /patches/gcc10.3/. and then rebuild the toolchain (especially for "x86_64-mingw32").
diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index e56083e29..29bf38293 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1083,6 +1083,11 @@ xtensa_emit_move_sequence (rtx *operands, machine_mode mode)
{
emit_move_insn (dst, GEN_INT (srcval >> shift));
emit_insn (gen_ashlsi3_internal (dst, dst, GEN_INT (shift)));
+
+ fprintf (stderr, "[xtensa_emit_move_sequence @ %s()] '" HOST_WIDE_INT_PRINT_DEC "[" HOST_WIDE_INT_PRINT_HEX "]' => '%d[0x%x] << %d'\n",
+ current_function_name (),
+ srcval, srcval, (int)(srcval >> shift), (int)(srcval >> shift), shift);
+
return 1;
}
}
[example]
pi@debootstrap:/tmp/artifact$ cat test.c
extern int foo(unsigned int x);
int test(void) {
return foo(0x07FE0000UL);
}
pi@debootstrap:/tmp/artifact$ bin/xtensa-lx106-elf-gcc -Os -save-temps -mlongcalls -mtext-section-literals -falign-functions=4 -ffunction-sections -fdata-sections -Wall -Werror=return-type -free -fipa-pta -fno-exceptions -S test.c
[xtensa_emit_move_sequence @ test()] '134086656[0x7fe0000]' => '1023[0x3ff] << 17'
pi@debootstrap:/tmp/artifact$ cat test.s
.file "test.c"
.text
.section .text.test,"ax",@progbits
.literal_position
.align 4
.global test
.type test, @function
test:
movi a2, 0x3ff
addi sp, sp, -16
slli a2, a2, 17
s32i.n a0, sp, 12
call0 foo
l32i.n a0, sp, 12
addi sp, sp, 16
ret.n
.size test, .-test
.ident "GCC: (GNU) 10.3.0"
From the example above it's a definitive constant, but why does pointer becomes a constant thing? It feels like there's another layer broken, and the constants-loading patch just happened to detect it.
I have tried building locally, but no success so far with GCC=10.3 and a modified Makefile with gcc-gnu master build without patches. Running things inside of msys2 CLI with all pre-requisite tools installed, but it fails on patch stage even where there are zero patches related to the gcc. log.stage.patch has no errors :/ nvm, I actually missed some tools, it's building
it's building
It still fails the test above:
a433474aa40ab0f6433f9240455fd624> py -3 .\trying.py --cxx ".\x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210826\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe"
Using built-in specs.
COLLECT_GCC=.\x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210826\xtensa-lx106-elf\bin\xtensa-lx106-elf-g++.exe
COLLECT_LTO_WRAPPER=c:/users/maxim/documents/gist/a433474aa40ab0f6433f9240455fd624/x86_64-w64-mingw32.xtensa-lx106-elf-1757bed.210826/xtensa-lx106-elf/bin/../libexec/gcc/xtensa-lx106-elf/12.0.0/lto-wrapper.exe
Target: xtensa-lx106-elf
Configured with: /home/builder/dev/esp-quick-toolchain/repo/gcc-gnu/configure --prefix=/home/builder/dev/esp-quick-toolchain/xtensa-lx106-elf.win64 --build=x86_64-linux-gnu --host=x86_64-w64-mingw32 --target=xtensa-lx106-elf --disable-shared --with-newlib --enable-threads=no --disable-__cxa_atexit --disable-libgomp --disable-libmudflap --disable-nls --disable-multilib --disable-bootstrap --enable-languages=c,c++ --enable-lto --enable-static=yes --disable-libstdcxx-verbose
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210825 (experimental) (GCC)
[1/1024] (287)
_Z8functionv:
test.S:1174: movi a3, 0x37b
test.S:1175: slli a3, a3, 17
I guess it's enough for an upstream gcc bug? Unless there are other things to test; build flags specific to mingw, mingw version, included 'blob' archives, etc.
btw, should the issue be moved to esp-quick-toolchain?
@mcspr thanks for pointing me here. Looks interesting to see if this indeed is a compiler/linker bug that's been bugging us for quite some time.
I was just wondering, since you started mentioning the flash string optimization, whether this is a bug that's only (mainly) manifests on Windows? Is there something really different between building on Linux and Windows? No idea if it is related at all, but I wonder why the builds made in Linux are always slightly (a few kB) larger than builds made in Windows. Could it be that builds made in Linux do align blocks differently and by doing this maybe making it harder for these issues to reproduce them?
In another issue you also mentioned a thought about some code being recompiled, but the string which was first a flash string duplicate (or partial duplicate) then was no longer present. But the objects using such a (partial) duplicate flash string may still refer to it. Have you been able to disprove this idea, or is it still a possible issue? When you suggested it, it sounded plausible.
In another issue you also mentioned a thought about some code being recompiled, but the string which was first a flash string duplicate (or partial duplicate) then was no longer present. But the objects using such a (partial) duplicate flash string may still refer to it. Have you been able to disprove this idea, or is it still a possible issue? When you suggested it, it sounded plausible.
Just a speculation, first message proved to be incorrect and everything works as expected with the flash string(s) suffix merging.
With the gist above, compiler loses / replaces the real pointer with a different instruction via an optimization that is only supposed to happen on constants, which these pointers are really not. (...and very likely, just like with the case of the flash string, pointed-at data will discarded as unused so that's why it's missing?)
ref. CONSTANT_P (src)
& xtensa_tls_referenced_p(src)
reading the gcc code:
Which sometimes seem to match the address that happens here.
edit: Or, it is missing some additional condition that will check whether the referenced thing can actually be optimized? Upon further observation, generated addresses under mingw build are fairly short:
[xtensa_emit_move_sequence @ function()] '511442944[0x1e7c0000]' => '1951[0x79f] << 18'
While on the linux build it's something like '0x7f4f316d9fc0', which obviously does not fit the operand for a very long time
re. re. reading the above, should the l32r+slli conversion also check for src
's code (GET_CODE(X) == ...
) and reject 'invalid' things?
For example, it should definitely work for const_int
(example above where value is inline with the func call):
CONST_INT_P(src)
(https://code.woboq.org/gcc/gcc/rtl.h.html#770)
And also seems to result in the same thing when there is const int a = 12345; func(a);
.
In case of PSTR(...) / function ptr, it's symbol_ref
SYMBOL_REF_P(src)
Which should probably be kept as-is?
should the l32r+slli conversion also check for
src
's code (GET_CODE(X) == ...
) and reject 'invalid' things?
seems to be good workaround :)
patches/gcc10.3/gcc-xtensa-reject-symbol-refs-from-constant-synthesis.patch:
diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index e56083e29..af265b5cc 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1074,7 +1074,8 @@ xtensa_emit_move_sequence (rtx *operands, machine_mode mode)
{
/* Try to emit MOVI + SLLI sequence, that is smaller
than L32R + literal. */
- if (optimize_size && mode == SImode && register_operand (dst, mode))
+ if (optimize_size && mode == SImode && CONST_INT_P (src)
+ && register_operand (dst, mode))
{
HOST_WIDE_INT srcval = INTVAL (src);
int shift = ctz_hwi (srcval);
(edit: replaced !SYMBOL_REF_P
with CONST_INT_P
)
seems to be good workaround :)
patches/gcc10.3/gcc-xtensa-reject-symbol-refs-from-constant-synthesis.patch:
what a proper fix would be though?
also, based on the CONSTANT_P
condition it matches label_ref
, const
, const_double
and high
, but I'm not sure all of those don't have the same problem as symbols? shound't the check be for just the CONST_INT_P(src)
?
shound't the check be for just the
CONST_INT_P(src)
?
ah, that's more safer way:
-+ if (optimize_size && mode == SImode && !SYMBOL_REF_P (src) &&
-+ register_operand (dst, mode))
++ if (optimize_size && mode == SImode && CONST_INT_P (src)
++ && register_operand (dst, mode))
Also submitted at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102115
should we contact @jcmvbkbc, the Xtensa port's maintainer?
should we contact @jcmvbkbc, the Xtensa port's maintainer?
yeah, I wasn't sure what to do about that in the bugzilla interface. hope this ping is enough
@mcspr @jjsuwa-sys3175 thanks for tagging me. I agree with the fix posted here and will commit it to the master and to the gcc-11 branch.
I've also found that configuring gcc with --enable-checking=all
would have caught it immediately.
hi @jcmvbkbc: do you know the location suitable to discuss a topic of this kind (about non-configuration-specific GCC Xtensa port)? your repository (https://github.com/jcmvbkbc/gcc-xtensa)? it seems to be no newer than 8.1.0 there...
do you know the location suitable to discuss a topic of this kind (about non-configuration-specific GCC Xtensa port)?
@jjsuwa-sys3175 I don't. So far majority of issues were reported in a project-specific manner for the project that found them, mainly on project mailing lists. Perhaps reporting an issue on the gcc mailing list would work too, I can't recall anyone doing that for the xtensa target though. Regardless of how I'm notified of the issue I usually file a bugzilla entry for gcc or binutils. I understand that github issues looks simpler than gcc mailing list or gcc bugzilla.
your repository (https://github.com/jcmvbkbc/gcc-xtensa)? it seems to be no newer than 8.1.0 there...
This repo predates gcc switch to git so its history doesn't match bit-for-bit the current official gcc git tree. And github is not a part of my gcc workflow, so pushing code to the master there is not something that happens naturally. But that probably shouldn't inhibit reporting bugs there.
GCC issue is still pending, but... returning to the original question. Given the following example:
#ifdef NATIVE
#include <cstdint>
#include <cstdio>
#include <iterator>
#else
#include <Arduino.h>
#endif
void addresses(const char* const begin, const char* const end) {
::printf("%p:%p -> %.*s (%u)\n", begin, end, end - begin, begin, end - begin);
}
template <size_t Size>
void addresses(const char (&buf)[Size]) {
addresses(std::begin(buf), std::end(buf));
}
void test() {
addresses("");
addresses(","); // broken
addresses("999::");
addresses("-5:doesntmatter");
}
#ifndef NATIVE
void setup() {
Serial.begin(115200);
delay(1000);
::printf("\n\n\n");
test();
}
void loop() {
delay(100);
}
#else
int main() {
::puts("\n\n\n");
test();
}
#endif
Output is:
0x3ffe87fa:0x3ffe87fb -> (1)
0x3ffe87fb:0x3ffe87e2 -> (␚␔ (4294967271) <--- end < begin :/
0x3ffe87e2:0x3ffe87e8 -> 999:: (6)
0x3ffe87e8:0x3ffe87f8 -> -5:doesntmatter (16)
Output is:
0x3ffe87fa:0x3ffe87fb -> (1) 0x3ffe87fb:0x3ffe87e2 -> (␚␔ (4294967271) <--- end < begin :/ 0x3ffe87e2:0x3ffe87e8 -> 999:: (6) 0x3ffe87e8:0x3ffe87f8 -> -5:doesntmatter (16)
FTR, I cannot reproduce this with the mainline xtensa gcc. @mcspr Can you compile it to the assembly (using -S in the g++ command line) and post the result here?
@mcspr Can you compile it to the assembly (using -S in the g++ command line) and post the result here?
Using current esp8266 arduino 3.0.2 toolchain based on gcc10.3, with platformio as builder for the example above https://gist.github.com/mcspr/420ea92769f491485b9d73a9e319a366
main.cpp.S (command c/p from the platformio output building main.cpp.o, replaced -c with -S and removed -g)
$ ~/.platformio/packages/toolchain-xtensa/bin/xtensa-lx106-elf-g++ -o main.cpp.S -S -fno-rtti -std=gnu++17 -fno-exceptions -Os -mlongcalls -mtext-section-literals -falign-functions=4 -U__STRICT_ANSI -D_GNU_SOURCE -ffunction-sections -fdata-sections -Wall -Werror=return-type -free -fipa-pta -DPLATFORMIO=50201 -DESP8266 -DARDUINO_ARCH_ESP8266 -DARDUINO_ESP8266_WEMOS_D1MINI -DF_CPU=80000000L -Dets__ -DICACHE_FLASH -DARDUINO=10805 -DARDUINO_BOARD=\"PLATFORMIO_D1_MINI\" -DFLASHMODE_DIO -DLWIP_OPEN_SRC -DNONOSDK22x_190703=1 -DTCP_MSS=536 -DLWIP_FEATURES=1 -DLWIP_IPV6=0 -DVTABLES_IN_FLASH -DMMU_IRAM_SIZE=0x8000 -DMMU_ICACHE_SIZE=0x8000 -Isrc -I/home/runner/.platformio/packages/framework-arduinoespressif8266/tools/sdk/include -I/home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266 -I/home/runner/.platformio/packages/toolchain-xtensa/include -I/home/runner/.platformio/packages/framework-arduinoespressif8266/tools/sdk/lwip2/include -I/home/runner/.platformio/packages/framework-arduinoespressif8266/variants/d1_mini main.cpp
.file "main.cpp"
.text
.section .rodata._Z9addressesPKcS0_.str1.1,"aMS",@progbits,1
.LC0:
.string "%p:%p -> (%d)\n"
.section .text._Z9addressesPKcS0_,"ax",@progbits
.literal_position
.literal .LC1, .LC0
.align 4
.global _Z9addressesPKcS0_
.type _Z9addressesPKcS0_, @function
_Z9addressesPKcS0_:
sub a5, a3, a2
mov.n a4, a3
mov.n a3, a2
l32r a2, .LC1
addi sp, sp, -16
s32i.n a0, sp, 12
call0 printf
l32i.n a0, sp, 12
addi sp, sp, 16
ret.n
.size _Z9addressesPKcS0_, .-_Z9addressesPKcS0_
.section .rodata._Z4testv.str1.1,"aMS",@progbits,1
.LC2:
.string ""
.LC5:
.string ","
.LC8:
.string "999::"
.LC11:
.string "-5:doesntmatter"
.section .text._Z4testv,"ax",@progbits
.literal_position
.literal .LC3, .LC2+1
.literal .LC4, .LC2
.literal .LC6, .LC5+2
.literal .LC7, .LC5
.literal .LC9, .LC8+6
.literal .LC10, .LC8
.literal .LC12, .LC11+16
.literal .LC13, .LC11
.align 4
.global _Z4testv
.type _Z4testv, @function
_Z4testv:
l32r a3, .LC3
l32r a2, .LC4
addi sp, sp, -16
s32i.n a0, sp, 12
call0 _Z9addressesPKcS0_
l32r a3, .LC6
l32r a2, .LC7
call0 _Z9addressesPKcS0_
l32r a3, .LC9
l32r a2, .LC10
call0 _Z9addressesPKcS0_
l32r a3, .LC12
l32r a2, .LC13
call0 _Z9addressesPKcS0_
l32i.n a0, sp, 12
addi sp, sp, 16
ret.n
.size _Z4testv, .-_Z4testv
.section .rodata.setup.str1.1,"aMS",@progbits,1
.LC15:
.string "\n\n\n"
.section .text.setup,"ax",@progbits
.literal_position
.literal .LC14, Serial
.literal .LC16, .LC15
.align 4
.global setup
.type setup, @function
setup:
movi.n a7, 0
l32r a2, .LC14
movi a3, 0xe1
addi sp, sp, -16
movi.n a6, 1
mov.n a5, a7
movi.n a4, 0x1c
slli a3, a3, 9
s32i.n a0, sp, 12
call0 _ZN14HardwareSerial5beginEm12SerialConfig10SerialModehb
l32r a2, .LC16
call0 puts
call0 _Z4testv
l32i.n a0, sp, 12
addi sp, sp, 16
ret.n
.size setup, .-setup
.section .text.loop,"ax",@progbits
.literal_position
.align 4
.global loop
.type loop, @function
loop:
ret.n
.size loop, .-loop
.weak BUILTIN_LED
.section .rodata.BUILTIN_LED,"a"
.align 4
.type BUILTIN_LED, @object
.size BUILTIN_LED, 4
BUILTIN_LED:
.word 2
.ident "GCC: (GNU) 10.3.0"
.LC5:
.string ","
...
.literal .LC6, .LC5+2
.literal .LC7, .LC5
...
l32r a3, .LC6
l32r a2, .LC7
call0 _Z9addressesPKcS0_
So the compiler output is good,
0x40201067 <+23>: l32r a3, 0x40201048
0x4020106a <+26>: l32r a2, 0x40201044
0x4020106d <+29>: call0 0x40201020 <addresses(char const*, char const*)>
but the final binary is not. Looks like link-time relaxation issue to me.
Does adding the option -Wl,--no-relax
to the linker command line fix the issue?
but the final binary is not. Looks like link-time relaxation issue to me. Does adding the option -Wl,--no-relax to the linker command line fix the issue?
It does
Remembering not easily reproducible issue from back in April: https://gitter.im/esp8266/Arduino?at=606c9a5c0147fb05c5de1d59
I noticed these flags after the section name string in the PSTR(...) macro: https://github.com/earlephilhower/newlib-xtensa/blob/85c33ba4523820f1525aa4045cb5d7393b7b1e79/newlib/libc/sys/xtensa/sys/pgmspace.h#L44
Per https://sourceware.org/binutils/docs/as/Section.html#index-Section-Stack-3
If such offset had happened on pstr... what does that look like for const char[] arrays used across the app? Is it an actually desired behaviour since the resulting pointer would (?) not be aligned?
Some extra info, sorry for blowing the text size :)
I still have the offending .bin & .elf generated back in april and still with gcc10.2 timestamped 201223, but not the rest of the build directory including the suggested .map file and neither the (possibly broken?) sources that generated it. What I meant it looked like:
However, dumping the code with current toolchain & code: (with only change being the
namespace { ... }
surrounding the function, and looking at objdump it inlined the function into the one it is called from)And, there are some instances where the 'merging' seems to happen on strings like
F("FACTORY.RESET")
e.g. settingsInitCommands() here uses it, and terminal function containsF("RESET")
:asm code for the April build
_mqttInitCommands()
is slightly different as well, but I am not sure what to look at there forSince then, I was not really been able to reproduce this issue (reliably, or even at all, and do not have some kind of small example that could showcase the issue), but I wonder if the 'merge' flag somehow messed with the pointers. Or it is something related to gcc / binutils / different order of .o files when linking / etc., and it was silently resolved due to updates.