Ralim / IronOS

Open Source Soldering Iron firmware
https://ralim.github.io/IronOS/
GNU General Public License v3.0

Multi-lang builds of Pinecilv2 fail with ld/lto1 errors #1764

Closed ia closed 1 year ago

ia commented 1 year ago

Describe the bug Multi-lang builds for Pinecilv2 fail with ld/lto1 errors.

To Reproduce

$ cat test.sh
#!/usr/bin/env bash

set -x
set -e
while [ 1 -eq 1 ]; do
    make clean-build
    make  -j2  model=Pinecilv2  firmware-multi_compressed_European  firmware-multi_compressed_Bulgarian+Russian+Serbian+Ukrainian  firmware-multi_Chinese+Japanese
done;
$ make docker-shell
# ./test.sh

Expected behavior Successful multi-lang builds for PinecilV2.

Details of your device: Build problem, not a device one.

Additional context

I create this issue to:

If you work on your branch in forked repo and see similar problem, please, add comment providing:

Here are the examples of this issue:

lto1 error / upstream:

lto1: internal compiler error: cannot read 'LTO_section_decls' from Objects/Pinecilv2/./Core/BSP/Pinecilv2/bl_mcu_sdk/drivers/bl702_driver/hal_drv/src/hal_sec_hash.o
0xde08d0 internal_error(char const*, ...)
    ???:0
0x5d8040 read_cgraph_and_symbols(unsigned int, char const**)
    ???:0
0x5caf11 lto_main()
    ???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: riscv-none-elf-g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:844: Hexfile/Pinecilv2_multi_compressed_Bulgarian+Russian+Serbian+Ukrainian.elf] Error 1
make[1]: Leaving directory '/__w/IronOS/IronOS/source'
make: *** [Makefile:162: firmware-multi_compressed_Bulgarian+Russian+Serbian+Ukrainian] Error 2
make: *** Waiting for unfinished jobs....
Linking Hexfile/Pinecilv2_multi_compressed_European.elf
lto1: internal compiler error: cannot read 'LTO_section_decls' from Objects/Pinecilv2/./Core/BSP/Pinecilv2/bl_mcu_sdk/drivers/bl702_driver/hal_drv/src/hal_sec_hash.o
0xde08d0 internal_error(char const*, ...)
    ???:0
0x5d8040 read_cgraph_and_symbols(unsigned int, char const**)
    ???:0
0x5caf11 lto_main()
    ???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: riscv-none-elf-g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:844: Hexfile/Pinecilv2_multi_compressed_European.elf] Error 1
make[1]: Leaving directory '/__w/IronOS/IronOS/source'
make: *** [Makefile:162: firmware-multi_compressed_European] Error 2
Error: Process completed with exit code 2.

lto1 error / branch:

Linking Hexfile/Pinecilv2_multi_compressed_European.elf
lto1: internal compiler error: in read_cgraph_and_symbols, at lto/lto-common.c:2739
0xde08d0 internal_error(char const*, ...)
    ???:0
0x5a5ae1 fancy_abort(char const*, int, char const*)
    ???:0
0x5d77f4 read_cgraph_and_symbols(unsigned int, char const**)
    ???:0
0x5caf11 lto_main()
    ???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: riscv-none-elf-g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:844: Hexfile/Pinecilv2_multi_compressed_European.elf] Error 1
make[1]: Leaving directory '/__w/IronOS/IronOS/source'
make: *** [Makefile:162: firmware-multi_compressed_European] Error 2
Error: Process completed with exit code 2.

ld error / branch:

Linking Hexfile/Pinecilv2_multi_compressed_European.elf
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: warning: Objects/Pinecilv2/./Core/Threads/OperatingModes/USBPDDebug_HUSB238.o has a section extending past end of file
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: Objects/Pinecilv2/./Core/Threads/OperatingModes/USBPDDebug_HUSB238.o: ELF section name out of range
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:844: Hexfile/Pinecilv2_multi_compressed_European.elf] Error 1
make[1]: Leaving directory '/__w/IronOS/IronOS/source'
make: *** [Makefile:162: firmware-multi_compressed_European] Error 2
Error: Process completed with exit code 2.

lto1 error / branch:

lto1: internal compiler error: in read_cgraph_and_symbols, at lto/lto-common.c:2739
0xde08d0 internal_error(char const*, ...)
    ???:0
0x5a5ae1 fancy_abort(char const*, int, char const*)
    ???:0
0x5d77f4 read_cgraph_and_symbols(unsigned int, char const**)
    ???:0
0x5caf11 lto_main()
    ???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: riscv-none-elf-g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:846: Hexfile/Pinecil_multi_compressed_European.elf] Error 1
make[1]: Leaving directory '/__w/IronOS-plus/IronOS-plus/source'
make: *** [Makefile:162: firmware-multi_compressed_European] Error 2
Error: Process completed with exit code 2.

At first I couldn't reproduce it locally, neither without -j$(nproc) at all nor with -j4 (which is the value of nproc on my system). But when I used -j2, which seems to be what GitHub CI uses, I got an interesting result almost right away:

Linking Hexfile/Pinecilv2_multi_compressed_Bulgarian+Russian+Serbian+Ukrainian.elf
Generating Objects/Pinecilv2/Core/Gen/translation.files/multi.EUR.o
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: warning: Objects/Pinecilv2/./Core/BSP/Pinecilv2/bl_mcu_sdk/components/ble/ble_stack/sbc/enc/sbc_analysis.o has a section extending past end of file
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: Objects/Pinecilv2/./Core/BSP/Pinecilv2/bl_mcu_sdk/components/ble/ble_stack/sbc/enc/sbc_analysis.o: ELF section name out of range
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:846: Hexfile/Pinecilv2_multi_compressed_Bulgarian+Russian+Serbian+Ukrainian.elf] Error 1
make[1]: Leaving directory '/build/ironos/source'
make: *** [Makefile:162: firmware-multi_compressed_Bulgarian+Russian+Serbian+Ukrainian] Error 2
make: *** Waiting for unfinished jobs....
Generating BriefLZ compressed translation for multi-language European
INFO:root:Reading pickled language data from Objects/Pinecilv2/Core/Gen/translation.files/multi.EUR.pickle...
INFO:root:Read language data for ['EN', 'CS', 'DA', 'DE', 'ES', 'FI', 'FR', 'HR', 'HU', 'IT', 'LT', 'NL', 'NL_BE', 'NB', 'PL', 'PT', 'SK', 'SL', 'SV', 'TR', 'VI']
INFO:root:Build version: v2.22B.55D36C98
INFO:root:Generating block for ['EN', 'CS', 'DA', 'DE', 'ES', 'FI', 'FR', 'HR', 'HU', 'IT', 'LT', 'NL', 'NL_BE', 'NB', 'PL', 'PT', 'SK', 'SL', 'SV', 'TR', 'VI']
INFO:root:Font table 12x16 compressed from 3672 to 1528 bytes (ratio 0.416)
INFO:root:Font table 06x08 compressed from 798 to 739 bytes (ratio 0.926)
INFO:root:Strings for EN compressed from 3232 to 2339 bytes (ratio 0.724)
INFO:root:Strings for CS compressed from 3418 to 2525 bytes (ratio 0.739)
INFO:root:Strings for DA compressed from 3396 to 2530 bytes (ratio 0.745)
INFO:root:Strings for DE compressed from 3526 to 2441 bytes (ratio 0.692)
INFO:root:Strings for ES compressed from 3806 to 2559 bytes (ratio 0.672)
INFO:root:Strings for FI compressed from 3310 to 2594 bytes (ratio 0.784)
INFO:root:Strings for FR compressed from 3758 to 2510 bytes (ratio 0.668)
INFO:root:Strings for HR compressed from 3972 to 2772 bytes (ratio 0.698)
INFO:root:Strings for HU compressed from 3510 to 2513 bytes (ratio 0.716)
INFO:root:Strings for IT compressed from 4304 to 2587 bytes (ratio 0.601)
INFO:root:Strings for LT compressed from 3626 to 2710 bytes (ratio 0.747)
INFO:root:Strings for NL compressed from 3626 to 2574 bytes (ratio 0.71)
INFO:root:Strings for NL_BE compressed from 3340 to 2553 bytes (ratio 0.764)
INFO:root:Strings for NB compressed from 3194 to 2407 bytes (ratio 0.754)
INFO:root:Strings for PL compressed from 3844 to 2762 bytes (ratio 0.719)
INFO:root:Strings for PT compressed from 3394 to 2441 bytes (ratio 0.719)
INFO:root:Strings for SK compressed from 3454 to 2648 bytes (ratio 0.767)
INFO:root:Strings for SL compressed from 3172 to 2541 bytes (ratio 0.801)
INFO:root:Strings for SV compressed from 3206 to 2516 bytes (ratio 0.785)
INFO:root:Strings for TR compressed from 3160 to 2608 bytes (ratio 0.825)
INFO:root:Strings for VI compressed from 3300 to 2344 bytes (ratio 0.71)
INFO:root:Done
Linking Hexfile/Pinecilv2_multi_compressed_European.elf
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: warning: Objects/Pinecilv2/./Core/BSP/Pinecilv2/bl_mcu_sdk/components/ble/ble_stack/sbc/enc/sbc_analysis.o has a section extending past end of file
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: Objects/Pinecilv2/./Core/BSP/Pinecilv2/bl_mcu_sdk/components/ble/ble_stack/sbc/enc/sbc_analysis.o: ELF section name out of range
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:846: Hexfile/Pinecilv2_multi_compressed_European.elf] Error 1
make[1]: Leaving directory '/build/ironos/source'
make: *** [Makefile:162: firmware-multi_compressed_European] Error 2

Binary files mentioned in the log above can be found here.

My further plan is to:

My current suspicion is that this is somehow related to parallel building creating a race-condition-like situation (i.e. some binary file is not fully generated yet when a dependent target inside the Makefile "thinks" that it's ready).

Less likely (but not impossible, BTW), it could be a bug in the toolchain.
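
The suspected failure mode can be illustrated without any toolchain. The following is only a minimal sketch, not IronOS code: `write_object` and the temporary file are hypothetical stand-ins for two uncoordinated compiler jobs that both write the same object file, with the delays chosen so that their writes overlap.

```shell
#!/usr/bin/env bash
# Sketch: two uncoordinated "compiler" jobs writing the same object file.
# write_object and the temp file are hypothetical stand-ins, not IronOS names.
set -eu

out=$(mktemp)

write_object() {                # $1 = job label, $2 = start delay in seconds
    sleep "$2"
    echo "$1-header" > "$out"   # truncate the file and start writing
    sleep 2                     # pretend the rest of the output takes a while
    echo "$1-body" >> "$out"    # finish the file
}

write_object A 0 &              # job A starts immediately
write_object B 1 &              # job B starts while A is still "compiling"
wait

# Print what a linker would find in the shared output file afterwards.
cat "$out"
```

With these delays the file ends up holding pieces from both jobs instead of one complete output, which is roughly the kind of inconsistent file that could make ld or lto1 complain about truncated sections or unreadable LTO data.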

ia commented 1 year ago

The rate of reproducing this issue locally is about 99%, but the error is sometimes slightly different from one run to the next:

Linking Hexfile/Pinecilv2_multi_compressed_European.elf
lto1: fatal error: bytecode stream in file 'Objects/Pinecilv2/./Core/BSP/Pinecilv2/bl_mcu_sdk/common/partition/partition.o' generated with GCC compiler older than 10.0
compilation terminated.
lto-wrapper: fatal error: riscv-none-elf-g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/riscv-none-elf/11.2.0/../../../../riscv-none-elf/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:846: Hexfile/Pinecilv2_multi_compressed_European.elf] Error 1
make[1]: Leaving directory '/build/ironos/source'
make: *** [Makefile:162: firmware-multi_compressed_European] Error 2

And it seems my fault after all, sorry! :no_mouth:


TL;DR: probable root cause (a mini write-up, or "today I learned"):


I looked through the logs before and after the changes in push.yml very carefully. Judging by how the logs report the build, in the scenario with cd source && make -j2 multi_... or with make -C source/ (tested locally), the parallel build option -jN is not ignored but is applied inside the build of each requested target, while the targets themselves are built in sequential order (the Building for Pine64 Pinecilv2 line is printed once, because there is only one make process).

After the changes in push.yml, the targets themselves are run in parallel (the Building for Pine64 Pinecilv2 line is printed twice, because the make process forks into nproc processes of itself, and these start compiling target firmware-multi_compressed_European and target firmware-multi_compressed_Bulgarian+Russian+Serbian+Ukrainian in parallel). Hence the conflict of binary data in the generated/overwritten files, which leads to the compilation errors.
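
The difference between the two invocations can be reproduced in miniature. This is a sketch under stated assumptions, not the IronOS build: it generates two throwaway Makefiles in a temporary directory, and the names `fw-A`, `fw-B`, `common.o`, and `count_builds` are hypothetical. The top-level Makefile simply recurses into `src/` once per requested target, and both targets share one slow prerequisite. It requires GNU make.

```shell
#!/usr/bin/env bash
# Sketch: count how often a shared object gets built when two targets are
# requested through a recursive top-level Makefile versus directly in src/.
# All names (fw-A, fw-B, common.o, count_builds) are hypothetical.
set -eu

work=$(mktemp -d)
mkdir "$work/src"

# Top level: each fw-% target starts its own sub-make in src/.
cat > "$work/Makefile" <<'EOF'
fw-%: ; $(MAKE) -C src $@
EOF

# Sub-directory: both firmware targets depend on one slow shared object.
cat > "$work/src/Makefile" <<'EOF'
fw-A fw-B: common.o ; cp common.o $@
common.o: ; sleep 1; echo built >> build.log; touch $@
EOF

count_builds() {                # $1 = "nested" or "direct"
    rm -f "$work/src/common.o" "$work/src/fw-A" "$work/src/fw-B" "$work/src/build.log"
    if [ "$1" = nested ]; then
        (cd "$work" && make -j2 fw-A fw-B) > /dev/null
    else
        make -C "$work/src" -j2 fw-A fw-B > /dev/null
    fi
    grep -c built "$work/src/build.log"   # one log line per common.o build
}

nested=$(count_builds nested)
direct=$(count_builds direct)
echo "nested: common.o built $nested time(s)"
echo "direct: common.o built $direct time(s)"

rm -rf "$work"
```

In the nested case the two sub-makes know nothing about each other, so each one typically rebuilds `common.o` on its own and both write to the same path at the same time. In the direct case a single make process sees the whole dependency graph, builds `common.o` once, and only then runs the two dependent targets in parallel.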

Working on a fix now...

ia commented 1 year ago

I just added -C source/ to test.sh from the original report, and after that I got more than a dozen successful build cycles. PR is ready here. I had zero idea about such nuances of make's behavior, BTW.

Ralim commented 1 year ago

Neither did I so I never caught it 😓 Thanks for getting a fix figured out before I woke up; very nice to wake up to a fixed issue 😁

ia commented 1 year ago

> Thanks for getting a fix figured out before I woke up; very nice to wake up to a fixed issue

Sure, no problem! Sorry for bringing this bug into the repo in the first place. :|

I could be wrong about the terminology in the root-cause part, but the bottom line, as far as I could understand, is:

Something like that, as far as I managed to figure it out in brief, only to fix the issue in the fastest and most suitable way.

Ralim commented 1 year ago

This does make sense, but also makes it hairy to debug 😓