Ralim / IronOS

Open Source Soldering Iron firmware
https://ralim.github.io/IronOS/
GNU General Public License v3.0

Linker error while compiling TS80P from master #685

Status: Closed (doegox closed 4 years ago)

doegox commented 4 years ago

Bug report:

Trying to compile TS80P flavor from current master (https://github.com/Ralim/ts100/commit/372f8e35654694b77e0c3e754e3e945a5fe23410) results in a linker error.

./build.sh -l EN -m TS80P
...
Cleaning previous builds
    [Success]
*********************************************
Building firmware for TS80P in EN
/usr/lib/gcc/arm-none-eabi/8.3.1/../../../arm-none-eabi/bin/ld: /tmp/ccdZ1zr9.ltrans1.ltrans.o: in function `pxCurrentTCBConst':
/home/qb/00hardware_guilde/haqerspace/tools/ts100/firmwares/ralim_fw/git/workspace/TS100/./Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM3/port.c:393: undefined reference to `pxCurrentTCB'
/usr/lib/gcc/arm-none-eabi/8.3.1/../../../arm-none-eabi/bin/ld: /tmp/ccdZ1zr9.ltrans1.ltrans.o: in function `pxCurrentTCBConst2':
/home/qb/00hardware_guilde/haqerspace/tools/ts100/firmwares/ralim_fw/git/workspace/TS100/./Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM3/port.c:219: undefined reference to `pxCurrentTCB'
collect2: error: ld returned 1 exit status
make: *** [Makefile:224: Hexfile/TS80P_EN.elf] Error 1
    [Error]
*********************************************
 -- Stop on error --

I'm using arm-none-eabi GCC 8.3.1 on Debian, and ld is from binutils-arm-none-eabi 2.34-4+14. Running the Docker build (docker-compose run --rm builder /bin/bash /build/ci/buildAll.sh) results in the same error on TS80P. Both the local and the Docker builds compile the TS100 and TS80 flavors fine.

I read on forums that this undefined reference to pxCurrentTCB might be related to the -flto option, so I tried removing it as well. In that case, TS100 and TS80 still compile fine, while TS80P fails with:

/usr/lib/gcc/arm-none-eabi/8.3.1/../../../arm-none-eabi/bin/ld:LinkerScript.ld:121 cannot move location counter backwards (from 000000000800f80c to 000000000800f800)
collect2: error: ld returned 1 exit status

which might help locate the error, or not. So far I don't know how to track this bug down any further.

doegox commented 4 years ago

From what I understand, the compiled code is 12 bytes too large. Commenting out random functions in the code to reduce its size results in a successful compilation.

doegox commented 4 years ago

Hmm, well, the missing 12 bytes were without -flto. I finally managed to compile the firmware with this patch:

diff --git a/workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/tasks.c b/workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/tasks.c
index f93fca0..b72f886 100644
--- a/workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/tasks.c
+++ b/workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/tasks.c
@@ -334,6 +334,7 @@ typedef tskTCB TCB_t;

 /*lint -save -e956 A manual analysis and inspection has been used to determine
 which static variables must be declared volatile. */
+__attribute__ ((used))
 PRIVILEGED_DATA TCB_t * volatile pxCurrentTCB = NULL;

 /* Lists for ready and blocked tasks. --------------------

paulfertser commented 4 years ago

I wonder why I can't reproduce it here with the arm-none-eabi toolchain from Debian testing:

Linking TS80P_EN.elf
arm-none-eabi-size Hexfile/TS80P_EN.elf
   text    data     bss     dec     hex filename
  39884     476   26436   66796   104ec Hexfile/TS80P_EN.elf
arm-none-eabi-objcopy Hexfile/TS80P_EN.elf -O ihex Hexfile/TS80P_EN.hex
arm-none-eabi-size Hexfile/TS80P_EN.elf
   text    data     bss     dec     hex filename
  39884     476   26436   66796   104ec Hexfile/TS80P_EN.elf
arm-none-eabi-objcopy Hexfile/TS80P_EN.elf -O binary Hexfile/TS80P_EN.bin
paul@home:~/tmp/ts100/workspace/TS100$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (15:8-2019-q3-1) 8.3.1 20190703 (release) [gcc-8-branch revision 273027]
...

Ralim commented 4 years ago

I haven't run into this issue here either; that said, I always use the binary directly from ARM (see the Dockerfile for the version the release firmwares use).

@doegox Is it possible for you to test against that version using the docker image to narrow down the issue?

doegox commented 4 years ago

Well, as I said in the initial comment, I already tried the Docker image itself with docker-compose run --rm builder /bin/bash /build/ci/buildAll.sh, resulting in the same error. Googling around, it seems that this error on pxCurrentTCB occurs occasionally in various FreeRTOS projects, apparently because LTO is too aggressive at removing "unused" code, and the solution is to flag pxCurrentTCB as something to keep with __attribute__ ((used)). Why this only happens to me, including in Docker, is a mystery...

paulfertser commented 4 years ago

I think I know where the problem is, here's what I have on my machine:

paul@home:~/tmp/ts100/workspace/TS100$ find . -type f -name '*.c'
./Middlewares/Third_Party/FreeRTOS/Source/list.c
./Middlewares/Third_Party/FreeRTOS/Source/tasks.c
./Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM3/port.c
./Middlewares/Third_Party/FreeRTOS/Source/croutine.c
./Middlewares/Third_Party/FreeRTOS/Source/event_groups.c

When I change the order of port.c and tasks.c (putting port.c first) I get the same error. So the difference comes from the order in which the underlying filesystem lists the files (I'm using ext4) rather than from something in the toolchain. I need to think, read and experiment to find the best way to fix it.

doegox commented 4 years ago

Wow, good find. This would explain the difference in the LTO process. My list:

./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS/cmsis_os.c
./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM3/port.c
./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/list.c
./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/croutine.c
./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/timers.c
./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/queue.c
./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/tasks.c
./workspace/TS100/Middlewares/Third_Party/FreeRTOS/Source/event_groups.c

Ralim commented 4 years ago

@doegox I'm more than happy to add the __attribute__ ((used)) marker if this would make compiling easier.

This feels very crazy, but there have been quirks with LTO in the past, so I'm not amazingly surprised either.

Kudos @paulfertser for figuring that one out ❤️

paulfertser commented 4 years ago

@doegox, please see if it really fixes the issue on your machine. What filesystem are your sources stored on, btw?

doegox commented 4 years ago

Yes @paulfertser, your fix works on my platform! FTR, I'm also using ext4.