cortex_m: Build failures with LLVM

chrysn commented 4 years ago

Description

Builds for nrf52 fail at several places when run with TOOLCHAIN=llvm, both with the default clang (version 9 on my machine) and with clang-10.

Steps to reproduce the issue

$ make -C examples/gcoap TOOLCHAIN=clang BOARD=nrf52840dongle all
[...]
clang \
        -DRIOT_FILE_RELATIVE=\"cpu/cortexm_common/thread_arch.c\" \
        -DRIOT_FILE_NOPATH=\"thread_arch.c\" \
        -DDEVELHELP -Werror -DCPU_FAM_NRF52 -mcpu=cortex-m4 -mlittle-endian -mthumb  -ffunction-sections -fdata-sections -fno-builtin -fshort-enums -ggdb -g3 -Os -DCPU_MODEL_NRF52840XXAA -DCPU_CORE_CORTEX_M4F -target arm-none-eabi -Wno-atomic-alignment -Wno-unknown-warning-option -DRIOT_APPLICATION=\"gcoap_example\" -DBOARD_NRF52840DONGLE=\"nrf52840dongle\" -DRIOT_BOARD=BOARD_NRF52840DONGLE -DCPU_NRF52=\"nrf52\" -DRIOT_CPU=CPU_NRF52 -DMCU_NRF52=\"nrf52\" -DRIOT_MCU=MCU_NRF52 -std=c99 -fno-common -Wall -Wextra -Wmissing-include-dirs -fdiagnostics-color -Wstrict-prototypes -Wold-style-definition -gz -Wformat=2 -DSOCK_HAS_IPV6 -DSOCK_HAS_ASYNC -DSOCK_HAS_ASYNC -DSOCK_HAS_ASYNC_CTX -include '/home/chrysn/git/RIOT/examples/gcoap/bin/nrf52840dongle/riotbuild/riotbuild.h' -DCONFIG_GCOAP_RESEND_BUFS_MAX=2  -isystem /usr/include/newlib/nano -isystem /usr/include/newlib -nostdinc -I/home/chrysn/git/RIOT/core/include -I/home/chrysn/git/RIOT/drivers/include -I/home/chrysn/git/RIOT/sys/include -I/home/chrysn/git/RIOT/boards/nrf52840dongle/include -I/home/chrysn/git/RIOT/boards/common/nrf52/include -I/home/chrysn/git/RIOT/cpu/nrf52/include -I/home/chrysn/git/RIOT/cpu/nrf5x_common/include -I/home/chrysn/git/RIOT/cpu/cortexm_common/include -I/home/chrysn/git/RIOT/cpu/cortexm_common/include/vendor -isystem /usr/lib/gcc/arm-none-eabi/8.3.1/include -isystem /usr/lib/gcc/arm-none-eabi/8.3.1/include-fixed -isystem /usr/include/newlib -I/home/chrysn/git/RIOT/sys/libc/include -I/home/chrysn/git/RIOT/sys/net/gnrc/network_layer/sixlowpan/frag -I/home/chrysn/git/RIOT/sys/net/gnrc/sock/include -I/home/chrysn/git/RIOT/sys/posix/include -I/home/chrysn/git/RIOT/sys/net/link_layer/eui_provider/include -I/home/chrysn/git/RIOT/sys/net/sock/async/event -MQ '/home/chrysn/git/RIOT/examples/gcoap/bin/nrf52840dongle/cortexm_common/thread_arch.o' -MD -MP -c -o /home/chrysn/git/RIOT/examples/gcoap/bin/nrf52840dongle/cortexm_common/thread_arch.o /home/chrysn/git/RIOT/cpu/cortexm_common/thread_arch.c
/home/chrysn/git/RIOT/cpu/cortexm_common/thread_arch.c:325:6: error: instruction requires: fp registers
    "vstmdbeq r0!, {s16-s31}          \n" /* save FPU registers if FPU is used */
     ^
<inline asm>:9:1: note: instantiated into assembly here

Expected results

Builds with LLVM should (be tested to) work just as with GCC.

Versions

$ make print-versions
Operating System Environment
----------------------------
         Operating System: "Debian GNU/Linux" 
                   Kernel: Linux 5.6.0-1-amd64 x86_64 unknown
             System shell: /bin/dash (probably dash)
             make's shell: /bin/dash (probably dash)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Debian 10.1.0-6) 10.1.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (15:8-2019-q3-1) 8.3.1 20190703 (release) [gcc-8-branch revision 273027]
                  avr-gcc: missing
         mips-mti-elf-gcc: missing
           msp430-elf-gcc: missing
     riscv-none-embed-gcc: missing
     xtensa-esp32-elf-gcc: missing
   xtensa-esp8266-elf-gcc: missing
                    clang: clang version 9.0.1-14 

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "3.3.0"
      mips-mti-elf-newlib: missing
        msp430-elf-newlib: missing
  riscv-none-embed-newlib: missing
  xtensa-esp32-elf-newlib: missing
xtensa-esp8266-elf-newlib: missing
                 avr-libc: missing (missing)

Installed development tools
---------------------------
                   ccache: ccache version 3.7.10
                    cmake: cmake version 3.16.3
                 cppcheck: Cppcheck 2.1
                  doxygen: 1.8.17
                      git: git version 2.28.0.rc1
                     make: GNU Make 4.3
                  openocd: Open On-Chip Debugger 0.10.0
                   python: Python 2.7.18
                  python2: Python 2.7.18
                  python3: Python 3.8.5
                   flake8: 3.8.3 (mccabe: 0.6.1, pycodestyle: 2.6.0, pyflakes: 2.2.0) CPython 3.8.5 on
               coccinelle: missing

Further experimentation

I'm seeing similar results with other boards, e.g. nucleo-l496zg (with the implicit clang 9 as well as with clang 10 and clang 8).

kaspar030 commented 4 years ago

hm, llvm is built for a subset of nodes, for performance reasons. examples/gcoap is built for nrf52dk (here).

I tried to reproduce this locally, it builds fine with llvm 10, apart from some poweroff not used in the usb code.

Is this on master?

chrysn commented 4 years ago

This is on master, yes -- are you on Debian as well? (They enable additional features sometimes).

chrysn commented 4 years ago

Things do start working once I undef'd MODULE_CORTEXM_FPU. The chip in question (nrf52840) does have an FPU according to its spec.

Drilling down further, I found that makefiles/arch/cortexm.inc.mk does not set any CFLAGS_FPU on llvm -- probably something that was OK for some time, when things would then just not use the FPU, but now that there's FPU-related assembly in there, this must be coordinated better.

I tried just removing the llvm guard (hoping that whatever it was that LLVM couldn't do back then, it has learned since), and things now build with -mcpu=cortex-m4 -mlittle-endian -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 all the way through.

The comments around the guards suggest Clang does FPU all the time (# clang assumes there is an FPU, no CFLAGS necessary); possibly that assumption just does not hold any more. Given that even clang 8 (on my system) accepts the additional flags, I don't see immediate harm in removing those conditions. (It'd still be interesting to see where the discrepancy comes from, though.)
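As a rough sketch of what "coordinated better" could look like on the C side: fail the build early when the FPU module is enabled but the compiler was not put into an FP-capable mode, rather than erroring out deep inside the inline assembly. MODULE_CORTEXM_FPU is the macro mentioned above and `__ARM_FP` is the ACLE feature macro that should only be defined when hardware floating point is available; the helper name `fpu_asm_allowed` is made up for illustration and is not RIOT code.

```c
/* Sketch only: stop with a readable message if the FPU module is enabled
 * while the compiler is in a mode without FP registers.
 * __ARM_FP is an ACLE feature macro; it is expected to be undefined when
 * the compiler targets soft float without an FPU. */
#if defined(MODULE_CORTEXM_FPU) && !defined(__ARM_FP)
#error "MODULE_CORTEXM_FPU is set, but the compiler was not given FPU flags"
#endif

/* Hypothetical helper: returns 1 when FPU context-save assembly may be
 * emitted under the current configuration, 0 otherwise. */
static inline int fpu_asm_allowed(void)
{
#if defined(MODULE_CORTEXM_FPU) && defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}
```

With a guard like this, a toolchain that silently defaults to soft float would produce one clear error per translation unit instead of "instruction requires: fp registers".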

kaspar030 commented 4 years ago

> This is on master, yes -- are you on Debian as well? (They enable additional features sometimes).

I'm on arch, so probably using default llvm flags.

chrysn commented 4 years ago

So the minimal test case I came up with is this:

test.c:

int main() {
    __asm__ volatile (
    "it     eq                        \n"
    "vstmdbeq r0!, {s16-s31}          \n" /* save FPU registers if FPU is used */
    );
}

$ clang -target arm-none-eabi -mcpu=cortex-m4 -mlittle-endian -mthumb test.c
test.c:4:6: error: instruction requires: fp registers
    "vstmdbeq r0!, {s16-s31}          \n" /* save FPU registers if FPU is used */
     ^
<inline asm>:2:1: note: instantiated into assembly here
vstmdbeq r0!, {s16-s31}          
^
1 error generated.

$ clang -target arm-none-eabi -mcpu=cortex-m4 -mlittle-endian -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 test.c
[ other errors happening later at linking stage ]

Do those generate the same linking-stage errors at your end?

kaspar030 commented 4 years ago

[kaspar@ng ~/tmp]$ cat tst.c 
───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: tst.c
───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ int main() {
   2   │     __asm__ volatile (
   3   │     "it     eq                        \n"
   4   │     "vstmdbeq r0!, {s16-s31}          \n" /* save FPU registers if FPU is used */
   5   │     );
   6   │ }
───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
[kaspar@ng ~/tmp]$ clang -target arm-none-eabi -mcpu=cortex-m4 -mlittle-endian -mthumb tst.c
ld.lld: error: unable to find library -lc
ld.lld: error: unable to find library -lm
ld.lld: error: unable to find library -lclang_rt.builtins-arm.a
clang-10: error: ld.lld command failed with exit code 1 (use -v to see invocation)
[kaspar@ng ~/tmp]$ clang -target arm-none-eabi -mcpu=cortex-m4 -mlittle-endian -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 tst.c
ld.lld: error: unable to find library -lc
ld.lld: error: unable to find library -lm
ld.lld: error: unable to find library -lclang_rt.builtins-arm.a
clang-10: error: ld.lld command failed with exit code 1 (use -v to see invocation)
[kaspar@ng ~/tmp]$ clang --version
clang version 10.0.1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
[kaspar@ng ~/tmp]$

kaspar030 commented 4 years ago

(sorry, using bat instead of cat)

chrysn commented 4 years ago

That does sound like you get to the later stage with both versions. I don't get the very same errors, but the error for the full version at my end is sufficiently close to yours:

clang: error: unable to execute command: Executable "ld.lld" doesn't exist!
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)

So maybe the -v flag can shed more light -- I get:

In hindsight, probably not -- but here are the details anyway:

```shell
$ clang-10 -target arm-none-eabi -mcpu=cortex-m4 -mlittle-endian -mthumb test.c -v
Debian clang version 10.0.1-5
Target: arm-none-unknown-eabi
Thread model: posix
InstalledDir: /usr/bin
 "/usr/lib/llvm-10/bin/clang" -cc1 -triple thumbv7em-none-unknown-eabi -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mframe-pointer=all -fmath-errno -fno-rounding-math -mconstructor-aliases -nostdsysteminc -target-cpu cortex-m4 -target-feature +soft-float -target-feature +soft-float-abi -target-feature -crc -target-feature -sha2 -target-feature -aes -target-feature +dsp -target-feature -ras -target-feature -sb -target-feature -lob -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature -vfp2 -target-feature -vfp2sp -target-feature -vfp3 -target-feature -vfp3d16 -target-feature -vfp3d16sp -target-feature -vfp3sp -target-feature -fp16 -target-feature -vfp4 -target-feature -vfp4d16 -target-feature -vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature -fp-armv8d16 -target-feature -fp-armv8d16sp -target-feature -fp-armv8sp -target-feature -fullfp16 -target-feature -fp64 -target-feature -d32 -target-feature -neon -target-feature -crypto -target-feature -dotprod -target-feature -fp16fml -target-feature -mve -target-feature -mve.fp -target-feature -fpregs -target-feature +strict-align -target-abi aapcs -msoft-float -mfloat-abi soft -fallow-half-arguments-and-returns -dwarf-column-info -fno-split-dwarf-inlining -debugger-tuning=gdb -v -resource-dir /usr/lib/llvm-10/lib/clang/10.0.1 -internal-isystem /usr/lib/llvm-10/lib/clang/10.0.1/include -internal-isystem include -fdebug-compilation-dir /tmp -ferror-limit 19 -fmessage-length 0 -fno-signed-char -fgnuc-version=4.2.1 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -faddrsig -o /tmp/test-f7e46b.o -x c test.c
clang -cc1 version 10.0.1 based upon LLVM 10.0.1 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "include"
ignoring duplicate directory "/usr/lib/llvm-10/lib/clang/10.0.1/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/llvm-10/lib/clang/10.0.1/include
End of search list.
test.c:4:6: error: instruction requires: fp registers
    "vstmdbeq r0!, {s16-s31}          \n" /* save FPU registers if FPU is used */
     ^
<inline asm>:2:1: note: instantiated into assembly here
vstmdbeq r0!, {s16-s31}
^
1 error generated.

$ clang-10 -target arm-none-eabi -mcpu=cortex-m4 -mlittle-endian -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 test.c -v
Debian clang version 10.0.1-5
Target: arm-none-unknown-eabi
Thread model: posix
InstalledDir: /usr/bin
 "/usr/lib/llvm-10/bin/clang" -cc1 -triple thumbv7em-none-unknown-eabi -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.c -mrelocation-model static -mthread-model posix -mframe-pointer=all -fmath-errno -fno-rounding-math -mconstructor-aliases -nostdsysteminc -target-cpu cortex-m4 -target-feature -crc -target-feature -sha2 -target-feature -aes -target-feature -dotprod -target-feature +dsp -target-feature -mve -target-feature -mve.fp -target-feature -ras -target-feature -sb -target-feature -lob -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature -vfp2 -target-feature +vfp2sp -target-feature -vfp3 -target-feature -vfp3d16 -target-feature +vfp3d16sp -target-feature -vfp3sp -target-feature +fp16 -target-feature -vfp4 -target-feature -vfp4d16 -target-feature +vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature -fp-armv8d16 -target-feature -fp-armv8d16sp -target-feature -fp-armv8sp -target-feature -fullfp16 -target-feature -fp64 -target-feature -d32 -target-feature -neon -target-feature -crypto -target-feature -fp16fml -target-feature +strict-align -target-abi aapcs -mfloat-abi hard -fallow-half-arguments-and-returns -dwarf-column-info -fno-split-dwarf-inlining -debugger-tuning=gdb -v -resource-dir /usr/lib/llvm-10/lib/clang/10.0.1 -internal-isystem /usr/lib/llvm-10/lib/clang/10.0.1/include -internal-isystem include -fdebug-compilation-dir /tmp -ferror-limit 19 -fmessage-length 0 -fno-signed-char -fgnuc-version=4.2.1 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -faddrsig -o /tmp/test-c0bde8.o -x c test.c
clang -cc1 version 10.0.1 based upon LLVM 10.0.1 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "include"
ignoring duplicate directory "/usr/lib/llvm-10/lib/clang/10.0.1/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/llvm-10/lib/clang/10.0.1/include
End of search list.
 "ld.lld" /tmp/test-c0bde8.o -Bstatic -L/usr/lib/llvm-10/lib/clang/10.0.1/lib/baremetal -lc -lm -lclang_rt.builtins-arm.a -o a.out
clang: error: unable to execute command: Executable "ld.lld" doesn't exist!
clang: error: ld.lld command failed with exit code 1 (use -v to see invocation)
```

In all the argument soup, it stands out that without explicit configuration, my clang 10 goes to soft float mode. If you run clang -target arm-none-eabi -mcpu=cortex-m4 -mlittle-endian -mthumb -mfloat-abi=soft tst.c, do you run into the same "instruction requires: fp registers" error I do for the implicit case (indicating we have different defaults), or into the "error: invalid float ABI" that's expected if your compiler doesn't know what float-abi=soft means?
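A lighter-weight way to compare defaults than reading the -v output might be to ask the preprocessor directly. The following is a minimal sketch, assuming the compiler follows ACLE for `__ARM_FP` and `__ARM_PCS_VFP` and defines `__arm__` for 32-bit ARM targets; the function name `float_abi_name` is hypothetical.

```c
/* Sketch: classify the floating-point configuration the compiler was
 * invoked with, based on predefined macros only.
 *   __arm__        defined by GCC and Clang when targeting 32-bit ARM
 *   __ARM_PCS_VFP  ACLE: hard-float procedure call standard is in use
 *   __ARM_FP       ACLE: hardware floating point is available */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not an ARM target";
#elif defined(__ARM_PCS_VFP)
    return "hard: FP registers used for arguments";
#elif defined(__ARM_FP)
    return "softfp: FP instructions, integer calling convention";
#else
    return "soft: no FP instructions";
#endif
}
```

Since the bare-metal builds here do not get as far as a runnable binary, the same `#if` ladder can be turned into `#error` lines to get the answer at compile time with just `-c`. If the two machines fall into different branches with identical command lines, the defaults differ.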

chrysn commented 4 years ago

I've just given this another test with clang-7 on Debian stretch's backports -- same behavior as with my regular system, clang wants float flags to accept float assembly instructions.

Godbolt, on the other hand, [shows what you see](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,j:1,lang:___c,selection:(endColumn:2,endLineNumber:6,positionColumn:2,positionLineNumber:6,selectionStartColumn:2,selectionStartLineNumber:6,startColumn:2,startLineNumber:6),source:'int+main()+%7B%0A++++__asm__+volatile+(%0A++++%22it+++++eq++++++++++++++++++++++++%5Cn%22%0A++++%22vstmdbeq+r0!!,+%7Bs16-s31%7D++++++++++%5Cn%22+/*+save+FPU+registers+if+FPU+is+used+*/%0A++++)%3B%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:armv7-cclang-10,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'1',trim:'1'),fontScale:14,j:1,lang:___c,libs:!(),options:'-target+arm-none-eabi+-mcpu%3Dcortex-m4+-mlittle-endian+-mthumb',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'armv7-a+clang+10.0.0+(Editor+%231,+Compiler+%231)+C',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4), that hard float appears to be the default.

From inspecting Debian sources, there is a patch that does change the default behavior. As that gives us different defaults on different platforms, the best way forward I see is to explicitly pass the flags -- it does not seem that they do any harm when set.

PR to follow.

RIOT-OS / RIOT