ClangBuiltLinux / linux

Linux kernel source tree

Illegal operation with CONFIG_MARCH_Z13 #1709

Closed: nathanchance closed this issue 2 years ago

nathanchance commented 2 years ago

Fedora recently switched from CONFIG_MARCH_ZEC12 to CONFIG_MARCH_Z13 (i.e. from -march=zEC12 to -march=z13):

https://src.fedoraproject.org/rpms/kernel/c/aff6e8acdaa437e9f06ef4166ca2209071223f8d

Unfortunately, this results in an "illegal operation" panic when booting in QEMU:

$ curl -LSso .config https://src.fedoraproject.org/rpms/kernel/raw/rawhide/f/kernel-s390x-fedora.config

$ make -skj"$(nproc)" ARCH=s390 CC=clang CROSS_COMPILE=s390x-linux-gnu- olddefconfig bzImage

$ .../boot-qemu.py -a s390 -k . -t 30s
...
[    1.923562] Run /init as init process
[    1.928076] illegal operation: 0001 ilc:3 [#1] SMP
[    1.928192] Modules linked in:
[    1.928297] CPU: 0 PID: 1 Comm: init Not tainted 6.0.0-rc5+ #1
[    1.928389] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[    1.928463] Krnl PSW : 0704d00180000000 000000000057ac44 (load_elf_binary+0x2e4/0x11d0)
[    1.928958]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[    1.928991] Krnl GPRS: 0000000000000001 00000000025f8000 0000000000000000 0000000000000000
[    1.929008]            0000000080000000 0000000000000000 0000000000000000 00000000041b0400
[    1.929024]            00000000043287c0 00000000036e8500 0000000000000001 0000000100000001
[    1.929040]            00000000041b0600 00000000041b0000 000000000057ac2e 000003800000bc28
[    1.929695] Krnl Code: 000000000057ac32: ec660585007e        cij     %r6,0,6,000000000057b73c
[    1.929695]            000000000057ac38: e31003400004        lg      %r1,832
[    1.929695]           #000000000057ac3e: e3001780003b        lzrf    %r0,1920(%r1)
[    1.929695]           >000000000057ac44: 50001780            st      %r0,1920(%r1)
[    1.929695]            000000000057ac48: e31003400004        lg      %r1,832
[    1.929695]            000000000057ac4e: c40800a881e9        lgrl    %r0,0000000001a8b020
[    1.929695]            000000000057ac54: e3001d480124        stg     %r0,7496(%r1)
[    1.929695]            000000000057ac5a: e31003400004        lg      %r1,832
[    1.929932] Call Trace:
[    1.929979]  [<000000000057ac44>] load_elf_binary+0x2e4/0x11d0
[    1.930032] ([<000000000057ac2e>] load_elf_binary+0x2ce/0x11d0)
[    1.930045]  [<00000000004e10fc>] exec_binprm+0x13c/0x390
[    1.930066]  [<00000000004e0ba6>] bprm_execve+0x3c6/0x480
[    1.930078]  [<00000000004e07be>] kernel_execve+0x4ce/0x4f0
[    1.930095]  [<0000000001017d28>] kernel_init+0x598/0x6c0
[    1.930111]  [<00000000001039ca>] __ret_from_fork+0x3a/0x60
[    1.930124]  [<000000000102ad1a>] ret_from_fork+0xa/0x40
[    1.930145] Last Breaking-Event-Address:
[    1.930154]  [<00000000004df7e0>] begin_new_exec+0x6c0/0x8e0
[    1.930380] ---[ end trace 0000000000000000 ]---
[    1.930726] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
...
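
A note for readers decoding the oops (a sketch, not part of the original analysis): the `ilc` field is the instruction length in halfwords, and for an operation exception the reported PSW address points just past the rejected instruction. Assuming that, stepping back 2 × ILC bytes from the PSW address gives the faulting instruction. The helper name `faulting_insn_addr` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate the instruction that raised the program check in an s390 oops.
# ILC ("instruction length code") counts halfwords: 1 -> 2 bytes, 2 -> 4, 3 -> 6.
# For a suppressing exception such as an operation exception, the PSW address
# in the oops points at the *next* instruction, so step back by the length.
faulting_insn_addr() {
    local psw_addr=$1 ilc=$2
    printf '%#x\n' $(( psw_addr - ilc * 2 ))
}

# Values from the oops above: "Krnl PSW : ... 000000000057ac44" and "ilc:3".
faulting_insn_addr 0x000000000057ac44 3   # prints 0x57ac3e, the '#'-marked lzrf line
```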

This does not appear to be an LLVM or QEMU regression: it reproduces with LLVM 14 through 16 and with QEMU 6.0.0 (the first release to support clang-built s390x kernels) through 7.1.0 (the latest release):

https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/3042931830/jobs/4902909201
https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/3042917740/jobs/4902318661
https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/3042964560/jobs/4905063959

It can be trivially reproduced with ARCH=s390 defconfig + CONFIG_MARCH_Z13:

# Switch from CONFIG_MARCH_ZEC12 to CONFIG_MARCH_Z13
$ make -skj"$(nproc)" ARCH=s390 CC=clang CROSS_COMPILE=s390x-linux-gnu- defconfig menuconfig bzImage

$ .../boot-qemu.py -a s390 -k . -t 30s
...
[    1.675787] illegal operation: 0001 ilc:3 [#1] SMP
[    1.675888] Modules linked in:
[    1.676044] CPU: 0 PID: 59 Comm: modprobe Not tainted 6.0.0-rc5-00017-gd1221cea11fc #1
[    1.676134] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[    1.676202] Krnl PSW : 0704d00180000000 0000000000579fbc (load_elf_binary+0x31c/0x11c0)
[    1.676459]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[    1.676489] Krnl GPRS: 0000000000000001 00000000044f0000 0000000000000000 0000000000000000
[    1.676506]            8000000000000080 0000000000000000 0000000000000000 00000000035fa900
[    1.676522]            0000000000000001 000000000353ce00 0000000000000001 00000000044693c0
[    1.676538]            000000000276fa00 000000000446a000 0000000000579f9e 00000380002ebc78
[    1.677128] Krnl Code: 0000000000579fac: a7f405b8            brc     15,000000000057ab1c
[    1.677128]            0000000000579fb0: e31003400004        lg      %r1,832
[    1.677128]           #0000000000579fb6: e3001780003b        lzrf    %r0,1920(%r1)
[    1.677128]           >0000000000579fbc: 50001780            st      %r0,1920(%r1)
[    1.677128]            0000000000579fc0: e31003400004        lg      %r1,832
[    1.677128]            0000000000579fc6: c40800aa99a9        lgrl    %r0,0000000001acd318
[    1.677128]            0000000000579fcc: e3001d400124        stg     %r0,7488(%r1)
[    1.677128]            0000000000579fd2: e31003400004        lg      %r1,832
[    1.677343] Call Trace:
[    1.677564]  [<0000000000579fbc>] load_elf_binary+0x31c/0x11c0
[    1.677635] ([<0000000000579f9e>] load_elf_binary+0x2fe/0x11c0)
[    1.677652]  [<00000000004e1ae6>] bprm_execve+0x4f6/0x7b0
[    1.677671]  [<00000000004e1256>] kernel_execve+0x3b6/0x3d0
[    1.677687]  [<000000000017e1c8>] call_usermodehelper_exec_async+0x158/0x1d0
[    1.677706]  [<0000000000103a9a>] __ret_from_fork+0x3a/0x60
[    1.677719]  [<0000000000f8562a>] ret_from_fork+0xa/0x40
[    1.677748] Last Breaking-Event-Address:
[    1.677757]  [<0000000000579fa2>] load_elf_binary+0x302/0x11c0
[    1.678012] Kernel panic - not syncing: Fatal exception: panic_on_oops
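
The faulting instruction can also be cross-checked from the raw bytes in the `Krnl Code` dump. On s390 the leftmost two bits of the first opcode byte encode the instruction length; the sketch below applies that rule using the formula that, to my knowledge, the kernel's own `insn_length()` helper in arch/s390 uses. The shell function of the same name here is only an illustration.

```shell
#!/usr/bin/env bash
# Sketch: the two leftmost bits of the first opcode byte give the length:
# 00 -> 2 bytes, 01 or 10 -> 4 bytes, 11 -> 6 bytes.
# Input: an instruction's hex encoding as printed in the oops, e.g. e3001780003b.
insn_length() {
    local first_byte=$(( 16#${1:0:2} ))          # parse the first byte as hex
    echo $(( (((first_byte + 64) >> 7) + 1) << 1 ))
}

# Hex encodings copied from the "Krnl Code" lines of the oops above.
for insn in a7f405b8 e31003400004 e3001780003b 50001780; do
    printf '%-14s %d bytes\n' "$insn" "$(insn_length "$insn")"
done
```

For the `lzrf` at 0x579fb6, adding its 6-byte length gives 0x579fbc, the PSW address, which is consistent with the oops marking that line with `#` and the following `st` with `>`.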

I do not see any issues with Fedora's GCC 12.2.1, but it is possible that GCC simply does not generate the same code as clang.
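
One way to test that hypothesis, assuming the `lzrf` flagged in the oops is the instruction QEMU rejects, is to count how often it appears in the disassembly of each vmlinux. As far as I know, `lzrf` (Load and Zero Rightmost Byte) was introduced with z13, so it should only show up once the kernel is built with -march=z13. The helper `count_insn` is hypothetical and reads `objdump -d` output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: count occurrences of one mnemonic in objdump disassembly read from stdin.
# objdump prints the mnemonic as its own tab-separated field, so a whole-word
# match (-w) avoids counting similar mnemonics such as lzrg when asked for lzrf.
count_insn() {
    local mnemonic=$1
    # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
    grep -cw -- "$mnemonic" || true
}
```

For example, run `s390x-linux-gnu-objdump -d vmlinux | count_insn lzrf` against a clang build and a GCC build of the same CONFIG_MARCH_Z13 configuration; a zero count for the GCC build would explain why only the clang build trips over the problem.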

I have reported this upstream:

https://lore.kernel.org/YyC%2FJvFONhtTYjM%2F@dev-arch.thelio-3990X/

nathanchance commented 2 years ago

QEMU patch: https://lore.kernel.org/20220914105750.767697-1-borntraeger@linux.ibm.com/

nathanchance commented 2 years ago

Patch accepted: https://gitlab.com/thuth/qemu/-/commit/bae0bd819926193851f24091731fa7d761ff0fa1

nickdesaulniers commented 2 years ago

Will we need to roll our own QEMU again for CI?

nathanchance commented 2 years ago

> Will we need to roll our own QEMU again for CI?

No, this issue is not fatal (QEMU just exits rather than hanging). I only caught it because I happened to look back at my local QEMU boot logs. The patch is marked for QEMU stable, so it should reach us via a .1 release (although I have not seen QEMU stable updated recently, so we may have to wait for 7.2.0).

nathanchance commented 2 years ago

This is now in QEMU master: https://gitlab.com/qemu-project/qemu/-/commit/131aafa7eff4aa4d747cb7113726b27394a38866

It is tagged for their stable releases; if those are still being made, we should get it in CI that way. Otherwise, we will get it when 7.2.0 is released. Closing this for now, as there is nothing else for us to do.
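
To check whether a given QEMU is at least 7.2.0, a simple version comparison is enough. This is a sketch assuming GNU `sort -V`. It will report a 7.1.x point release as too old even if that release carries the stable backport, so check the changelog in that case. `version_ge` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: succeed if version $1 >= version $2, using GNU sort's version ordering.
# If the smaller of the two (first line after sort -V) is $2, then $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

for v in 7.1.0 7.2.0; do
    if version_ge "$v" 7.2.0; then
        echo "$v: at least 7.2.0"
    else
        echo "$v: older than 7.2.0, fix only via a stable backport"
    fi
done
```

The installed version can be taken from the first line of `qemu-system-s390x --version`.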