ClangBuiltLinux / continuous-integration

Continuous integration of latest Linux kernel with daily build of Clang & LLVM tools
https://travis-ci.com/ClangBuiltLinux/continuous-integration
Apache License 2.0

i386 #182

Closed tpimh closed 4 years ago

tpimh commented 5 years ago

This is WIP. It currently lacks a buildroot image (I will add one once the kernel is able to boot) and probably shouldn't be merged until https://github.com/ClangBuiltLinux/linux/issues/3 is fixed.

UPD: rootfs added, booting fine with GNU ld, but not LLD (UPD2: fixed). After these issues are fixed, we can proceed with this PR:

nickdesaulniers commented 5 years ago

https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/211110274 shows that you're hitting what looks slightly similar to https://github.com/ClangBuiltLinux/linux/issues/186.

ld: arch/x86/entry/vsyscall/vsyscall_gtod.o: in function `update_vsyscall':
vsyscall_gtod.c:(.text+0x262): undefined reference to `__udivdi3'

If the Makefile in arch/x86/entry/vsyscall/ does not have -Oz, then it looks like we have another issue to deal with.

nathanchance commented 5 years ago

I was able to bisect the problematic commit as well: https://github.com/ClangBuiltLinux/linux/issues/186#issuecomment-430780405

nickdesaulniers commented 5 years ago

(I'll bet the sub expression tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift involves 64b numbers).

tpimh commented 5 years ago

Should ClangBuiltLinux/linux#186 be reopened then?

nathanchance commented 5 years ago

A new issue should probably be opened.

tpimh commented 5 years ago

The patch to revert the vgtod_ts commits is huge and most of it is unnecessary, but it works. There is now also a problem with LLD; I will try to fix it later (or at least create an issue for it, as I can't find one).

nathanchance commented 5 years ago

Tentatively seems good.

I assume those two patches are going to be upstreamed at some point? Given that we just got all of our LLVM 9 targets building patch free, it's a little sad to add some back.

nathanchance commented 5 years ago

It also looks like your percpu patch needs to be rebased?

nickdesaulniers commented 5 years ago

make sure to remove the WIP label when it's ready for code review

tpimh commented 5 years ago

The percpu patch is easy to fix so that it builds, but that is not a proper fix; I'll check how a similar issue was fixed for x86_64. The other patch should probably be rewritten as well. After these patches are upstreamed, i386 can be merged to master.

tpimh commented 4 years ago

Rebased on master and, surprisingly, it's broken. It builds fine but doesn't boot. I will try to build 4.19 to see if it works.

nickdesaulniers commented 4 years ago

I recommend:

  1. commit the buildroot changes separately first.
  2. use qemu+gdb to see where we hang.

nathanchance commented 4 years ago

  1. use qemu+gdb to see where we hang.

I was going to try this today but I cannot get the GDB scripts to load, am I doing something wrong?

% curl -LSs https://raw.githubusercontent.com/ClangBuiltLinux/continuous-integration/i386/patches/llvm-all/linux/i386/i386-percpu.patch | git apply -v
Checking patch arch/x86/include/asm/percpu.h...
Applied patch arch/x86/include/asm/percpu.h cleanly.

% echo "CONFIG_DEBUG_INFO=y\nCONFIG_GDB_SCRIPTS=y" >> arch/x86/configs/i386_defconfig

% make -j$(nproc) -s \
ARCH=i386 \
CC=clang \
O=out \
distclean defconfig bzImage

% qemu-system-i386 \
-m 512m \
-drive file=${HOME}/cbl/git/ci/images/i386/rootfs.ext4,format=raw,if=ide \
-append 'console=ttyS0 root=/dev/sda' \
-display none \
-serial mon:stdio \
-kernel ${HOME}/src/linux/out/arch/i386/boot/bzImage \
-s \
-S

% echo "add-auto-load-safe-path ${PWD}/scripts/gdb/vmlinux-gdb.py" >> ~/.gdbinit

% gdb out/vmlinux
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from out/vmlinux...
(gdb) target remote :1234
Remote debugging using :1234
0x0000fff0 in ?? ()
(gdb) lx-version
Undefined command: "lx-version".  Try "help".
(gdb) apropos lx
(gdb) 

tpimh commented 4 years ago

Exactly the same problem here. I was trying to get it working for a while, tried everything I could think of, but no luck, apropos lx always showed nothing. Created a couple forum threads about my problem, but got zero responses. Really started to think that this is just my problem, and I am doing something wrong. If it's the case, at least I'm not alone now.

The only useful piece of information I got with gdb is that the kernel panics with "Attempted to kill the idle task!"

nathanchance commented 4 years ago

I might post on the mailing list tomorrow as I have tried everything and still cannot get it to work.

nickdesaulniers commented 4 years ago

Are those the only configs that would be set if gdb scripts were enabled via menuconfig? I remember fixing gdb scripts in the kernel once, but IIRC, there was a nice error message in GDB when the scripts failed to load that made the error obvious. Silent failure seems different than what I recall.

nathanchance commented 4 years ago

Yes, GDB_SCRIPTS only depends on DEBUG_INFO and I see both configs get enabled in my final config. I've tried looking into generic ways to figure out why GDB scripts don't get loaded but I didn't really find anything that was relevant to this situation.

nickdesaulniers commented 4 years ago

Do they load for i386 for older LTS kernels? If we know they once worked, we can bisect the kernel sources to find when they regressed. It's been my experience that upstream breaks these scripts without people noticing.
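If an older kernel turns out to be good, the good/bad marking can be automated with `git bisect run`, which classifies each commit by the exit status of a test command: 0 means good, 1 to 127 (except 125) means bad, and 125 means the commit cannot be tested and should be skipped. A minimal sketch, assuming a hypothetical helper `bisect_exit_code`; how the build and the check for the lx-* commands are actually performed is left open here:

```shell
#!/bin/sh
# Hypothetical helper for a "git bisect run" test script.
# $1: exit status of the kernel build
# $2: exit status of the check (for example, whether GDB found the lx-* commands)
bisect_exit_code() {
    build_status=$1
    check_status=$2
    if [ "$build_status" -ne 0 ]; then
        # The build failed for an unrelated reason: ask git to skip this commit.
        echo 125
    elif [ "$check_status" -ne 0 ]; then
        # Built, but the behaviour under test is broken: mark as bad.
        echo 1
    else
        # Built and behaves as expected: mark as good.
        echo 0
    fi
}
```

A test script would end with `exit "$(bisect_exit_code "$build_status" "$check_status")"` and be started with `git bisect run ./test.sh` after `git bisect start <bad> <good>`.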

nathanchance commented 4 years ago

Hmmm, don't know why I didn't think of testing that :( thank you for that.

# bad: [9e98c678c2d6ae3a17cb2de55d17f69dddaa231b] Linux 5.1-rc1
# good: [1c163f4c7b3f621efff9b28a47abb36f7378d783] Linux 5.0
git bisect start 'v5.1-rc1' 'v5.0'
# good: [e266ca36da7de45b64b05698e98e04b578a88888] Merge tag 'staging-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good e266ca36da7de45b64b05698e98e04b578a88888
# good: [36011ddc78395b59a8a418c37f20bcc18828f1ef] Merge tag 'gfs2-5.1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
git bisect good 36011ddc78395b59a8a418c37f20bcc18828f1ef
# bad: [6bc3fe8e7e172d5584e529a04cf9eec946428768] tools: mark 'test_vmalloc.sh' executable
git bisect bad 6bc3fe8e7e172d5584e529a04cf9eec946428768
# good: [a50243b1ddcdd766d0d17fbfeeb1a22e62fdc461] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
git bisect good a50243b1ddcdd766d0d17fbfeeb1a22e62fdc461
# good: [b7a7d1c1ec688104fdc922568c26395a756f616d] Merge tag 'dma-mapping-5.1' of git://git.infradead.org/users/hch/dma-mapping
git bisect good b7a7d1c1ec688104fdc922568c26395a756f616d
# good: [12ad143e1b803e541e48b8ba40f550250259ecdd] Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 12ad143e1b803e541e48b8ba40f550250259ecdd
# bad: [ffd602eb4693bbb49b301fa059b109bbdebf9524] Merge tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
git bisect bad ffd602eb4693bbb49b301fa059b109bbdebf9524
# good: [5af7f115886f7ec193171e2e49b8000ddd1e7147] Merge branch 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good 5af7f115886f7ec193171e2e49b8000ddd1e7147
# good: [01d509a48b467fa6e03d4af43b84bce835af4cef] kbuild: remove unimportant comments from ./Kbuild
git bisect good 01d509a48b467fa6e03d4af43b84bce835af4cef
# bad: [6b12de69ad82ceed317bdbae9ede1a256f84bf4e] kbuild: simplify single target rules
git bisect bad 6b12de69ad82ceed317bdbae9ede1a256f84bf4e
# bad: [bd55f96fa9fc29702ec30d75a4290bdadb00209d] kbuild: refactor cc-cross-prefix implementation
git bisect bad bd55f96fa9fc29702ec30d75a4290bdadb00209d
# bad: [8d2e52003adf45035bc6e94056c68dacf517236b] kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
git bisect bad 8d2e52003adf45035bc6e94056c68dacf517236b
# good: [1e5ff84ffe0b09f866761c441003c27ca7e1c6b3] scripts/gdb: do not descend into scripts/gdb from scripts
git bisect good 1e5ff84ffe0b09f866761c441003c27ca7e1c6b3
# first bad commit: [8d2e52003adf45035bc6e94056c68dacf517236b] kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target

Turns out this is more of an issue with how I build the kernel. I have gotten into the habit of just specifying the direct target that I need to avoid unnecessary compilation time. This last commit moved the symlink command from the vmlinux target (which always executes when building a compressed image) to scripts_gdb, which I need to specify separately.

make -j$(nproc) -s \
ARCH=i386 \
CC=clang \
O=out \
distclean defconfig bzImage scripts_gdb
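Since the symlink now comes from scripts_gdb, a quick check before launching GDB can save some head-scratching. GDB's auto-loading looks for a file named after the object file with -gdb.py appended, so for out/vmlinux that would be out/vmlinux-gdb.py. A minimal sketch; `have_gdb_scripts` is a made-up helper name, and the file location is an assumption about where the symlink ends up with O=out:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the GDB helper script sits next to
# vmlinux in the given build directory.
have_gdb_scripts() {
    build_dir=$1
    # Assumption: scripts_gdb leaves a vmlinux-gdb.py symlink in the build
    # directory. "-e" follows symlinks, so a dangling link counts as missing.
    [ -e "$build_dir/vmlinux-gdb.py" ]
}
```

Usage would be something like `have_gdb_scripts out || echo "run the scripts_gdb target first"` before starting `gdb out/vmlinux`.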

Now that that is out of the way...

nokaslr needs to be added to the kernel command line to get a proper stack trace; otherwise this happens:

(gdb) lx-dmesg
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0xad53b27c:
Error occurred in Python: Cannot access memory at address 0xad53b27c
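The memory error is presumably what you get when KASLR loads the kernel at addresses that no longer match the symbols in vmlinux. A small sketch of a helper that adds nokaslr to a command line only if it is not already present; `with_nokaslr` is a hypothetical name, not something from the CI scripts:

```shell
#!/bin/sh
# Hypothetical helper: print the given kernel command line with "nokaslr"
# appended, unless it already contains that option as a separate word.
with_nokaslr() {
    cmdline=$1
    case " $cmdline " in
        *" nokaslr "*)
            # Already present: leave the command line untouched.
            printf '%s\n' "$cmdline"
            ;;
        *)
            # Append it; avoid a leading space when the input is empty.
            printf '%s\n' "${cmdline:+$cmdline }nokaslr"
            ;;
    esac
}
```

With the QEMU invocation from earlier in the thread this would be used as `-append "$(with_nokaslr 'console=ttyS0 root=/dev/sda')"`.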

Looks like it is something with jump labels:

(gdb) lx-dmesg
[    0.000000] Linux version 5.4.0-rc1+ (nathan@archlinux-threadripper) (ClangBuiltLinux clang version 10.0.0 (git://github.com/llvm/llvm-project 34f9e98aaecd1dbe58c255119d69b83e1019d7c1) (based on LLVM 10.0.0svn)) #1 SMP Mon Sep 30 19:24:20 MST 2019
[    0.000000] x86/fpu: x87 FPU will use FXSAVE
[    0.000000] BUG: unable to handle page fault for address: 7da8e734
[    0.000000] #PF: supervisor read access in kernel mode
[    0.000000] #PF: error_code(0x0000) - not-present page
[    0.000000] *pde = 00000000 
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc1+ #1
[    0.000000] EIP: jump_label_cmp+0x5/0x50
[    0.000000] Code: 70 04 01 cf 01 cb 8b 72 08 29 ce 03 48 08 89 70 08 89 3a 89 5a 04 89 4a 08 5e 5f 5b 5d c3 8d b4 26 00 00 00 00 55 89 e5 57 56 <8b> 48 08 83 e1 fc 8d 74 01 08 8b 4a 08 83 e1 fc 8d 7c 11 08 b9 ff
[    0.000000] EAX: 7da8e72c EBX: bbde3980 ECX: c1151650 EDX: 7da8e738
[    0.000000] ESI: bbde3974 EDI: 00000000 EBP: c1d13ed0 ESP: c1d13ec8
[    0.000000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210046
[    0.000000] CR0: 80050033 CR2: 7da8e734 CR3: 01ec8000 CR4: 00000600
[    0.000000] Call Trace:
[    0.000000]  sort_r+0x1c5/0x370
[    0.000000]  ? jump_label_text_reserved+0xb0/0xb0
[    0.000000]  sort+0x10/0x20
[    0.000000]  ? jump_label_text_reserved+0xb0/0xb0
[    0.000000]  ? jump_label_swap+0x40/0x40
[    0.000000]  jump_label_init+0x50/0xef
[    0.000000]  ? jump_label_swap+0x40/0x40
[    0.000000]  ? jump_label_text_reserved+0xb0/0xb0
[    0.000000]  setup_arch+0xef/0x5f1
[    0.000000]  ? vprintk_func+0x90/0xa0
[    0.000000]  ? printk+0x1e/0x40
[    0.000000]  start_kernel+0x5d/0x363
[    0.000000]  i386_start_kernel+0x20f/0x211
[    0.000000]  startup_32_smp+0x164/0x168
[    0.000000] Modules linked in:
[    0.000000] CR2: 000000007da8e734
[    0.000000] random: get_random_bytes called from oops_exit+0x3b/0x70 with crng_init=0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] EIP: jump_label_cmp+0x5/0x50
[    0.000000] Code: 70 04 01 cf 01 cb 8b 72 08 29 ce 03 48 08 89 70 08 89 3a 89 5a 04 89 4a 08 5e 5f 5b 5d c3 8d b4 26 00 00 00 00 55 89 e5 57 56 <8b> 48 08 83 e1 fc 8d 74 01 08 8b 4a 08 83 e1 fc 8d 7c 11 08 b9 ff
[    0.000000] EAX: 7da8e72c EBX: bbde3980 ECX: c1151650 EDX: 7da8e738
[    0.000000] ESI: bbde3974 EDI: 00000000 EBP: c1d13ed0 ESP: c1d13ec8
[    0.000000] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210046
[    0.000000] CR0: 80050033 CR2: 7da8e734 CR3: 01ec8000 CR4: 00000600
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

tpimh commented 4 years ago

Nice find! I was able to start gdb and the script was loaded properly.

tpimh commented 4 years ago

Here is the last working build for this branch. There have been a lot of kernel commits since then, as well as a new Clang major version release.

nathanchance commented 4 years ago

Given that we have a known issue with jump labels and i386 (https://github.com/ClangBuiltLinux/linux/issues/726), it probably isn’t worth bisecting but I will still give it a go today.

nathanchance commented 4 years ago

# bad: [6aa75a25bdeea9cdc4b04cdd91e82e680444bf4b] Merging r367215: ------------------------------------------------------------------------ r367215 | hans | 2019-07-29 11:49:04 +0200 (Mon, 29 Jul 2019) | 66 lines
# good: [86e4d7ea35eee5329e960d20dedda02ff9968aad] [lldb] [lldbsuite] Use a unique class name for TestValueVarUpdate
git bisect start 'llvmorg-9.0.0-rc1' 'llvmorg-9.0.0-rc1~1570'
# good: [e595a2c9644cd70f506a968acec1ea9b6dafa5e6] GlobalISel: Define the full family of FP min/max instructions
git bisect good e595a2c9644cd70f506a968acec1ea9b6dafa5e6
# good: [0bf0b8ff7c7edcad0f79e4c39dddd58bc0d62a72] [libFuzzer] Disable fork.test on AArch64
git bisect good 0bf0b8ff7c7edcad0f79e4c39dddd58bc0d62a72
# bad: [588fc9e756d3c9981cf7b17f18bd199e7bcd4172] [NFC][ScopBuilder] Move buildAliasChecks and its implementing methods to ScopBuilder
git bisect bad 588fc9e756d3c9981cf7b17f18bd199e7bcd4172
# bad: [bf20b2ace68d300665cf920050fda50003bd1096] Temporarily revert "add -fthinlto-index= option to clang-cl"
git bisect bad bf20b2ace68d300665cf920050fda50003bd1096
# good: [dc56995c57451368b4049738d4a56fa042db7a6e] [ARM] MVE vector for 64bit types
git bisect good dc56995c57451368b4049738d4a56fa042db7a6e
# good: [b082f1055b0a5370d1902339ffe058b4abb6abc0] AMDGPU: Use standalone MUBUF load patterns
git bisect good b082f1055b0a5370d1902339ffe058b4abb6abc0
# good: [c9e3c8301446f20efef6721dd3a05f2f9da217d8] Revert [llvm-lipo] Implement -create (with hardcoded alignments)
git bisect good c9e3c8301446f20efef6721dd3a05f2f9da217d8
# good: [c48162db994ab6040c45d468ea95772b574ab3ef] [TSan] Fix asm token error (again)
git bisect good c48162db994ab6040c45d468ea95772b574ab3ef
# bad: [bb147aabc68c366cff4ac5f1713b7b138a3b0fe0] Revert "[NewPM] Port Sancov"
git bisect bad bb147aabc68c366cff4ac5f1713b7b138a3b0fe0
# bad: [60a0d49e77cf6583b749ad6189751cd5d31bf3ee] [DirectoryWatcher][linux] Fix for older kernels
git bisect bad 60a0d49e77cf6583b749ad6189751cd5d31bf3ee
# bad: [51193871dafd99e79d7d19f62cffbdcdda238530] [X86] Teach convertToThreeAddress to handle SUB with immediate
git bisect bad 51193871dafd99e79d7d19f62cffbdcdda238530
# first bad commit: [51193871dafd99e79d7d19f62cffbdcdda238530] [X86] Teach convertToThreeAddress to handle SUB with immediate

https://github.com/llvm/llvm-project/commit/51193871dafd99e79d7d19f62cffbdcdda238530

With that commit reverted (diff), I can build and boot v5.4-rc1.

Linux version 5.4.0-rc1+ (nathan@archlinux-threadripper) (ClangBuiltLinux clang version 10.0.0 (git://github.com/llvm/llvm-project 0e3f659137189abac6f732b6a576d5c0e2db8383) (based on LLVM 10.0.0svn)) #1 SMP Tue Oct 1 11:42:27 MST 2019

Here is the diff of kernel/jump_label.o built with and without that commit: https://gist.github.com/891f4469a53f9ae983d374e470858864

nickdesaulniers commented 4 years ago

Sounds like https://github.com/ClangBuiltLinux/linux/issues/726. Time to rekick CI?

nathanchance commented 4 years ago

We'll still run into issues because our Docker's LLVM build is at r373388 (latest available according to apt.llvm.org) and the fix is r373397. Should be available next apt.llvm.org rebuild.
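The comparison being made here can be sketched as a tiny check. This assumes both numbers are SVN-style revisions on the same branch, where a higher number means a later commit; `includes_revision` is a hypothetical helper, not part of the CI scripts:

```shell
#!/bin/sh
# Hypothetical helper: succeed if revision $1 is at or after revision $2.
# Accepts "r373388" or "373388". Only meaningful for revisions on the same
# branch, where numbering is monotonic.
includes_revision() {
    have=${1#r}
    need=${2#r}
    [ "$have" -ge "$need" ]
}
```

For the numbers above, `includes_revision r373388 r373397` fails, which matches the expectation that the current Docker image does not yet contain the fix.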

tpgxyz commented 4 years ago

I've managed to get the kernel compiled with LLVM 9.0.0 on i686 (https://abf.openmandriva.org/build_lists/612950), using the patches from https://github.com/ClangBuiltLinux/linux/issues/726 and this PR.

tpimh commented 4 years ago

For some reason it still doesn't seem to boot on travis.

nathanchance commented 4 years ago

Weird since it boots locally for me here...

tpimh commented 4 years ago

@nathanchance Are you running the same docker image locally? For me the output is identical to travis. What I did is:

nathanchance commented 4 years ago

@tpimh I am not using Docker, I've been using my local environment.

It appears to be something with your ld.lld patch.

$ ./driver.sh ARCH=i386
<hangs>
$ rm patches/llvm-all/linux/i386/lld-relocatable-notext.patch
$ ./driver.sh ARCH=i386
<works>

tpimh commented 4 years ago

Just tested, there is a problem with this patch, confirmed.

nickdesaulniers commented 4 years ago

yeah, so looks like we can boot with only 4 small hunks. I also need:

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 6f112d8f80ca..8c2257437471 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -22,6 +22,7 @@ subdir-ccflags-y += $(call cc-disable-warning, sign-compare)
 subdir-ccflags-y += $(call cc-disable-warning, sometimes-uninitialized)
 subdir-ccflags-y += $(call cc-disable-warning, initializer-overrides)
 subdir-ccflags-y += $(call cc-disable-warning, uninitialized)
+subdir-ccflags-y += $(call cc-disable-warning, frame-address)
 subdir-ccflags-$(CONFIG_DRM_I915_WERROR) += -Werror

 # Fine grained warnings disable

to re-disable a bunch of warnings for i915. @adelva1984 wants this for 32b x86 cuttlefish; Android will live with the out of tree patches. Linus has also emailed me about this, and may be amenable to such patches to the assembly. Let's get this enabled ASAP without LLD, then work on enabling LLD.

Sent that along: https://lore.kernel.org/lkml/20200426214215.139435-1-ndesaulniers@google.com/T/#u

tpimh commented 4 years ago

I feel I need to send the patch for ClangBuiltLinux/linux#579 ASAP.

tpimh commented 4 years ago

@nickdesaulniers which version of fix for invalid output size for constraint '=q' do you think is better: https://github.com/ClangBuiltLinux/linux/issues/194#issuecomment-548710288?

nickdesaulniers commented 4 years ago

which version of fix for invalid output size for constraint '=q' do you think is better: ClangBuiltLinux/linux#194 (comment)?

https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints says for q:

Any register accessible as rl. In 32-bit mode, a, b, c, and d; in 64-bit mode, any integer register.

so for inputs I prefer @dwmw2's approach of casting to (unsigned char) which is truncation. I don't understand why there's a cast to (unsigned long) though, which will zero extend. I need to look more at the sources (the diff doesn't have all the info I need to make a more informed decision). We already know the size, so I think we could just cast to the unsigned integral type of the equivalent byte size.

For outputs, I also prefer @dwmw2's approach of using a temporary unsigned char as output, though I again don't think the cast to (unsigned long) is necessary.

Though @arndb's version is cleaner, and the b suffix on the assembler mnemonics should use the rl register aliases regardless of the q vs r constraint. It would be good to check whether q changes the disassembly for gcc. I suspect it won't, but I could be wrong.

Either way, we could start a thread with @arndb, @dwmw2, and Linus, since Linus emailed me about this case recently and understands the point @dwmw2 makes in his commit message.

tpimh commented 4 years ago

Building with no LLD: Build #1397. All should be fine.

Should I rename the ARCH from i386 to x86 to match the new images in boot-utils?

nickdesaulniers commented 4 years ago

yes please. Originally, I had the images as i386, but the kernel image gets produced in arch/x86/boot/, so it was simpler to just use x86 everywhere.
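As a rough illustration of why a single name is simpler: both 32-bit and 64-bit x86 builds share the arch/x86 source directory, so a lookup from ARCH to image path collapses to one case. `kernel_image_path` is a hypothetical helper and only covers the x86 family; it is not how boot-utils is necessarily written:

```shell
#!/bin/sh
# Hypothetical helper: map an ARCH value to the kernel image path relative
# to the build directory. Only the x86 family is covered in this sketch.
kernel_image_path() {
    case $1 in
        i386|x86|x86_64)
            # All x86 variants produce their image under arch/x86/boot/.
            printf '%s\n' "arch/x86/boot/bzImage"
            ;;
        *)
            # Other architectures are deliberately left out here.
            return 1
            ;;
    esac
}
```

For example, `kernel_image_path i386` and `kernel_image_path x86` both print `arch/x86/boot/bzImage`.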

nickdesaulniers commented 4 years ago

Boot failures on -next look like:

Could not access KVM kernel module: No such file or directory
qemu-system-i386: failed to initialize KVM: No such file or directory

hmmm

Also, patches aren't applying cleanly to mainline.

nathanchance commented 4 years ago

Boot failures on -next look like:

Could not access KVM kernel module: No such file or directory
qemu-system-i386: failed to initialize KVM: No such file or directory

hmmm

Fixed by https://github.com/ClangBuiltLinux/boot-utils/pull/12/.
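Those messages are what QEMU prints when KVM is requested but the host, such as a CI container, does not expose /dev/kvm. One common way to avoid this class of failure, not necessarily what the linked pull request does, is to pass -enable-kvm only when the device is usable. A minimal sketch with a hypothetical helper `kvm_flag`:

```shell
#!/bin/sh
# Hypothetical helper: print "-enable-kvm" if the KVM device node is readable
# and writable by the current user, otherwise print nothing so that QEMU
# falls back to its default (software) emulation.
kvm_flag() {
    kvm_dev=${1:-/dev/kvm}
    if [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
        printf '%s\n' "-enable-kvm"
    fi
}
```

The result can be spliced into the command line unquoted, for example `qemu-system-i386 $(kvm_flag) -m 512m ...`, so that an empty result adds no argument at all.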

nickdesaulniers commented 4 years ago

I've rekicked the tests.

nickdesaulniers commented 4 years ago

Nice work @tpimh ! Thanks

nickdesaulniers commented 4 years ago

I plan to land https://reviews.llvm.org/D79804 which is an improvement to Clang, but will make the patches we're carrying no longer work.