CachyOS / kernel-patches

Custom Linux kernel patches

LLVM 18 - linking vmlinux.o failed #48

Closed MrDuartePT closed 7 months ago

MrDuartePT commented 8 months ago

With your latest patches applied to the latest stable kernel (6.7.6) and to 6.7.5, linking vmlinux.o fails when ld.lld is used as the linker:

 LD      vmlinux.o
ld.lld: /var/tmp/portage/sys-devel/llvm-18.1.0_rc3/work/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:9845: SDValue llvm::SelectionDAG::getNode(unsigned int, const SDLoc &, SDVTList, ArrayRef<SDValue>, const SDNodeFlags): Assertion `Op.getOpcode() != ISD::DELETED_NODE && "Operand is DELETED_NODE!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: ld.lld -m elf_x86_64 -mllvm -import-instr-limit=5 -z noexecstack -r -o vmlinux.o -T .tmp_initcalls.lds --whole-archive vmlinux.a --no-whole-archive --start-group --end-group
1.  Running pass 'Function Pass Manager' on module 'ld-temp.o'.
2.  Running pass 'X86 DAG->DAG Instruction Selection' on function '@ip6_rcv_core'
 #0 0x00007f0171a0a7f6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x480a7f6)
 #1 0x00007f0171a07c20 llvm::sys::RunSignalHandlers() (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x4807c20)
 #2 0x00007f0171a0b03e (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x480b03e)
 #3 0x00007f016cf60d30 (/usr/lib64/libc.so.6+0x3bd30)
 #4 0x00007f016cfb118c (/usr/lib64/libc.so.6+0x8c18c)
 #5 0x00007f016cf60c82 raise (/usr/lib64/libc.so.6+0x3bc82)
 #6 0x00007f016cf494ed abort (/usr/lib64/libc.so.6+0x244ed)
 #7 0x00007f016cf49415 (/usr/lib64/libc.so.6+0x24415)
 #8 0x00007f016cf595d2 (/usr/lib64/libc.so.6+0x345d2)
 #9 0x00007f01727dde7c llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc const&, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>, llvm::SDNodeFlags) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x55dde7c)
#10 0x00007f01728113c1 llvm::SelectionDAGISel::Select_INLINEASM(llvm::SDNode*) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x56113c1)
#11 0x00007f01762b3662 (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x90b3662)
#12 0x00007f017280ad18 llvm::SelectionDAGISel::DoInstructionSelection() (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x560ad18)
#13 0x00007f01728093ab llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x56093ab)
#14 0x00007f0172805abf llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x5605abf)
#15 0x00007f0172801ce6 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x5601ce6)
#16 0x00007f01762a580e (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x90a580e)
#17 0x00007f01720511d9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x4e511d9)
#18 0x00007f0171c15b2b llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x4a15b2b)
#19 0x00007f0171c20652 llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x4a20652)
#20 0x00007f0171c1683f llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x4a1683f)
#21 0x00007f0173e35963 (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x6c35963)
#22 0x00007f0173e3467c llvm::lto::backend(llvm::lto::Config const&, std::__1::function<llvm::Expected<std::__1::unique_ptr<llvm::CachedFileStream, std::__1::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x6c3467c)
#23 0x00007f0173e23193 llvm::lto::LTO::runRegularLTO(std::__1::function<llvm::Expected<std::__1::unique_ptr<llvm::CachedFileStream, std::__1::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x6c23193)
#24 0x00007f0173e21fd1 llvm::lto::LTO::run(std::__1::function<llvm::Expected<std::__1::unique_ptr<llvm::CachedFileStream, std::__1::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, std::__1::function<llvm::Expected<std::__1::function<llvm::Expected<std::__1::unique_ptr<llvm::CachedFileStream, std::__1::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>> (unsigned int, llvm::StringRef, llvm::Twine const&)>) (/usr/lib/llvm/18/bin/../lib64/libLLVM.so.18.1+libcxx+0x6c21fd1)
#25 0x00007f0177417b40 lld::elf::BitcodeCompiler::compile() (/usr/lib/llvm/18/bin/../lib64/liblldELF.so.18.1+libcxx+0x217b40)
#26 0x00007f017737410a lld::elf::LinkerDriver::link(llvm::opt::InputArgList&) (/usr/lib/llvm/18/bin/../lib64/liblldELF.so.18.1+libcxx+0x17410a)
#27 0x00007f017735e2dd lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (/usr/lib/llvm/18/bin/../lib64/liblldELF.so.18.1+libcxx+0x15e2dd)
#28 0x00007f017735c870 lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (/usr/lib/llvm/18/bin/../lib64/liblldELF.so.18.1+libcxx+0x15c870)
#29 0x00007f0176fc53ee lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) (/usr/lib/llvm/18/bin/../lib64/liblldCommon.so.18.1+libcxx+0x133ee)
#30 0x000055734dd0b428 lld_main(int, char**, llvm::ToolContext const&) (/usr/lib/llvm/18/bin/ld.lld+0x4428)
#31 0x000055734dd0bd9e main (/usr/lib/llvm/18/bin/ld.lld+0x4d9e)
#32 0x00007f016cf4aeea (/usr/lib64/libc.so.6+0x25eea)
#33 0x00007f016cf4afa5 __libc_start_main (/usr/lib64/libc.so.6+0x25fa5)
#34 0x000055734dd0b1f1 _start (/usr/lib/llvm/18/bin/ld.lld+0x41f1)
make[2]: *** [scripts/Makefile.vmlinux_o:62: vmlinux.o] Error 134
make[1]: *** [/usr/src/linux-6.7.6-gentoo/Makefile:1293: vmlinux_o] Error 2
make: *** [Makefile:234: __sub-make] Error 2
Install Modules
  SYMLINK /lib/modules/6.7.6-gentoo-x86_64/build
  INSTALL /lib/modules/6.7.6-gentoo-x86_64/modules.order
make[2]: *** No rule to make target 'modules.builtin', needed by '/lib/modules/6.7.6-gentoo-x86_64/modules.builtin'.  Stop.
make[1]: *** [/usr/src/linux-6.7.6-gentoo/Makefile:1977: modules_install] Error 2
make: *** [Makefile:234: __sub-make] Error 2

If you need more info, there is also this upstream issue: https://github.com/llvm/llvm-project/issues/82896

ptr1337 commented 8 months ago

Hi,

Thanks for the report. Could you maybe bisect the patches? I have also started a compilation with full LTO and LLVM 18 rc4 to see if this issue is present.

I'd suspect the bbr3 patch, but let's see.

On Arch we are not on rc3 yet, so right now I can only test it in a Docker container with our rc repos enabled.

MrDuartePT commented 8 months ago

> Hi,
>
> Thanks for the report. Could you maybe bisect the patches? I have also started a compilation with full LTO and LLVM 18 rc4 to see if this issue is present.
>
> I'd suspect the bbr3 patch, but let's see.
>
> On Arch we are not on rc3 yet, so right now I can only test it in a Docker container with our rc repos enabled.

No problem, I can update to rc4. I will also try to bisect the patches.

ptr1337 commented 8 months ago

Reverting the "cachy" patchset does fix the issue. https://github.com/CachyOS/kernel-patches/blob/master/6.7/0004-cachy.patch

I'll try to bisect it further.

ptr1337 commented 8 months ago

Here is the branch of the commits in the "cachy" patchset: https://github.com/CachyOS/linux/commits/6.7/cachy/

I suspect this comes from -march=xyz, but I'm still bisecting.

MrDuartePT commented 8 months ago

Just to be sure: the patches everyone applies are only the ones in the main 6.7 folder, right? I'm going to try applying everything except the Cachy one!

MrDuartePT commented 8 months ago

I'm compiling the kernel without 0004-cachy.patch. Let's see whether it still fails on LLVM 18 rc3; it shouldn't.

ptr1337 commented 8 months ago

Narrowed it down to these commits:

.rw-r--r--  17k ptr1337 28 Feb 20:05  0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
.rw-r--r--  25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch
.rw-r--r-- 185k ptr1337 28 Feb 20:05  0003-Revert-Cachy-Add-legion-laptop-v0.0.9.patch
.rw-r--r-- 4,4k ptr1337 28 Feb 20:05  0004-Revert-Cachy-Add-ACS-override-support.patch
.rw-r--r--  18k ptr1337 28 Feb 20:05  0005-Revert-Cachy-Add-OpenRGB-patches.patch

MrDuartePT commented 8 months ago

> Narrowed it down to these commits:
>
> .rw-r--r--  17k ptr1337 28 Feb 20:05  0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
> .rw-r--r--  25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch
> .rw-r--r-- 185k ptr1337 28 Feb 20:05  0003-Revert-Cachy-Add-legion-laptop-v0.0.9.patch
> .rw-r--r-- 4,4k ptr1337 28 Feb 20:05  0004-Revert-Cachy-Add-ACS-override-support.patch
> .rw-r--r--  18k ptr1337 28 Feb 20:05  0005-Revert-Cachy-Add-OpenRGB-patches.patch

I don't think legion-laptop is the culprit, since the kernel module compiles fine with LLVM 18 and LTO.
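
Testing the revert patches one at a time, as discussed in this thread, could be scripted roughly as below. This is only a sketch: `bisect_reverts` and the check command are hypothetical names that do not appear in the thread, and the check command is assumed to apply one revert to a clean tree, build it, and exit 0 when the link succeeds.

```shell
# bisect_reverts CHECK PATCH...
# Runs CHECK once per revert patch and reports whether that revert alone
# makes the build pass. CHECK is a placeholder supplied by the caller.
bisect_reverts() {
    check=$1
    shift
    for patch in "$@"; do
        if "$check" "$patch"; then
            # Build passed with this revert applied: the reverted commit is a suspect.
            printf 'fixed-by %s\n' "$patch"
        else
            printf 'still-fails %s\n' "$patch"
        fi
    done
}
```

Each trial should start from a fresh copy of the kernel tree, so that a leftover revert from an earlier run cannot mask the result.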

ptr1337 commented 8 months ago

>> Narrowed it down to these commits:
>>
>> .rw-r--r--  17k ptr1337 28 Feb 20:05  0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
>> .rw-r--r--  25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch
>> .rw-r--r-- 185k ptr1337 28 Feb 20:05  0003-Revert-Cachy-Add-legion-laptop-v0.0.9.patch
>> .rw-r--r-- 4,4k ptr1337 28 Feb 20:05  0004-Revert-Cachy-Add-ACS-override-support.patch
>> .rw-r--r--  18k ptr1337 28 Feb 20:05  0005-Revert-Cachy-Add-OpenRGB-patches.patch
>
> I don't think legion-laptop is the culprit, since the kernel module compiles fine with LLVM 18 and LTO.

Personally, I suspect the Additional CPU Optimization and Makefile patches. Did you change the config to use a different -march? I'm not sure how the Gentoo ebuild is currently set up.

MrDuartePT commented 8 months ago

Can you share those patches to help with testing?

ptr1337 commented 8 months ago

> Can you share those patches to help with testing?

There you go: Archive.tar.gz. I'm currently running a compilation with the 0001 and 0002 reverts.

MrDuartePT commented 8 months ago

>>> Narrowed it down to these commits:
>>>
>>> .rw-r--r--  17k ptr1337 28 Feb 20:05  0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
>>> .rw-r--r--  25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch
>>> .rw-r--r-- 185k ptr1337 28 Feb 20:05  0003-Revert-Cachy-Add-legion-laptop-v0.0.9.patch
>>> .rw-r--r-- 4,4k ptr1337 28 Feb 20:05  0004-Revert-Cachy-Add-ACS-override-support.patch
>>> .rw-r--r--  18k ptr1337 28 Feb 20:05  0005-Revert-Cachy-Add-OpenRGB-patches.patch
>>
>> I don't think legion-laptop is the culprit, since the kernel module compiles fine with LLVM 18 and LTO.
>
> Personally, I suspect the Additional CPU Optimization and Makefile patches. Did you change the config to use a different -march? I'm not sure how the Gentoo ebuild is currently set up.

I'm using -march=native for my build. Since I have an AMD Ryzen 7 5800H (Zen 3), that resolves to: -march=znver3 -mno-pku -mshstk --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=512. I also selected AMD Zen 3 as the CPU optimization in the kernel config.
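
For anyone who wants to check what -march=native resolves to under Clang, one possible way is to look at the `-target-cpu` argument in the driver's `-###` output. The helper below is a sketch; `target_cpu` is a hypothetical name, and it only parses text passed on stdin.

```shell
# target_cpu: read clang's "-###" driver output on stdin and print the value
# that follows "-target-cpu" in the quoted cc1 command line.
target_cpu() {
    sed -n 's/.*"-target-cpu" "\([^"]*\)".*/\1/p'
}
```

Assuming clang is installed, it could be used as `clang -march=native -### -x c -c /dev/null 2>&1 | target_cpu`, since `-###` prints the commands to stderr without running them.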

ptr1337 commented 8 months ago

@MrDuartePT Does it get applied via KCFLAGS or the kernel config?

MrDuartePT commented 8 months ago

> @MrDuartePT Does it get applied via KCFLAGS or the kernel config?

Yes, it does. And even if it didn't, it would still be applied when I use your Cachy patch, because of CONFIG_MZEN3=y.

MrDuartePT commented 8 months ago

OK, the link step didn't fail without the Cachy patch. I'm going to try the reverts as well.

MrDuartePT commented 8 months ago

Trying without 0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch

ptr1337 commented 8 months ago

.rw-r--r--  17k ptr1337 28 Feb 20:05  0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
.rw-r--r--  25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch

These compiled as well, so not much is left :D

MrDuartePT commented 8 months ago

Yup, let's see with only 0001 reverted. Mine is taking a while, since it is also compiling kwin :D

ptr1337 commented 8 months ago

> Yup, let's see with only 0001 reverted. Mine is taking a while, since it is also compiling kwin :D

It will take a while on my side too; the build server is currently compiling firefox-developer-edition twice :x

MrDuartePT commented 8 months ago

> Trying without 0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch

With only this one reverted, the linking still fails. Let's see with only this one reverted: .rw-r--r-- 25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch

ptr1337 commented 8 months ago

>> Trying without 0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
>
> With only this one reverted, the linking still fails. Let's see with only this one reverted: .rw-r--r-- 25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch

I think this is an LLVM issue then. I was also able to reproduce it with only that revert.

ptr1337 commented 8 months ago

@MrDuartePT I will try a compilation with the upstream CPU -march patch from graysky and will get back to you as soon as I know whether it also causes this. Otherwise, I would suggest forwarding this to the lld/LLVM developers.

MrDuartePT commented 8 months ago

>>> Trying without 0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
>>
>> With only this one reverted, the linking still fails. Let's see with only this one reverted: .rw-r--r-- 25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch
>
> I think this is an LLVM issue then. I was also able to reproduce it with only that revert.

Well, it doesn't seem to have failed on my side with 0002 reverted, but I'm going to retry it with a fresh copy of the kernel to be sure.

ptr1337 commented 8 months ago

Alright, it appears to be https://github.com/graysky2/kernel_compiler_patch/blob/master/more-uarches-for-kernel-6.1.79-6.8-rc3.patch, which is the same as https://github.com/CachyOS/linux/commit/6ec793e12dee914d26761bec11cbdb6df4a9ea1e

I'll also comment on the LLVM issue.

MrDuartePT commented 8 months ago

>>>> Trying without 0001-Revert-Cachy-Makefile-Move-ARM-and-x86-instruction-s.patch
>>>
>>> With only this one reverted, the linking still fails. Let's see with only this one reverted: .rw-r--r-- 25k ptr1337 28 Feb 20:05  0002-Revert-Cachy-Additional-CPU-Optimization-Options.patch
>>
>> I think this is an LLVM issue then. I was also able to reproduce it with only that revert.
>
> Well, it doesn't seem to have failed on my side with 0002 reverted, but I'm going to retry it with a fresh copy of the kernel to be sure.

Well, with 0002 reverted it works fine, so the problematic commit is this one: https://github.com/CachyOS/linux/commit/6ec793e12dee914d26761bec11cbdb6df4a9ea1e

ptr1337 commented 8 months ago

Yes, I have also commented there about our findings. I'm now trying additional compilations with ThinLTO and with Clang without LTO, to see whether they are also affected.

MrDuartePT commented 8 months ago

OK, I can also try later. I'm just going to wait for plasma-desktop to finish compiling.

ptr1337 commented 8 months ago

FYI, ThinLTO and Clang without LTO also fail (with -march=znver4). Without LTO it crashes much earlier. I will add the logs from the Clang build without LTO to the LLVM issue.

ptr1337 commented 8 months ago

@MrDuartePT Which CPU -march do you set? Zen 4?

ptr1337 commented 8 months ago

Just adding KCFLAGS makes it possible to reproduce the issue:

    export KCFLAGS=' -march=znver4 -mtune=znver4'
    export KCPPFLAGS=' -march=znver4 -mtune=znver4'
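
As a rough sketch, the same flags can also be passed on the make command line instead of being exported. The helper below only prints such a command; `repro_cmd` is a hypothetical name, and both `LLVM=1` and the `vmlinux` target are assumptions about the build setup that the thread does not spell out.

```shell
# repro_cmd [MARCH]: print (do not run) a kernel build command that passes
# the -march/-mtune flags via KCFLAGS and KCPPFLAGS. MARCH defaults to znver4,
# the value used in this comment. LLVM=1 is assumed, not taken from the thread.
repro_cmd() {
    march=${1:-znver4}
    flags="-march=$march -mtune=$march"
    printf 'make LLVM=1 KCFLAGS="%s" KCPPFLAGS="%s" vmlinux\n' "$flags" "$flags"
}
```

Printing the command first makes it easy to review before running it from the top of a configured kernel tree, for example with `eval "$(repro_cmd znver3)"`.
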

ptr1337 commented 7 months ago

This is an upstream issue. Closing here.