dell / dkms

Dynamic Kernel Module Support
GNU General Public License v2.0

Failing to compile nvidia-open-dkms while using a ThinLTO/Clang kernel #416

Open ltsdw opened 4 months ago

ltsdw commented 4 months ago
dkms: 3.0.12-1
clang: 17.0.6-2

There has been some debate about this on TKG and on NVIDIA's GitHub (e.g. https://github.com/NVIDIA/open-gpu-kernel-modules/issues/405), but I'm facing something different, and I don't know whether it's related to dkms or not. For example, when trying to install nvidia-open-dkms on Arch Linux, I see this:

strip: BFD (GNU Binutils) 2.42.0 assertion fail /usr/src/debug/binutils/binutils-gdb/bfd/elf.c:4131
...

(It keeps printing many of these; apart from that, nvidia-open-dkms still installs successfully.)

However, booting the system with the resulting broken driver doesn't work: it hangs on boot.

So I tried replacing /usr/bin/strip with /usr/bin/llvm-strip. The assertions disappear and the nvidia-open kernel module installs normally, but it is still broken (the system still hangs on boot).

I think maybe the dkms script isn't setting the right STRIP=/usr/bin/llvm-strip environment variable, or other Clang-related settings?
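One way to pick the strip tool per kernel, instead of hard-coding `strip`, is to look at the same `readelf -p .comment` output of vmlinux that dkms already inspects for Clang. The sketch below is an assumption, not dkms code: the function name `strip_tool_for` is hypothetical, and it simply treats any "clang" mention as a reason to use llvm-strip (the tool that made the assertions go away in the report above).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dkms): choose a strip tool from the text of
# a kernel's .comment section, e.g. the output of `readelf -p .comment vmlinux`.
# Assumption: a "clang" mention means the kernel was built with Clang, so
# llvm-strip is used; otherwise fall back to GNU strip.
strip_tool_for() {
  local comment="$1"
  if grep -q clang <<<"$comment"; then
    echo llvm-strip
  else
    echo strip
  fi
}

# Example usage (the kernel path and module name are placeholders):
#   tool=$(strip_tool_for "$(readelf -p .comment /lib/modules/6.9.0/build/vmlinux)")
#   "$tool" -g nvidia.ko
```

Note that this only avoids the binutils assertions; as reported above, the module built this way still hangs on boot, so choosing the strip tool alone is not a fix.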

danudey commented 3 months ago

In order to successfully compile a kernel module via DKMS, you need to provide the module build with the same LLVM-related environment variables you used to build the kernel, and as far as I can tell this doesn't happen with DKMS (i.e. there's no way for DKMS to know what those parameters were).

On Ubuntu I had to modify /usr/src/nvidia-<ver>/dkms.conf to add those parameters; for example, to go from this:

MAKE[0]="'make' -j__JOBS NV_EXCLUDE_BUILD_MODULES='__EXCLUDE_MODULES' KERNEL_UNAME=${kernelver} modules"

to this:

MAKE[0]="LLVM=1 'make' -j__JOBS NV_EXCLUDE_BUILD_MODULES='__EXCLUDE_MODULES' KERNEL_UNAME=${kernelver} modules"

I would assume that on Arch you would have to do a similar thing.

(Ubuntu actually uses a different dkms.conf file which needs more work to fix.)
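The manual edit above can be scripted. This is a minimal sketch, assuming the MAKE[0] line starts exactly with `MAKE[0]="'make'` as in the NVIDIA example; Ubuntu's different dkms.conf would need a different pattern. The function name `add_llvm_to_dkms_conf` is hypothetical, and `sed -i` assumes GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix LLVM=1 to the MAKE[0] line of a dkms.conf.
# Assumes the line begins with MAKE[0]="'make' as in the example above.
add_llvm_to_dkms_conf() {
  local conf="$1"
  # Do nothing if LLVM=1 is already there, so running it twice is harmless.
  if grep -q "^MAKE\[0\]=\"LLVM=1 " "$conf"; then
    return 0
  fi
  # Rewrite MAKE[0]="'make' ... into MAKE[0]="LLVM=1 'make' ... in place.
  sed -i "s/^MAKE\[0\]=\"'make'/MAKE[0]=\"LLVM=1 'make'/" "$conf"
}

# Example usage (keep <ver> as your installed driver version):
#   add_llvm_to_dkms_conf /usr/src/nvidia-<ver>/dkms.conf
```

Because the edit lives in dkms.conf, it applies to every kernel the module is built for, Clang-built or not.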

ltsdw commented 3 months ago

Setting the variable in dkms.conf would work, but I have other kernels that weren't compiled with Clang. Besides, the dkms script can already detect whether the kernel was compiled with Clang or linked with LLD:


    # Check if clang was used to compile or lld was used to link the kernel.
    if [[ -e $kernel_source_dir/vmlinux ]]; then
      if  readelf -p .comment $kernel_source_dir/vmlinux | grep -q clang; then
        make_command="${make_command} CC=clang"
      fi
      if  readelf -p .comment $kernel_source_dir/vmlinux | grep -q LLD; then
        make_command="${make_command} LD=ld.lld"
      fi
    elif [[ -e "${kernel_config}" ]]; then
      if grep -q CONFIG_CC_IS_CLANG=y "${kernel_config}"; then
        make_command="${make_command} CC=clang"
      fi
      if grep -q CONFIG_LD_IS_LLD=y "${kernel_config}"; then
        make_command="${make_command} LD=ld.lld"
      fi
    fi

So without ThinLTO I don't even have to worry about setting environment variables for Clang or llvm-strip; the module compiles normally.
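Building on the detection quoted above, a per-kernel alternative to editing dkms.conf could emit `LLVM=1` only when the kernel .config shows both `CONFIG_CC_IS_CLANG=y` and `CONFIG_LD_IS_LLD=y`, falling back to the existing `CC=clang` / `LD=ld.lld` overrides otherwise. This is a sketch under that assumption, not current dkms behaviour; the function name `llvm_make_args` is hypothetical. GCC-built kernels get no extra arguments.

```shell
#!/usr/bin/env bash
# Sketch (an assumption, not current dkms behaviour): derive extra make
# arguments from a kernel .config. When the kernel was compiled with Clang
# AND linked with ld.lld, pass LLVM=1 so Kbuild uses the LLVM tools
# (clang, ld.lld, llvm-ar, llvm-objcopy, ...) for the module build,
# instead of only overriding CC and LD.
llvm_make_args() {
  local config="$1"
  local clang=0 lld=0
  grep -q '^CONFIG_CC_IS_CLANG=y' "$config" && clang=1
  grep -q '^CONFIG_LD_IS_LLD=y' "$config" && lld=1

  if (( clang && lld )); then
    echo "LLVM=1"
  elif (( clang )); then
    echo "CC=clang"
  elif (( lld )); then
    echo "LD=ld.lld"
  fi
  # GCC + GNU ld kernels: print nothing, so make_command is unchanged.
}

# Example usage (config path is a placeholder):
#   make_command="${make_command} $(llvm_make_args /lib/modules/6.9.0/build/.config)"
```

This only affects the make invocation; the separate `strip -g` step dkms runs on the built module afterwards is not covered by it.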

ltsdw commented 3 months ago

The following patch works for me, although some of the variables may not be needed.

--- a/dkms  2024-05-18 11:52:29.624897615 -0300
+++ b/dkms  2024-05-18 11:52:17.387875553 -0300
@@ -589,17 +589,17 @@
     # Check if clang was used to compile or lld was used to link the kernel.
     if [[ -e $kernel_source_dir/vmlinux ]]; then
       if  readelf -p .comment $kernel_source_dir/vmlinux | grep -q clang; then
-        make_command="${make_command} CC=clang"
+        make_command="${make_command} CC=clang CXX=clang++ AR=llvm-ar OBJCOPY=llvm-objcopy"
       fi
       if  readelf -p .comment $kernel_source_dir/vmlinux | grep -q LLD; then
-        make_command="${make_command} LD=ld.lld"
+        make_command="${make_command} LD=ld.lld AR=llvm-ar OBJCOPY=llvm-objcopy"
       fi
     elif [[ -e "${kernel_config}" ]]; then
       if grep -q CONFIG_CC_IS_CLANG=y "${kernel_config}"; then
-        make_command="${make_command} CC=clang"
+        make_command="${make_command} CC=clang CXX=clang++ AR=llvm-ar OBJCOPY=llvm-objcopy"
       fi
       if grep -q CONFIG_LD_IS_LLD=y "${kernel_config}"; then
-        make_command="${make_command} LD=ld.lld"
+        make_command="${make_command} LD=ld.lld AR=llvm-ar OBJCOPY=llvm-objcopy"
       fi
     fi

@@ -1112,7 +1112,11 @@
         local built_module="$the_module$module_uncompressed_suffix"
         local compressed_module="$the_module$module_suffix"

-        [[ ${strip[$count]} != no ]] && strip -g "$built_module"
+        if [[ ${strip[$count]} != no ]] && [[ ${CC} == "clang" ]]; then
+            llvm-strip -g "$built_module"
+        elif [[ ${strip[$count]} != no ]]; then
+            strip -g "$built_module"
+        fi

         if (( do_signing )); then
             echo "Signing module $built_module"