Frogging-Family / nvidia-all

Nvidia driver latest to 396 series AIO installer
Install Error with Linux-tkg Kernel #36

Open ThisNekoGuy opened 3 years ago

ThisNekoGuy commented 3 years ago

I compiled a Linux-tkg kernel with the below configuration:

customization.cfg Zip File

And, attempting to install an Nvidia driver gives me this error for some reason:

DKMS make.log for nvidia-465.24.02 for kernel 5.11.15-148-tkg-upds-llvm (x86_64)
Thu Apr 22 01:14:21 AM CDT 2021
make[1]: Entering directory '/usr/lib/modules/5.11.15-148-tkg-upds-llvm/build'
scripts/Makefile.lib:8: 'always' is deprecated. Please use 'always-y' instead
  SYMLINK /var/lib/dkms/nvidia/465.24.02/build/nvidia/nv-kernel.o
  SYMLINK /var/lib/dkms/nvidia/465.24.02/build/nvidia-modeset/nv-modeset-kernel.o
  CC [M]  /var/lib/dkms/nvidia/465.24.02/build/nvidia-peermem/nvidia-ib-peermem-stub.o
cc: error: unrecognized command-line option ‘-Qunused-arguments’
cc: error: unrecognized command-line option ‘-mno-global-merge’
cc: error: unrecognized command-line option ‘-ftrivial-auto-var-init=zero’
make[2]: *** [scripts/Makefile.build:279: /var/lib/dkms/nvidia/465.24.02/build/nvidia-peermem/nvidia-ib-peermem-stub.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile:1820: /var/lib/dkms/nvidia/465.24.02/build] Error 2
make[1]: Leaving directory '/usr/lib/modules/5.11.15-148-tkg-upds-llvm/build'
make: *** [Makefile:80: modules] Error 2

Not sure if I did something wrong?
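For anyone reading a log like this one, it can help to pull out just the options the compiler refused. The sketch below is a hypothetical helper (the name `rejected_options` is made up for illustration) that reads a make.log on stdin and lists each rejected option once. It assumes the error wording shown in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the options the compiler rejected in a make.log.
# Reads the log on stdin and prints one option per line, without duplicates.
rejected_options() {
    # Keep only the text after the fixed error phrase, then pull out the
    # option itself (this also drops the typographic quotes around it).
    sed -n 's/.*unrecognized command-line option //p' \
        | grep -o -- '-[A-Za-z0-9=_-]*' \
        | sort -u
}

# Example with the three error lines from the log above.
rejected_options <<'EOF'
cc: error: unrecognized command-line option ‘-Qunused-arguments’
cc: error: unrecognized command-line option ‘-mno-global-merge’
cc: error: unrecognized command-line option ‘-ftrivial-auto-var-init=zero’
EOF
```

If every option in the list is one the kernel's own compiler would accept, that hints at a compiler mismatch between the kernel build and the module build rather than a broken driver package.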

Tk-Glitch commented 3 years ago

You have used llvm to build your kernel. You need to force dkms to build modules with llvm, which might actually not even work at all with Nvidia. Rebuilding with gcc (default) will fix it.
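A quick way to confirm which toolchain built a kernel is to look at the banner in /proc/version. The function below is only a sketch: `kernel_toolchain` is a hypothetical name, and it assumes a clang-built kernel mentions "clang version" in that banner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which compiler family built a kernel, judging
# only by the banner text (the format found in /proc/version).
kernel_toolchain() {
    local banner="$1"
    case $banner in
        *"clang version"*) echo clang ;;    # checked first on purpose
        *gcc*|*GCC*)       echo gcc ;;
        *)                 echo unknown ;;
    esac
}

# On a live system:
#   kernel_toolchain "$(cat /proc/version)"
```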

ThisNekoGuy commented 3 years ago

How would one attempt to do that though, out of curiosity so I can try?

netbospl commented 3 years ago

How would one attempt to do that though, out of curiosity so I can try?

You could potentially build the DKMS-only version and point your DKMS to use LLVM instead of GCC before installing the nvidia DKMS module. Still, I wouldn't recommend this, but it can be done to satisfy your curiosity.
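One way to picture "pointing the module build at LLVM" is to derive the make variables from the kernel's own config. This is a rough sketch, not how DKMS or nvidia-all actually does it: `llvm_make_vars` is a hypothetical name, and it assumes the config file carries the Kconfig symbols `CONFIG_CC_IS_CLANG` and `CONFIG_LD_IS_LLD`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the make variables a module build would need so
# that it uses the same toolchain as the kernel described by a config file.
llvm_make_vars() {
    local config="$1"
    local vars=""
    # Kconfig sets these symbols when the kernel itself was compiled with
    # clang and linked with ld.lld.
    if grep -qx 'CONFIG_CC_IS_CLANG=y' "$config"; then
        vars="CC=clang"
    fi
    if grep -qx 'CONFIG_LD_IS_LLD=y' "$config"; then
        vars="${vars:+$vars }LD=ld.lld"
    fi
    printf '%s\n' "$vars"
}

# Usage sketch (the config path is an assumption about your system):
#   make $(llvm_make_vars "/usr/lib/modules/$(uname -r)/build/.config") modules
```

Whether the Nvidia module then builds cleanly with clang is a separate question, as noted above.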

flindeberg commented 2 years ago

Latest DKMS has solved this issue.

@ThisNekoGuy Can you verify that it works for you and close the issue?

ThisNekoGuy commented 2 years ago

Wait, it's supported now?? (To be clear, Idk how to change DKMS's behavior, if that's what you're implying I do) Also, does this mean I'd have to compile DKMS for that latest change?

flindeberg commented 2 years ago

DKMS 3.0.2 works out of the box. Which version does your distribution have?
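To check an installed version against 3.0.2 in a script, a version-aware comparison is safer than a plain string compare. The sketch below assumes `sort -V` is available (GNU coreutils has it); `version_ge` is a hypothetical helper name, and the installed version is typed in by hand because the exact output format of `dkms --version` is not assumed here.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is the same as or newer than B.
# Relies on `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Replace 3.0.3 with whatever version your package manager reports.
if version_ge "3.0.3" "3.0.2"; then
    echo "new enough"
else
    echo "older than 3.0.2"
fi
```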

ThisNekoGuy commented 2 years ago

Oh, yeah, it's the same version. I didn't know when I'd last updated it, so I thought it must've been recent or something :p

I'll recompile and let you know

ThisNekoGuy commented 2 years ago

Is it normal to get warning messages about "unreachable instructions" when building the kernel? >_> And this warning:

WARNING: modpost: vmlinux.o(.text+0x96602): Section mismatch in reference from the function __nodes_weight() to the variable .init.data:numa_nodes_parsed
The function __nodes_weight() references
the variable __initdata numa_nodes_parsed.
This is often because __nodes_weight lacks a __initdata 
annotation or the annotation of numa_nodes_parsed is wrong.

(I'm running it in makechrootpkg)

flindeberg commented 2 years ago

As long as it's just warnings I wouldn't worry too much, especially "unreachable instruction" as that (usually) only means that there are unreachable code paths (with the current config).

ThisNekoGuy commented 2 years ago

This happened at the end of the build?

==> Creating package "linux514-tkg-pds-llvm-headers"...
  -> Generating .PKGINFO file...
  -> Generating .BUILDINFO file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Leaving fakeroot environment.
  -> exit cleanup done

==> Finished making: linux514-tkg-pds-llvm 5.14.21-227 (Thu 09 Dec 2021 02:20:10 AM CST)
  -> exit cleanup done

  -> compilation time : 

real    28m53.512s
user    315m20.652s
sys     30m7.069s
PKGBUILD: line 17: plain: command not found
PKGBUILD: line 18: plain: command not found
PKGBUILD: line 19: plain: command not found
PKGBUILD: line 20: plain: command not found
PKGBUILD: line 21: plain: command not found
PKGBUILD: line 22: plain: command not found
PKGBUILD: line 23: plain: command not found
PKGBUILD: line 24: plain: command not found
PKGBUILD: line 25: plain: command not found
PKGBUILD: line 26: plain: command not found
PKGBUILD: line 27: plain: command not found
PKGBUILD: line 28: plain: command not found
PKGBUILD: line 29: plain: command not found
PKGBUILD: line 30: plain: command not found
PKGBUILD: line 31: plain: command not found
PKGBUILD: line 32: plain: command not found
PKGBUILD: line 33: plain: command not found
PKGBUILD: line 34: plain: command not found
PKGBUILD: line 44: msg2: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 228: msg2: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 195: plain: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 196: warning: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 58: plain: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 60: plain: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 60: plain: command not found
[0-2]: 0
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 79: plain: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 1497: remove_deps: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 1497: msg2: command not found
/mnt/extraStorage/linux-tkg/linux-tkg-config/prepare: line 1500: msg2: command not found

Doesn't this normally not happen?

flindeberg commented 2 years ago

Have you modified the build-files for linux-tkg? msg2, plain and warning are defined in https://github.com/Frogging-Family/linux-tkg/blob/6be7a4920b868a046ebb4af320c6f76c56dc0412/install.sh#L13. Both PKGBUILD and prepare should have loaded them already. Did you run out of memory?
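If those helpers really are missing in some environment, a script can guard against it by defining fallbacks only when nothing with that name exists yet. This is a sketch under assumptions: the output format below is made up and is not a copy of the linux-tkg or makepkg definitions, and `command -v` would also treat an unrelated program with the same name as "already defined".

```shell
#!/usr/bin/env bash
# Fallback message helpers, defined only when the caller has not provided
# them. The output format is an assumption, not a copy of the originals.
command -v msg2    >/dev/null 2>&1 || msg2()    { printf '  -> %s\n' "$*"; }
command -v plain   >/dev/null 2>&1 || plain()   { printf '     %s\n' "$*"; }
command -v warning >/dev/null 2>&1 || warning() { printf '==> WARNING: %s\n' "$*" >&2; }

msg2 "helpers are available"
```

This only hides the "command not found" noise; it would not explain why the helpers were not loaded in the first place.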

To get the compile time down, you could have a look at modprobed-db (https://wiki.archlinux.org/title/Modprobed-db). You should be able to compile the kernel in 5 minutes or so if you only compile the modules you need.
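Before trimming the kernel down to a module list, it is worth checking that the list covers what is loaded right now. The sketch below assumes the list is a plain text file with one module name per line; `unrecorded_modules` is a hypothetical helper, and the database path in the usage line is an assumption to verify on your own system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print modules that appear in `lsmod`-style output
# (stdin) but are missing from a module list file (one name per line).
unrecorded_modules() {
    local db="$1"
    # Skip the lsmod header line, keep the first column, and report names
    # that have no exact match in the list file.
    awk 'NR > 1 { print $1 }' | grep -vxF -f "$db"
}

# On a live system:
#   lsmod | unrecorded_modules "$HOME/.config/modprobed.db"
```

The wiki page linked above describes feeding such a list to the kernel build with `make LSMOD=$HOME/.config/modprobed.db localmodconfig`.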

ThisNekoGuy commented 2 years ago

I have 32 GBs of RAM and about 35 GBs of SWAP space, so I doubt I ran out of memory

I didn't touch any of the build files at all except for the PKGBUILD to add these lines near the top because I didn't want them to be inherited from my makepkg.conf:

CFLAGS="${CFLAGS/-march=native/}"
CXXFLAGS="${CXXFLAGS/-march=native/}"
CFLAGS="${CFLAGS/-mtune=native/}"
CXXFLAGS="${CXXFLAGS/-mtune=native/}"
RUSTFLAGS="${RUSTFLAGS/-C target-cpu=native/}"
RUSTFLAGS="${RUSTFLAGS/-Ctarget-cpu=native/}"
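For reference, `${VAR/pattern/}` deletes only the first match and leaves the surrounding spaces in place, which is harmless for compiler flags. A variant that removes every match and tidies the whitespace could look like the sketch below; `strip_flag` is a hypothetical helper, not something from linux-tkg, and it needs bash.

```shell
#!/usr/bin/env bash
# strip_flag FLAG STRING: print STRING with every occurrence of FLAG removed
# and the leftover whitespace collapsed to single spaces.
strip_flag() {
    local flag="$1" value="$2" words
    value=${value//"$flag"/}          # '//' removes all matches, '/' only the first
    read -r -a words <<< "$value"     # split on whitespace without globbing
    printf '%s\n' "${words[*]}"       # rejoin with single spaces
}

strip_flag "-march=native" "-march=native -O2 -pipe -mtune=native"
# prints: -O2 -pipe -mtune=native
```

In a PKGBUILD this would be used as `CFLAGS="$(strip_flag "-march=native" "$CFLAGS")"`.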

Other than that, nothing else was touched at all. Also, I didn't use _menunconfig:

# Set to "1" to call make menuconfig, "2" to call make nconfig, "3" to call make xconfig, before building the kernel. Set to false to disable and skip the prompt.
_menunconfig="false"

And I had the config file set to: config.x86_64

For reference, I'm attaching the logs: linux-tkg-5_14_21_227-compile-logs.zip

Also, I've never used modprobed-db before :v

Maybe this was because of makechrootpkg?

ThisNekoGuy commented 2 years ago

I rebuilt the kernel just in case and it seemed to link with nvidia-tkg just fine now

Should we make the makechrootpkg problem a separate issue?

ThisNekoGuy commented 2 years ago

@flindeberg On a separate note, thanks for bringing modprobed-db to my attention, by the way. I just spent basically all day today and stayed up until 5am last night messing around with it :joy:

I'm new to this stuff, so I don't know how to get all of the modules I collected and researched into the TKG kernel (some modules weren't available for selection) but it was really satisfying seeing it compile so fast and the kernel package drop about ~106MB in size Lol

flindeberg commented 2 years ago

Should we make the makechrootpkg problem a separate issue?

That makes sense I think.

I'm new to this stuff, so I don't know how to get all of the modules I collected and researched into the TKG kernel (some modules weren't available for selection) but it was really satisfying seeing it compile so fast and the kernel package drop about ~106MB in size

:+1:

Sometimes it is a bit of a mess to find which modules you need though, but overall I find modprobed-db a great utility which makes it a tonne easier to kernel-hop (after you have matured enough by distro-hopping and settled on arch btw, I guess kernel-hopping is the next step?).

I use this list as a base for modprobed-db: https://gist.github.com/flindeberg/4d475eae9f21eb2348998bd2384df95b, feel free to copy it. It is basic drivers + wireguard.
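Combining a base list like that with your own collected modules is a plain set union. A minimal sketch, assuming both files hold one module name per line (`merge_module_lists` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# merge_module_lists FILE...: print the union of several module lists
# (one module name per line), sorted and without duplicates or blank lines.
merge_module_lists() {
    # LC_ALL=C keeps the sort order independent of the user's locale.
    cat -- "$@" | grep -v '^[[:space:]]*$' | LC_ALL=C sort -u
}

# Usage sketch (file names are placeholders):
#   merge_module_lists base.list "$HOME/.config/modprobed.db" > merged.list
```

Write the result to a new file rather than redirecting onto one of the inputs, since the shell would truncate that input before it is read.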

ThisNekoGuy commented 2 years ago

@flindeberg

I tried to update to 470.94 and got a DKMS error with the llvm kernel:

==> dkms install --no-depmod nvidia/470.94 -k 5.14.21-230-tkg-pds-llvm
Error! Bad return status for module build on kernel: 5.14.21-230-tkg-pds-llvm (x86_64)
Consult /var/lib/dkms/nvidia/470.94/build/make.log for more information.

This didn't happen with 470.86, which I already had installed before building the llvm kernel (after that DKMS update), so I don't quite understand what's going on here :/

If I try to do it manually and check the log:

neko-san@ARCH ~ [1]> sudo dkms install --no-depmod nvidia/470.94 -k 5.14.21-230-tkg-pds-llvm
doas (neko-san@ARCH) password: 

Building module:
cleaning build area...
'make' -j16 IGNORE_PREEMPT_RT_PRESENCE=1 NV_EXCLUDE_BUILD_MODULES='__EXCLUDE_MODULES' KERNEL_UNAME=5.14.21-230-tkg-pds-llvm IGNORE_CC_MISMATCH='__IGNORE_CC_MISMATCH' modules CC=clang LD=ld.lld...(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.14.21-230-tkg-pds-llvm (x86_64)
Consult /var/lib/dkms/nvidia/470.94/build/make.log for more information.
neko-san@ARCH ~ [10]> cat /var/lib/dkms/nvidia/470.94/build/make.log
DKMS make.log for nvidia-470.94 for kernel 5.14.21-230-tkg-pds-llvm (x86_64)
Sun Jan  2 13:36:45 CST 2022
make[1]: Entering directory '/usr/lib/modules/5.14.21-230-tkg-pds-llvm/build'

  ERROR: Kernel configuration is invalid.
         include/generated/autoconf.h or include/config/auto.conf are missing.
         Run 'make oldconfig && make prepare' on kernel src to fix it.

make[1]: *** [Makefile:748: include/config/auto.conf] Error 1
make[1]: Leaving directory '/usr/lib/modules/5.14.21-230-tkg-pds-llvm/build'
make: *** [Makefile:80: modules] Error 2
neko-san@ARCH ~> 

Attempting to re-install 470.86 gave me the same error too. :/
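A small pre-flight check can tell whether the kernel build directory still has the generated files the error message names, before DKMS is even started. This is a sketch only: `missing_generated_files` is a hypothetical helper, and it checks nothing beyond the two paths quoted in the log.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report which of the generated files named in
# the error message are absent from a kernel build directory.
# Returns 0 when both exist and 1 otherwise.
missing_generated_files() {
    local build_dir="$1"
    local file status=0
    for file in include/generated/autoconf.h include/config/auto.conf; do
        if [ ! -e "$build_dir/$file" ]; then
            echo "missing: $file"
            status=1
        fi
    done
    return "$status"
}

# Usage sketch, with the build directory from the log above:
#   missing_generated_files /usr/lib/modules/5.14.21-230-tkg-pds-llvm/build
```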

EDIT: Rebuilding and installing the rebuilt kernel fixed it, but it seems it doesn't like it when a DKMS module is built against it after the initial install of a built TKG kernel?? I immediately tried to reinstall the driver afterwards and it threw the DKMS error again. (nvidia-all has to already be "installed" when the kernel is re-installed.)