NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

nvidia-drm Direct firmware load for nvidia/550.76/gsp_ga10x.bin failed with error -2 #639

Closed knutj closed 5 months ago

knutj commented 6 months ago

NVIDIA Open GPU Kernel Modules Version

550.76

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

Operating System and Version

Description: Fedora release 40 (Forty)

Kernel Release

Linux knut 6.8.9-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 2 18:59:06 UTC 2024 x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-7748274a-4638-0708-093c-ed052c2b4537)

Describe the bug

[ 11.565746] nvidia: loading out-of-tree module taints kernel.
[ 11.659723] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 11.660518] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 12.030254] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 550.76 Release Build (akmods@knut) Sat 11 May 06:40:43 CEST 2024
[ 12.035486] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 12.036183] nvidia 0000:01:00.0: Direct firmware load for nvidia/550.76/gsp_ga10x.bin failed with error -2
[ 12.036865] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 12.036926] [drm:nv_drm_register_drm_device [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to register device
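
For anyone reading the log above: the "error -2" in the firmware line is a negated errno value, and errno 2 is ENOENT, meaning the kernel could not find the requested file in its firmware search path (which normally includes /lib/firmware). A minimal Python sketch for pulling these failures out of a dmesg capture and decoding them follows. The function name `decode_firmware_failures` and the regular expression are illustrative assumptions, not part of the driver or any NVIDIA tool.

```python
import errno
import os
import re

# Matches the firmware-load failure message exactly as it appears in the log above.
FW_FAIL = re.compile(
    r"Direct firmware load for (?P<path>\S+) failed with error (?P<code>-?\d+)"
)


def decode_firmware_failures(log_text):
    """Return (firmware_path, errno_name, description) for each failure found.

    Hypothetical helper for illustration only.
    """
    results = []
    for match in FW_FAIL.finditer(log_text):
        # The kernel reports errors as negative errno values, so -2 means errno 2.
        code = abs(int(match.group("code")))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((match.group("path"), name, os.strerror(code)))
    return results


sample = (
    "[ 12.036183] nvidia 0000:01:00.0: Direct firmware load for "
    "nvidia/550.76/gsp_ga10x.bin failed with error -2"
)

for path, name, text in decode_firmware_failures(sample):
    print(f"{path}: {name} ({text})")
```

The path in the message is relative to the firmware search directories, so here the driver expected to find nvidia/550.76/gsp_ga10x.bin under one of them.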

To Reproduce

Switch to the open GPU kernel modules, then reboot.

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

No response

knutj commented 6 months ago

Resolved

mtijanic commented 5 months ago

"resolved"

Shifter2600 commented 3 months ago

I am hitting the same issue with NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm.run. How is this resolved?

nvidia-driver-runtime-n2f6s:/ # nvidia-smi
No devices were found
nvidia-driver-runtime-n2f6s:/ # lsmod | grep nvidia
nvidia_vgpu_vfio 86016 0
nvidia 8699904 1 nvidia_vgpu_vfio
mdev 28672 1 nvidia_vgpu_vfio
vfio 45056 3 nvidia_vgpu_vfio,vfio_iommu_type1,mdev
drm 634880 7 drm_kms_helper,drm_vram_helper,ast,nvidia,drm_ttm_helper,ttm
kvm 1056768 2 kvm_amd,nvidia_vgpu_vfio
irqbypass 16384 2 nvidia_vgpu_vfio,kvm
nvidia-driver-runtime-n2f6s:/ # lspci | grep NVIDIA
41:00.0 VGA compatible controller: NVIDIA Corporation Device 26b2 (rev a1)
41:00.1 Audio device: NVIDIA Corporation Device 22ba (rev a1)
nvidia-driver-runtime-n2f6s:/ # dmesg | grep NVIDIA
[ 153.856013] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 550.90.05 Release Build (dvs-builder@U16-I1-N08-05-1) Mon May 27 14:37:46 UTC 2024
[ 155.996611] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 550.90.05 Release Build (dvs-builder@U16-I1-N08-05-1) Mon May 27 14:37:46 UTC 2024
nvidia-driver-runtime-n2f6s:/ # dmesg | grep nvidia
[ 153.784296] nvidia: loading out-of-tree module taints kernel.
[ 153.787846] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 153.808814] nvidia: externally supported module, setting X kernel taint flag.
[ 153.810802] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[ 153.812787] nvidia 0000:41:00.0: enabling device (0000 -> 0003)
[ 153.812983] nvidia 0000:41:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 153.862748] nvidia_vgpu_vfio: externally supported module, setting X kernel taint flag.
[ 153.918226] nvidia-nvlink: Unregistered Nvlink Core, major device number 511
[ 155.942593] nvidia: externally supported module, setting X kernel taint flag.
[ 155.945345] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[ 155.947949] nvidia 0000:41:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[ 156.001285] nvidia_vgpu_vfio: externally supported module, setting X kernel taint flag.
[ 156.146216] nvidia 0000:41:00.0: Direct firmware load for nvidia/550.90.05/gsp_ga10x.bin failed with error -2
[ 156.146990] nvidia 0000:41:00.0: Direct firmware load for nvidia/550.90.05/gsp_ga10x.bin failed with error -2
[ 156.151932] nvidia 0000:41:00.0: Direct firmware load for nvidia/550.90.05/gsp_ga10x.bin failed with error -2
[ 156.152440] nvidia 0000:41:00.0: Direct firmware load for nvidia/550.90.05/gsp_ga10x.bin failed with error -2
[ 241.904348] nvidia 0000:41:00.0: Direct firmware load for nvidia/550.90.05/gsp_ga10x.bin failed with error -2

knutj commented 3 months ago

I made sure to install the right firmware. On my system I have 560.28.03 installed in /lib/firmware/nvidia/560.28.03.

timur-tabi commented 3 months ago

Except this version of the driver is looking for 550.90.05:

[ 156.146216] nvidia 0000:41:00.0: Direct firmware load for nvidia/550.90.05/gsp_ga10x.bin failed with error -2

You have a bad driver installation. You might have multiple versions, or maybe the old one didn't uninstall properly, or whatever. You need to clean up your system.

Good luck.
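
The mismatch described above (the loaded module asks for one firmware version while a different one is installed) can be checked mechanically. Below is a rough Python sketch that compares the versions named in the dmesg failures against the version-numbered directories under the firmware root. The helper names, the version regex, and the assumption that firmware lives in `<root>/nvidia/<version>/` are taken from the paths quoted in this thread and are illustrative only; this is not an NVIDIA-provided tool.

```python
import re
from pathlib import Path

# Captures the version directory from messages such as
# "Direct firmware load for nvidia/550.90.05/gsp_ga10x.bin failed with error -2".
FW_FAIL = re.compile(
    r"Direct firmware load for nvidia/(?P<version>[^/\s]+)/\S+ failed"
)

# Driver versions look like 550.76 or 550.90.05; other directories under
# nvidia/ (for example chip-named ones) are ignored by this filter.
VERSION_DIR = re.compile(r"\d+(\.\d+)+")


def requested_versions(log_text):
    """Firmware versions the loaded kernel module tried and failed to load."""
    return {m.group("version") for m in FW_FAIL.finditer(log_text)}


def installed_versions(firmware_root="/lib/firmware"):
    """Version-numbered directories present under <firmware_root>/nvidia."""
    nvidia_dir = Path(firmware_root) / "nvidia"
    if not nvidia_dir.is_dir():
        return set()
    return {
        p.name
        for p in nvidia_dir.iterdir()
        if p.is_dir() and VERSION_DIR.fullmatch(p.name)
    }


def missing_versions(log_text, firmware_root="/lib/firmware"):
    """Versions requested in the log that have no matching firmware directory."""
    return requested_versions(log_text) - installed_versions(firmware_root)


if __name__ == "__main__":
    log = (
        "[ 156.146216] nvidia 0000:41:00.0: Direct firmware load for "
        "nvidia/550.90.05/gsp_ga10x.bin failed with error -2"
    )
    print("requested:", sorted(requested_versions(log)))
    print("installed:", sorted(installed_versions()))
    print("missing:  ", sorted(missing_versions(log)))
```

If the "missing" set is non-empty, the loaded kernel module and the installed firmware come from different driver versions, which points to a leftover or mixed installation rather than a driver bug.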