NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Not work on laptop with Ryzen APU #282

Open onlymash opened 2 years ago

onlymash commented 2 years ago

NVIDIA Open GPU Kernel Modules Version

515.48.07

Does this happen with the proprietary driver (of the same version) as well?

No

Operating System and Version

Arch Linux

Kernel Release

5.18.1-arch1-1

Hardware: GPU

NVIDIA GeForce RTX 3060 Laptop GPU

Describe the bug

The GPU cannot be driven, and the external monitor is neither recognized nor usable.

To Reproduce

Device: Lenovo Legion 5 Pro (China version: Legion R9000P) 2021 with Ryzen 7 5800H.

Install nvidia-open (https://archlinux.org/packages/testing/x86_64/nvidia-open/) and reboot.

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

~ » journalctl -r -p 3
Jun 05 11:26:55 Laptop-Legion bluetoothd[641]: src/profile.c:record_cb() Unable to get Hands-Free Voice gateway SDP record: Host is down
Jun 05 11:26:49 Laptop-Legion gdm-launch-environment][712]: GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
Jun 05 11:26:45 Laptop-Legion gdm-password][1103]: gkr-pam: unable to locate daemon control file
Jun 05 11:26:39 Laptop-Legion gnome-session-binary[732]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Jun 05 11:26:39 Laptop-Legion gnome-session-binary[732]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff1908 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff0000 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff0000 flags=0x0000]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff18d8 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcffe0000 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcffe0000 flags=0x0000]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcffe0830 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcffe0000 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcffe0000 flags=0x0000]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcffe0808 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Jun 05 11:26:38 Laptop-Legion kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Jun 05 11:26:38 Laptop-Legion kernel: NVRM cpuidInfoAMD: Unrecognized AMD processor in cpuidInfoAMD
Jun 05 11:26:38 Laptop-Legion kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20211217/psobject-220)
Jun 05 11:26:38 Laptop-Legion kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.PB2], AE_NOT_FOUND (20211217/dswload2-162)
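
The repeated AMD-Vi IO_PAGE_FAULT lines in the journal output above are easier to compare across machines when condensed. The following is a minimal Python sketch, not a tool used by anyone in this thread: the function name `summarize_page_faults` and the regular expression are assumptions based only on the line format shown above.

```python
import re
from collections import defaultdict

# Pattern for the AMD-Vi fault lines quoted above, for example:
#   nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff1908 flags=0x0020]
FAULT_RE = re.compile(
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)


def summarize_page_faults(log_text):
    """Map each PCI device to (fault count, sorted distinct fault addresses)."""
    addresses = defaultdict(list)
    for match in FAULT_RE.finditer(log_text):
        addresses[match.group("device")].append(int(match.group("address"), 16))
    return {dev: (len(addrs), sorted(set(addrs))) for dev, addrs in addresses.items()}


# Three lines copied from the journal output above.
SAMPLE = """\
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff1908 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff0000 flags=0x0020]
Jun 05 11:26:38 Laptop-Legion kernel: nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xcfff0000 flags=0x0000]
"""

if __name__ == "__main__":
    for device, (count, addrs) in summarize_page_faults(SAMPLE).items():
        print(device, count, [hex(a) for a in addrs])
```

To run it on a real system, the text of `journalctl -k` could be passed in place of `SAMPLE`; the sketch only counts and groups the events and says nothing about their cause.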

~ » nvidia-smi
No devices were found

~ » lsmod | grep nvidia                                  
nvidia_drm             73728  0
nvidia_uvm           2818048  0
nvidia_modeset       1335296  1 nvidia_drm
nvidia               5201920  14 nvidia_uvm,nvidia_modeset

~ » pacman -Qs nvidia
local/egl-wayland 2:1.1.9+r3+g582b2d3-1
    EGLStream-based Wayland external platform
local/libvdpau 1.5-1
    Nvidia VDPAU library
local/libxnvctrl 515.48.07-1
    NVIDIA NV-CONTROL X extension
local/nvidia-open 515.48.07-1
    NVIDIA open kernel modules
local/nvidia-prime 1.0-4
    NVIDIA Prime Render Offload configuration and utilities
local/nvidia-settings 515.48.07-1
    Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 515.48.07-1
    NVIDIA drivers utilities
local/nvidia-vaapi-driver-git 0.0.5.r25.g62a571c-1
    A VA-API implemention using NVIDIA's NVDEC
local/nvtop 2.0.1-1
    An htop like monitoring tool for NVIDIA GPUs
local/opencl-nvidia 515.48.07-1
    OpenCL implemention for NVIDIA

frosth555 commented 2 years ago

I have a very similar issue with an R7 5800HS + GF3060 (Asus GA503QM), and there are a lot of GSP-related warnings/errors in journalctl -p 4:

kernel: NVRM nvAssertOkFailedNoLog: Assertion failed: Generic operating system error [NV_ERR_OPERATING_SYSTEM] (0x00000059) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_ga102.c:229
kernel: NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
kernel: NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0x59
kernel: NVRM RmInitAdapter: Cannot initialize GSP firmware RM
kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x63:0x59:1689)
kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
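
The NVRM assertion lines above carry a status name, a numeric status code, the expression that returned it, and a source location. A minimal Python sketch for pulling those fields out of pasted logs follows; `NvrmFailure`, `parse_nvrm_failures` and the regular expression are hypothetical names derived only from the line format seen in this thread, not from the driver source.

```python
import re
from typing import NamedTuple


class NvrmFailure(NamedTuple):
    status_name: str   # e.g. "NV_ERR_OPERATING_SYSTEM"
    status_code: int   # e.g. 0x59
    expression: str    # the call or expression that returned the status
    source_file: str   # e.g. "kernel_gsp_ga102.c"
    source_line: int   # e.g. 229


# Assumed layout: "[NAME] (0xCODE) returned from EXPR @ FILE:LINE"
FAILURE_RE = re.compile(
    r"\[(?P<name>NV_ERR_[A-Z0-9_]+)\] \((?P<code>0x[0-9A-Fa-f]+)\) "
    r"returned from (?P<expr>.+?) @ (?P<file>[\w.]+):(?P<line>\d+)"
)


def parse_nvrm_failures(log_text):
    """Return one NvrmFailure per assertion line that matches the assumed layout."""
    return [
        NvrmFailure(m["name"], int(m["code"], 16), m["expr"], m["file"], int(m["line"]))
        for m in FAILURE_RE.finditer(log_text)
    ]


# One line copied from the log excerpt above.
SAMPLE = (
    "kernel: NVRM nvAssertOkFailedNoLog: Assertion failed: Generic operating system error "
    "[NV_ERR_OPERATING_SYSTEM] (0x00000059) returned from "
    "kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_ga102.c:229"
)

if __name__ == "__main__":
    for failure in parse_nvrm_failures(SAMPLE):
        print(failure)
```

Lines that are truncated before the `@ file:line` part will simply not match, so the sketch under-reports rather than guesses.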
PAR2020 commented 2 years ago

@frosth555, could you provide a nvidia-bug-report for your system, please? Thanks.

frosth555 commented 2 years ago

I completely forgot about it. nvidia-bug-report.log.gz

PAR2020 commented 2 years ago

Thanks!

PAR2020 commented 2 years ago

Internal bug 3675186 filed.

Grimish-ng commented 2 years ago

I also have a similar issue on an Asus Zephyrus machine. I was uncertain what causes it, as the behavior changes depending on whether I enable KMS on amdgpu. I'll reinvestigate and post the findings.

PAR2020 commented 1 year ago

@Grimish-ng, anything new on your front? The internal team is having difficulty reproducing this on the other platforms. Are there any model-specific details on your ASUS Zephyrus so we can try that config for a repro as well?

How do the symptoms change when you enable KMS on the amdgpu?

Thanks.

PAR2020 commented 1 year ago

Signature similar to #120

frosth555 commented 1 year ago

@PAR2020 I found #258, which shows the same behaviour.

@Grimish-ng, how do you switch KMS for amdgpu? Did you mean the MUX switch?

Grimish-ng commented 1 year ago

I just dropped the drivers on really fast. Here's what I got after building the modules against NVIDIA-Linux-x86_64-515.57.run:

[grimish-ng@vsryzen .cache]$ lsmod | grep nv
nvidia_uvm           2785280  0
nvidia_drm             73728  0
nvidia_modeset       1351680  1 nvidia_drm
nvidia               5206016  5 nvidia_uvm,nvidia_modeset

[grimish-ng@vsryzen ~]$ nvidia-smi 
No devices were found

Jul 17 13:58:38 vsryzen kernel: NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0x88
Jul 17 13:58:38 vsryzen kernel: NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
Jul 17 13:58:38 vsryzen kernel: NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
Jul 17 13:58:38 vsryzen kernel: NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->>

nvidia-bug-report.log.gz

Sorry - I don't have a lot of time to check my configuration and figured this would be best for you for now. Please do let me know if I can provide more. It could be on my end at this point, but I'll try to get around to verifying everything on my end.

Grimish-ng commented 1 year ago

Update on my last post: I forgot that I had switched kernels since the last driver update instead of using the generic 5.18 kernel. I'll switch back to a generic 5.18 kernel and see if I get the same behavior.

awsms commented 1 year ago

@frosth555, could you provide a nvidia-bug-report for your system, please? Thanks.

Do you need more logs? Because I'm having this issue on a Lenovo Legion 5 15ACH6H (5600H + RTX3060).

Fischer-Simon commented 1 year ago

I am having the same problem. Kernel 5.19.1-arch2-1 and nvidia-open-515.65.01-4.

In addition to the logs already posted I have the following entries:

Aug 14 14:17:45 hostname kernel: NVRM cpuidInfoAMD: Unrecognized AMD processor in cpuidInfoAMD
Aug 14 14:17:46 hostname kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  515.65.01  Release Build  (archlinux-builder@hostname)  
Aug 14 14:17:46 hostname kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Aug 14 14:17:46 hostname systemd-udevd[879]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep nvidia-frontend /proc/devices | cut -d \  -f 1) 255'' failed with exit code 1.

2shrestha22 commented 1 year ago

I don't know how, but I installed nvidia-open (515.65.01-5) and it is working fine. Arch Linux.

frosth555 commented 1 year ago

I don't know how but I installed nvidia-open (515.65.01-5) and is working fine. ArchLinux

Did you have this issue before 515.65.01-5? Any logs? What hardware do you use?

I've built this driver on Fedora and Ubuntu and still get the same assertion errors, so it is not an Arch problem; it is probably something platform-specific.

onlymash commented 1 year ago

I don't know how but I installed nvidia-open (515.65.01-5) and is working fine. ArchLinux

did you have this issue before 515.65.01-5?, any logs? What HW do you use?

I've built this driver on fedora and ubuntu and still the same assertion errors so not arch problem, probably something platform specified.

Maybe he uses a 6000 series APU. As far as I know, nvidia-open works on the Legion 5 Pro 2022 (6800H+3060).

onlymash commented 1 year ago

This is a photo from a Legion 5 Pro 2022 owner: IMG_20220820_192754_065

onlymash commented 1 year ago

I can't stand it anymore, considering selling my laptop and replacing it with a 6800H+6700m/6850m xt laptop. Linux users stay away from Nvidia and become happy

2shrestha22 commented 1 year ago

I don't know how but I installed nvidia-open (515.65.01-5) and is working fine. ArchLinux

@frosth555 I have tried nvidia-open before, but it didn't work. Today it is working fine. With the nvidia driver I always had a problem with brightness when switching between hybrid and discrete-only mode in the BIOS, but with nvidia-open I am not facing this issue anymore. (I removed nvidia-open, installed nvidia, then removed it again with pacman -Rns and installed nvidia-open again. Now, while using the discrete GPU only, brightness is always at maximum.) I was trying Runtime D3 Power Management, but it does not work with Ryzen 4xxx and below. (Edit: because the nvidia driver does not allow it: https://forums.developer.nvidia.com/t/runtime-d3-rtd3-with-quadro-t1200/196374/5)

Also, the NVIDIA driver currently cannot be used as an output sink when the output source driver is xf86-video-amdgpu. (https://download.nvidia.com/XFree86/Linux-x86_64/455.45.01/README/randr14.html)

➜  ~ pacman -Qs | grep nvidia
local/nvidia-open 515.65.01-5
local/nvidia-settings 515.65.01-1
local/nvidia-utils 515.65.01-2
local/opencl-nvidia 515.65.01-2

My systemd boot entry:

title Arch
linux /vmlinuz-linux
initrd /amd-ucode.img
initrd /initramfs-linux.img
options root=PARTUUID=e938e475-feb0-4e09-b2ba-55db8aea3f62 zswap.enabled=0 rw rootfstype=ext4

Currently I don't have any Xorg config file in /etc/X11, but I have created one with # nvidia-xconfig and it uses the discrete GPU as the primary GPU.

Conclusion: nvidia-open does not work as expected for now. nvidia is stable for me, and for power consumption I just use hybrid graphics in the BIOS and use PRIME to run on the Nvidia GPU. Total power usage is about 15W in normal usage, and GPU power usage stays at 3W at idle when not using PRIME. I would not bother with nvidia-open for now.

needlesslygrim commented 1 year ago

Is there any work going on from Nvidia to fix this? I cannot use the proprietary drivers or the open source drivers with Wayland on Gnome, as my external display does not work at all.

frosth555 commented 1 year ago

Is there any work going on from Nvidia to fix this? I cannot use the proprietary drivers or the open source drivers with Wayland on Gnome, as my external display does not work at all.

You are probably missing the "nvidia-drm.modeset=1" boot parameter. Look at this issue's history: the open NVIDIA driver doesn't work at all for us.
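
Whether that boot parameter is present can be checked from the kernel command line. The snippet below is a minimal Python sketch under stated assumptions: `modeset_enabled` is a hypothetical helper, it only recognizes the literal value `1` mentioned in this thread, and it relies on the kernel's documented rule that dashes and underscores in parameter names are equivalent.

```python
def modeset_enabled(cmdline):
    """Return True if the kernel command line sets nvidia-drm.modeset=1."""
    enabled = False
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        if sep and name.replace("-", "_") == "nvidia_drm.modeset":
            # Keep scanning so that the last occurrence decides the result.
            enabled = value == "1"
    return enabled


# The "options" line of the systemd-boot entry posted earlier in this thread.
POSTED_OPTIONS = (
    "root=PARTUUID=e938e475-feb0-4e09-b2ba-55db8aea3f62 "
    "zswap.enabled=0 rw rootfstype=ext4"
)

if __name__ == "__main__":
    print(modeset_enabled(POSTED_OPTIONS))
    print(modeset_enabled(POSTED_OPTIONS + " nvidia-drm.modeset=1"))
```

On a running system the argument could be the contents of /proc/cmdline. A parameter set through a modprobe configuration file instead of the command line would not be detected by this sketch.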

mtijanic commented 1 year ago

Hi there! The 515.xx version of open drivers does not support notebooks. It "works" to varying degrees for some people, but it is not something we ever tested and significant portions of the notebook-specific code were explicitly disabled to focus on the data center usecase.

Notebook support will come in one of the future major driver releases (but not 520.xx). We have bug 3675186 filed internally to make sure it also works on Ryzen APUs, but unless it turns out to be a bigger issue affecting desktops as well, we will not be integrating the fix into 515.xx.

We really appreciate the enthusiasm around testing the new open driver, but I have to ask you to be patient. The team is hard at work enabling additional features on these drivers, but it takes time to develop and QA these changes. Rushing untested code out the door helps nobody.

Thank you!

needlesslygrim commented 1 year ago

Is there any work going on from Nvidia to fix this? I cannot use the proprietary drivers or the open source drivers with Wayland on Gnome, as my external display does not work at all.

You probably missing "nvidia-drm.modeset=1" boot parameter. Look at this issue history, open NVIDIA doesn't work at all for us.

Nope I'm using it.

needlesslygrim commented 1 year ago

Hi there! The 515.xx version of open drivers does not support notebooks. It "works" to varying degrees for some people, but it is not something we ever tested and significant portions of the notebook-specific code were explicitly disabled to focus on the data center usecase.

Notebook support will come in one of the future major driver releases (but not 520.xx). We have bug 3675186 filed internally to make sure it also works on Ryzen APUs, but unless it turns out to be a bigger issue affecting desktops as well, we will not be integrating the fix into 515.xx.

We really appreciate the enthusiasm around testing the new open driver, but I have to ask you to be patient. The team is hard at work enabling additional features on these drivers, but it takes time to develop and QA these changes. Rushing untested code out the door helps nobody.

Thank you!

Thanks for the reply, this also doesn't work on the proprietary driver.

mtijanic commented 1 year ago

Thanks for the reply, this also doesn't work on the proprietary driver.

Hi @DivineBicycle, I'm looking at the bug report above, and it clearly states:

Does this happen with the proprietary driver (of the same version) as well? No

This matches our internal reproduction on bug 3675186.

Are you saying the exact same issue happens to you on the proprietary driver? I expect it's a different issue altogether, though the end result of "doesn't work" might be the same. I suggest reporting that issue to linux-bugs@nvidia.com for the proprietary driver.

needlesslygrim commented 1 year ago

Thanks for the reply, this also doesn't work on the proprietary driver.

Hi @DivineBicycle, I'm looking at the bug report above, and it clearly states:

Does this happen with the proprietary driver (of the same version) as well? No

This matches our internal reproduction on bug 3675186.

Are you saying the exact same issue happens to you on the proprietary driver? I expect it's a different issue altogether, though the end result of "doesn't work" might be the same. I suggest reporting that issue to linux-bugs@nvidia.com for the proprietary driver.

I am saying that yes. It may be that I have configured something incorrectly but what I do know is that when I use Wayland with GDM my secondary monitor is not recognised. It is connected directly to the Nvidia DGPU and the internal display is connected to the AMD IGPU.

To be fair, it doesn't work on X either but there is a third party program I can use to get it to work. However, that doesn't work on Wayland.

mtijanic commented 1 year ago

I'm not sure we're talking about the same thing here, sorry. Do you maybe mean issue #161 ? These are different things, as here the GPU itself doesn't boot, while on #161 it's a specific usecase (albeit a really big one) that does not work.

needlesslygrim commented 1 year ago

I'm not sure we're talking about the same thing here, sorry. Do you maybe mean issue #161 ? These are different things, as here the GPU itself doesn't boot, while on #161 it's a specific usecase (albeit a really big one) that does not work.

Maybe we aren't and maybe I stand corrected. I have now realised that he also wrote a comment on #161 so ignore this.

Grimish-ng commented 1 year ago

Confusing this issue thread has gotten. The proprietary driver absolutely works. I can't dig on the gripes ending up here. Don't whine - it doesn't help. Provide data.

Now, as for my experiences, I have to give it to Nvidia for their proprietary stuff: it has worked for many years with only minor issues. It works today. It even works with holoiso for those who love the Steam Deck interface, works with the laptop's native monitor, with external monitors, and with both at the same time. I will eat green eggs & ham in the dark, and in the park.

In all honesty, aside from being proprietary, the drivers provided by Nvidia have worked very well for the last... I don't know, 20 years now? Say what you will about the decisions over the years to keep them proprietary, but they still provided them. They provided them early on, when many companies were still not providing drivers, and it was a big deal. A huge deal. I stand by Nvidia for that and for their drivers, because they provided good drivers, regardless of features or lack thereof, for the open source community and their systems for all these years. I appreciate them even more for now providing open source drivers and remaining competitive, even if they are in alpha. We test them because we want to help them get there faster and done to their absolute best.

Now that that is over: as for this round of Open GPU Drivers I built, I'm still having pretty much the same issues. But hey, I've waited a long time for this and I can wait a little longer 🥇

onlymash commented 1 year ago

Confusing this issue thread has gotten. The proprietary driver absolutely works. I can't dig on the gripes ending up here. Don't whine - it doesn't help. Provide data. Now as for my experiences, I have to give it to nvidia for their proprietary stuff - It has worked for many years with only minor issues. It works today. It even works with holoiso for those who love your steamdeck interface, works with laptops native monitor, and external monitors, and both at the same time. I will eat green eggs & ham in the dark, and in the park. In all honesty aside from being proprietary, the drivers provided by nvidia have worked very well for the last... I don't know, 20 years now? Say what you will about desicions over the years for keeping them proprietary - but they still provided them. They provided them early on before many companies were still not providng drivers and it was a big deal. A huge deal. I stand by Nvidia for that and their drivers because they provided good drivers regardless of features or lack their of for the open source community & their systems for all these years. I appreciate them even more for now providing open source and remaining competitive even if they are in alpha. We test them because we want to help them get there faster and done to their absolute best.

Now that that is over, As for this round of Open GPU Drivers I built, i'm still having pretty much the same issues. But hey, i've waited a long time for this and I can wait a little longer 🥇

Proprietary drivers are still a piece of shit: even the laptop's brightness cannot be adjusted, and Dynamic Boost is not supported.

Chicchi7393 commented 1 year ago

Somehow nvidia-open-dkms works for me (R5 5600H, Linux Zen, RTX 3060 Mobile, Arch).

ViBE-HU commented 1 year ago

I'm also facing a similar issue. Is it related?

I'm stuck. I also tried the NVIDIA and Lenovo forums, with no improvement.

frosth555 commented 1 year ago

@ViBE-HU if your GPU works (in any way) with the open driver, it's not a similar issue. You're better off opening your own issue with the relevant logs.

ViBE-HU commented 1 year ago

That's why I'm asking. I'm looking for exact reports because I don't want to open duplicates.

FlorianFranzen commented 1 year ago

Tried 525.53 and it still fails to bring up the GPU.

Marc-Pierre-Barbier commented 1 year ago

It still fails, but with lots of:

[  456.681759] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[  456.681764] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:235
[  456.681767] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[  456.681770] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[  456.682858] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0xffff:1622)
[  456.683163] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  456.832405] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0x88
[  456.832408] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[  456.844371] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe

Grimish-ng commented 1 year ago

I'm still in the same boat as well, on an ASUS Zephyrus ROG G15 2021 with driver 525.60.11.

diramazioni commented 1 year ago

Switching to the proprietary driver made the GPU work again.

FlorianFranzen commented 1 year ago

@mtijanic @PAR2020 Any update on bug 3675186?

I am on kernel 6.1.11 with 525.89.02 and while the closed source driver works well, the open source version just crashes with:

[   15.316951] nvidia: loading out-of-tree module taints kernel.
[   15.357747] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[   15.357752] NVRM cpuidInfoAMD: Unrecognized AMD processor in cpuidInfoAMD
[   15.359030] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   15.407882] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  525.89.02  Release Build  (nixbld@)  Mon Feb 13 04:28:14 UTC 2023
[   15.633241] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  525.89.02  Release Build  (nixbld@)  Mon Feb 13 04:28:02 UTC 2023
[   15.733856] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[   17.269041] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic operating system error [NV_ERR_OPERATING_SYSTEM] (0x00000059) returned from ((rpc_message_header_v *)pKernelGsp->pRpc->message_buffer)->rpc_result @ kernel_gsp.c:2934
[   17.269053] NVRM nvAssertOkFailedNoLog: Assertion failed: Generic operating system error [NV_ERR_OPERATING_SYSTEM] (0x00000059) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_ga102.c:237
[   17.269056] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:243
[   17.269059] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0x59
[   17.269062] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[   17.270375] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x59:1615)
[   17.270850] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   17.270987] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[   17.271048] nvidia-uvm: Loaded the UVM driver, major device number 236.
[   17.272034] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[   17.276711] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcffd08c8 flags=0x0020]
[   17.277800] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcffd0000 flags=0x0000]
[   17.278754] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcffd0000 flags=0x0020]
[   17.279755] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcffd08f0 flags=0x0020]
[   17.280692] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcffd0000 flags=0x0000]
[   17.281668] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcffd0000 flags=0x0020]
[   17.282583] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcfff1908 flags=0x0020]
[   17.283509] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcfff0000 flags=0x0000]
[   17.284387] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcfff0000 flags=0x0020]
[   17.285298] nvidia 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000c address=0xcfff1938 flags=0x0020]
[   17.604348] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0x88
[   17.604351] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[   17.614929] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[   17.614933] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[   17.614938] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:243
[   17.614941] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[   17.614943] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[   17.616320] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0xffff:1615)
[   17.616907] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[   17.779109] NVRM s_executeBooterUcode_TU102: Booter failed with non-zero error code: 0x88
[   17.779112] NVRM kgspExecuteBooterUnloadIfNeeded_TU102: failed to execute Booter Unload: 0xffff
[   17.788747] NVRM s_executeFwsec_TU102: failed to execute FWSEC for FRTS: FRTS error code 0xbe
[   17.788750] NVRM nvAssertOkFailedNoLog: Assertion failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from kgspExecuteFwsecFrts_HAL(pGpu, pKernelGsp, pKernelGsp->pFwsecUcode, pKernelGsp->pWprMeta->frtsOffset) @ kernel_gsp_ga102.c:164
[   17.788754] NVRM nvAssertFailedNoLog: Assertion failed: status == NV_OK @ kernel_gsp_ga102.c:243
[   17.788757] NVRM kgspInitRm_IMPL: cannot bootstrap riscv/gsp: 0xffff
[   17.788759] NVRM RmInitAdapter: Cannot initialize GSP firmware RM
[   17.789857] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0xffff:1615)
[   17.790351] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

This is on a Lenovo Legion 7 with AMD Ryzen 7 5800H, model 16ACHg6, type 82N6 with the latest UEFI (GKCN59WW).

Edit and note: This seems to be a duplicate of #120.
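
The dmesg excerpt above shows several consecutive `RmInitAdapter failed!` lines with different middle values. The following is a minimal Python sketch for listing those attempts; `parse_init_failures` is a hypothetical helper, and the meaning of the three colon-separated fields is not explained in this thread. The only observation used here is that the middle field matches the status code printed in the neighbouring assertion lines (0x59, 0xffff).

```python
import re

# Assumed dmesg layout: "[ TIMESTAMP] NVRM: GPU <pci>: RmInitAdapter failed! (a:status:c)"
INIT_FAIL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] NVRM: GPU (?P<gpu>\S+): RmInitAdapter failed! "
    r"\((?P<first>0x[0-9a-f]+):(?P<status>0x[0-9a-f]+):(?P<third>\d+)\)"
)


def parse_init_failures(dmesg_text):
    """Return (timestamp, pci_address, status_code) for each failed init attempt."""
    return [
        (float(m["ts"]), m["gpu"], int(m["status"], 16))
        for m in INIT_FAIL_RE.finditer(dmesg_text)
    ]


# Three lines copied from the dmesg output above.
SAMPLE = """\
[   17.270375] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x59:1615)
[   17.616320] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0xffff:1615)
[   17.789857] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0xffff:1615)
"""

if __name__ == "__main__":
    for ts, gpu, status in parse_init_failures(SAMPLE):
        print(f"{ts:10.6f}  {gpu}  status={status:#x}")
```

The pattern expects dmesg-style bracketed timestamps, so journalctl-formatted lines such as those earlier in the thread would need a different prefix.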

frosth555 commented 1 year ago

Some update here: I am able to load and use the open NVIDIA modules with nvidia-530.41.03/kernel-6.3.x.

dmesg | grep -e nvidia -e NVRM
[    4.348743] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[    4.348747] NVRM cpuidInfoAMD: Unrecognized AMD processor in cpuidInfoAMD
[    4.349064] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[    4.349150] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    4.398571] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  530.41.03  Release Build  (archlinux-builder@)  
[    4.506711] nvidia-uvm: Loaded the UVM driver, major device number 507.
[    4.782414] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  530.41.03  Release Build  (archlinux-builder@)  
[    4.799155] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    7.289904] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1

No assertion/GSP errors on GNOME/Wayland/Proton. Performance is pretty much the same as with the proprietary driver, but power management doesn't work and the NVIDIA GPU is active all the time (that's another issue).

/proc/driver/nvidia/gpus/0000:01:00.0/power
Runtime D3 status:          ?
Video Memory:               ?

GPU Hardware Support:
 Video Memory Self Refresh: ?
 Video Memory Off:          ?
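
The power file shown above reports `?` for every field. A minimal Python sketch for turning that text into a dictionary and detecting the all-unknown case follows; `parse_power_file` and `all_unknown` are hypothetical helpers written against the layout pasted here, not against any documented format.

```python
def parse_power_file(text):
    """Parse 'Key: value' lines, skipping section headers that have no value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            fields[key] = value
    return fields


def all_unknown(fields):
    """True when at least one field was parsed and every field reads '?'."""
    return bool(fields) and all(value == "?" for value in fields.values())


# Contents of the power file as pasted above.
SAMPLE = """\
Runtime D3 status:          ?
Video Memory:               ?

GPU Hardware Support:
 Video Memory Self Refresh: ?
 Video Memory Off:          ?
"""

if __name__ == "__main__":
    parsed = parse_power_file(SAMPLE)
    print(parsed)
    print("all fields unknown:", all_unknown(parsed))
```

On a real machine the text would come from reading the file under /proc/driver/nvidia/gpus/ for the GPU's PCI address, as in the path quoted above.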

greetings

FlorianFranzen commented 1 year ago

I can replicate @frosth555's results using 530.41.03 on Linux 6.3.1 and 6.2.10, with the same power management issues. It is progress though! 🎉

jthoward64 commented 6 months ago

Has anyone checked out the 545 series driver?

frosth555 commented 6 months ago

@jthoward64 what kind of data are you looking for? I've just run it: the PM issues I mentioned earlier are gone, and perf is still decent. There is some dmesg spam; I should post a new bug report.

[  640.827946] NVRM nvCheckOkFailedNoLog: Check failed: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057) returned from gpuGetByHandle(pClient, pArgs->hObject, NULL, &pGpu) @ rmapi_gss_legacy_control.c:87
[  687.580334] NVRM nbsiReadRegistryDword: osReadRegistryDword called in Sleep path can cause excessive delays!
[  687.580337] NVRM nvAssertFailedNoLog: Assertion failed: 0 @ nbsi_osrg.c:107

I can't say anything about stability (it's too early).

I used Arch's plain nvidia-open-dkms package, 545.29.06.

amrit1711 commented 5 months ago

Please verify the fix with driver 545.29.06 and share the test results.

Fischer-Simon commented 5 months ago

I am using extra/nvidia-open 545.29.06-14 on Arch (6.7.2-arch1-1). Performance is good (I didn't make a direct comparison to the closed source drivers, though). I don't have any stability issues; even hibernate, which occasionally froze Xorg on the closed source driver during resume, is perfectly stable. Another improvement is the now very low power consumption when using reverse PRIME. There are still some dmesg entries when resuming from hibernate:

Jan 29 08:38:30 hostname kernel: NVRM nbsiReadRegistryDword: osReadRegistryDword called in Sleep path can cause excessive delays!
Jan 29 08:38:30 hostname kernel: NVRM nvAssertFailedNoLog: Assertion failed: 0 @ nbsi_osrg.c:107
Jan 29 08:38:30 hostname kernel: NVRM nbsiReadRegistryDword: osReadRegistryDword called in Sleep path can cause excessive delays!
Jan 29 08:38:30 hostname kernel: NVRM nvAssertFailedNoLog: Assertion failed: 0 @ nbsi_osrg.c:107
Jan 29 08:38:30 hostname kernel: NVRM nbsiReadRegistryDword: osReadRegistryDword called in Sleep path can cause excessive delays!
Jan 29 08:38:30 hostname kernel: NVRM nvAssertFailedNoLog: Assertion failed: 0 @ nbsi_osrg.c:107
Jan 29 08:38:30 hostname kernel: NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_UNIX_CONSOLE, &unixConsoleParams, sizeof(unixConsoleParams)) @ unix_console.c:105
Jan 29 08:38:31 hostname kernel: NVRM unixCallVideoBIOS: int10h(4f02, 0000) vesa call failed! (4f02, 0000)
Jan 29 08:38:31 hostname kernel: NVRM nvCheckOkFailedNoLog: Check failed: Failure: Generic Error [NV_ERR_GENERIC] (0x0000FFFF) returned from pRmApi->Control(pRmApi, nv->rmapi.hClient, nv->rmapi.hSubDevice, NV2080_CTRL_CMD_INTERNAL_DISPLAY_POST_RESTORE, &restoreParams, sizeof(restoreParams)) @ unix_console.c:197
Jan 29 08:38:46 hostname kernel: NVRM serverFreeResourceTree: hObject 0xbeef0400 not found for client 0xc1d070be

Also during startup:

Jan 29 14:00:32 hostname kernel: NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64  545.29.06  Release Build  (archlinux-builder@)  
Jan 29 14:00:44 hostname kernel: NVRM testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.

brauliobo commented 4 months ago

All fine here with nvidia-open-dkms 550.54.14-4 on Arch Linux 6.7.8 with a Ryzen 5600G, connected to an HDMI TV HDR display, and an Nvidia 3060 for both mining with nbminer and Vulkan processing of games in Steam Proton.

ShalokShalom commented 6 days ago

Can someone ping a maintainer at Nvidia to raise awareness about this? Considering it is already a couple of years old by now, it seems to deserve attention.

@niv Can you help please?

frosth555 commented 6 days ago

@ShalokShalom, awareness about what? The issue was fixed a long time ago. If you have hit a similar issue, please open a new report with logs from your machine, etc.