NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Hard freezes with opensource driver #82

Open sandikata opened 2 years ago

sandikata commented 2 years ago

NVIDIA Driver Version 515.43.04.

GPU RTX 3050

Describe the bug: Does not work at all.

To Reproduce: Build the driver according to the documentation and use it with X.Org.

Expected behavior: To work as designed.

Please reproduce the problem, run nvidia-bug-report.sh, and attach the resulting nvidia-bug-report.log.gz. nvidia-bug-report.log.gz

Xorg.0.log

alcaparra commented 2 years ago

nvidia moment

TheRealOne78 commented 2 years ago

nvidia moment

LOL ya

sandikata commented 2 years ago

I've managed to make it "work", but it freezes regularly, and only a hard reset helps.

sandikata commented 2 years ago

@mtijanic Do you have any idea about this, or does anyone else suffer from it?

mtijanic commented 2 years ago

Hi, thanks for the report!

From the attached logs, I see:

NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
NVRM: To force use of Open nvidia.ko on other GPUs, see the
NVRM: 'OpenRmEnableUnsupportedGpus' kernel module parameter described
NVRM: in the README.

Are you sure you are running the open source flavor of the driver? You can verify with modinfo nvidia | grep license

The logs also previously show failure to initialize nvidia.ko because another driver (probably nouveau) is already active.

Let's first make sure we're running the open source driver, and if so we can try to up the logging level and see what went wrong.
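
The two checks suggested above (which flavor of nvidia.ko is installed, and whether nouveau is still loaded) can be sketched as small shell helpers. This is only an illustration: the function names are made up for this example, and it assumes the open modules report a license string containing MIT/GPL while the proprietary module reports NVIDIA.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of the driver package.

# Reads `modinfo nvidia` output on stdin and prints "open", "proprietary",
# or "unknown".
# Assumption: the open modules report a license containing MIT/GPL and the
# proprietary module reports "NVIDIA".
nvidia_flavor() {
    license=$(sed -n 's/^license:[[:space:]]*//p' | head -n 1)
    case "$license" in
        *MIT*|*GPL*) echo "open" ;;
        NVIDIA*)     echo "proprietary" ;;
        *)           echo "unknown" ;;
    esac
}

# Reads `lsmod` output on stdin and succeeds if a nouveau module is listed.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}
```

On a real system these would be fed from the live commands, for example `modinfo nvidia | nvidia_flavor` and `lsmod | nouveau_loaded && echo "nouveau is active"`.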

sandikata commented 2 years ago

There was actually a conflict between nouveau and the open-source NVIDIA flavor. I will let you know when I boot again with the open-source NVIDIA flavor to collect some data.

Regards

sandikata commented 2 years ago

Hello again, here is the correct bug report.

nvidia-bug-report.log.gz

I don't have a freeze at the moment (and I cannot provide any report while the system is frozen).

aritger commented 2 years ago

@sandikata, here are a few possible experiments:

(1) Do you see similar freezes with the 515.43.04 binary kernel modules, as well as the open kernel modules?

(2) Did you see similar freezes with previous NVIDIA driver releases?

(3) When the system freezes, is the system still accessible over the network? Or, is the only option at that point to reboot?

sandikata commented 2 years ago

Hello,

  1. Only with the open-source 515.43.04 modules; the proprietary modules from the same version work.
  2. No, not as far as I can say.
  3. It happens mostly under heavy load (gaming, encoding, etc.). The system is not accessible in any way; the only option is to reboot.

dylif commented 2 years ago

I am having the same issue here:

Kernel: 5.17.7-arch1-1
Driver: nvidia-open 515.43.04

Seems like something in the kernel module is locking up whenever I run an OpenGL game (e.g. Minecraft), as there are a lot of reports of timeouts. Strangely, running a Vulkan game (GTA 5 through Proton and DXVK) works perfectly.

This does not happen with the proprietary kernel modules; both OpenGL and Vulkan games work perfectly.

Please note I have tried adding the nvidia-drm.modeset=1 kernel parameter which changed nothing.
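
One way to confirm the parameter actually reached the kernel is to look for it in /proc/cmdline. A minimal sketch, assuming an exact-match check is enough; has_kernel_param is a hypothetical helper, not an existing tool:

```shell
#!/bin/sh
# Sketch only: `has_kernel_param` is a hypothetical helper name.

# Reads a kernel command line on stdin and succeeds if the parameter given
# as $1 appears as a whole, space-separated word.
# Limitation: the match is literal, so a differently spelled but equivalent
# parameter (for example with an underscore instead of a hyphen) is not found.
has_kernel_param() {
    tr ' ' '\n' | grep -qxF "$1"
}
```

For example, `has_kernel_param nvidia-drm.modeset=1 < /proc/cmdline && echo "parameter is set"`.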

Attached is a snippet of a log that I think is of interest. nvidia.log

Hopefully this helps

dylif commented 2 years ago

I should add that if I am quick enough after the freeze occurs, I can switch to a tty and have it be fully functional.

dylif commented 2 years ago

I'm not sure if that was a reply to my comment, but I attempted to blacklist the nouveau driver, which didn't solve my issues. Upon further inspection, on my system nouveau was already blacklisted by the nvidia-utils package in /usr/lib/modprobe.d.
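
Finding which file carries the blacklist entry can be done with a recursive grep over the modprobe.d directories. A rough sketch, assuming a grep that supports -r (GNU and BSD grep do); nouveau_blacklist_files is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: `nouveau_blacklist_files` is a hypothetical helper name.

# Prints the path of every file under the given directories that contains an
# uncommented "blacklist nouveau" line. Errors for missing directories are
# silenced, so check the printed output rather than the exit status.
nouveau_blacklist_files() {
    grep -rlE '^[[:space:]]*blacklist[[:space:]]+nouveau([[:space:]]|$)' "$@" 2>/dev/null
}
```

For example, `nouveau_blacklist_files /etc/modprobe.d /usr/lib/modprobe.d` would list the file shipped by nvidia-utils on a system like the one described above.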

shamefulCake1 commented 1 month ago

Should be closed as outdated?