Closed by diniamo 3 weeks ago
@diniamo I think I managed to pinpoint the issue. It seems the open kernel module either doesn't support CUDA or fails to set up the /dev devices correctly. Force the open kernel module off with `hardware.nvidia.open = false;` in your NixOS configuration and it should work. Issue #194 seems relevant to ours.
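To confirm which kernel module flavour is actually loaded, the module's license string can be inspected. This is only a rough sketch: the function name `module_flavour` is hypothetical, and it assumes the proprietary module reports `NVIDIA` while the open module reports `Dual MIT/GPL`, which may not hold for every driver release.

```shell
#!/bin/sh
# Hypothetical helper: guess the module flavour from its license string ($1).
# Assumption: the proprietary module reports "NVIDIA" and the open module
# reports "Dual MIT/GPL".
module_flavour() {
    case "$1" in
        "NVIDIA")        echo "proprietary" ;;
        "Dual MIT/GPL")  echo "open" ;;
        "")              echo "not-found" ;;
        *)               echo "unknown" ;;
    esac
}

# Query the nvidia module for the running kernel; an empty result
# (module or modinfo missing) is reported as "not-found".
module_flavour "$(modinfo -F license nvidia 2>/dev/null)"
```

If this prints `open` even though `hardware.nvidia.open` is set to false, the configuration change has probably not been applied to the running system yet.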
@laengepl it is set to false and it always has been.
These early 999 'unknown errors' are a bit of a pain to diagnose, but they indicate something severely wrong with your setup, such as the driver not being installed correctly or permission problems with the /dev/nvidia* files.
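The device files can be checked by hand. The following is a minimal sketch, assuming the usual node names (`/dev/nvidiactl`, `/dev/nvidia-uvm`, `/dev/nvidia0`); the function name `check_node` is hypothetical, and a clean result does not guarantee that CUDA will initialise.

```shell
#!/bin/sh
# Hypothetical helper: report whether a device node exists and whether the
# current user can read and write it.
check_node() {
    if [ ! -e "$1" ]; then
        echo "$1: missing"
    elif [ -r "$1" ] && [ -w "$1" ]; then
        echo "$1: ok"
    else
        echo "$1: no access"
    fi
}

# Nodes that CUDA typically needs; the exact names are an assumption.
for node in /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia0; do
    check_node "$node"
done
```

A `missing` line usually points at a kernel module that is not loaded, while `no access` points at a permission problem for the current user.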
Are there things I can verify?
Can you check #253?
Um, check what exactly?
Whether it's related to suspend, and whether the `NVreg_PreserveVideoMemoryAllocations=1` trick addresses it.
It's not related to suspension, however that flag seems to have solved it?? Thanks.
Not sure if the issue should be closed, up to you.
Never mind, looks like I'd already had that flag. I have no idea what fixed it then.
I actually have half a clue, and if I hadn't casually botched the argument (turns out it has to be `nvidia.NVreg_PreserveVideoMemoryAllocations=1` on the kernel command line) I would be none the wiser.
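A quick way to catch a botched argument like this is to look at the running kernel's command line. This is a minimal sketch, assuming a Linux system where `/proc/cmdline` is readable; the function name `nvreg_param_status` is hypothetical, and it only inspects the command line, not whether the driver actually accepted the value.

```shell
#!/bin/sh
# Hypothetical helper: classify how NVreg_PreserveVideoMemoryAllocations
# appears in the kernel command line string passed as $1.
nvreg_param_status() {
    case " $1 " in
        *" nvidia.NVreg_PreserveVideoMemoryAllocations=1 "*)
            echo "ok" ;;
        *" nvidia.NVreg_PreserveVideoMemoryAllocations="*)
            # Correct prefix, but set to something other than 1
            echo "other-value" ;;
        *" NVreg_PreserveVideoMemoryAllocations="*)
            # Present, but without the "nvidia." module prefix
            echo "missing-prefix" ;;
        *)
            echo "absent" ;;
    esac
}

# Check the running kernel (prints "absent" if /proc/cmdline is unreadable).
nvreg_param_status "$(cat /proc/cmdline 2>/dev/null)"
```

The `missing-prefix` case is the mistake described above: the option is there, but without the `nvidia.` prefix.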
Sometimes(?) suspension does not just screw with whatever applications you have open. You are screwed globally, with everything, until you restart the whole system (at least, not even logging out and back in again fixed it for me).
Hopefully I'm not having problems again now.
My issue wasn't related to suspension.
Do note that this issue still happens with the open drivers.
The open drivers use mesa, which doesn't need any of this. And even if it has bugs, you should report those to them.
Oh? Didn't know that. Either way, OpenCL fails with the open drivers as well, so I can't switch to them.
> The open drivers use mesa, which doesn't need any of this.
That's not correct as I understand it. The open source nouveau driver uses Mesa, but NVIDIA's open source kernel driver uses the same user space components as the closed source kernel driver.
NVD_LOG=1 vainfo
```
Trying display: wayland
libva info: VA-API version 1.21.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /run/opengl-driver/lib/dri/nvidia_drv_video.so
4004.036737070 [108942-108942] ../src/vabackend.c: 168 init CUDA ERROR 'unknown error' (999)
libva info: Found init function __vaDriverInit_1_0
4004.036760315 [108942-108942] ../src/vabackend.c:2188 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 40
4004.036762314 [108942-108942] ../src/vabackend.c:2197 __vaDriverInit_1_0 Now have 0 (0 max) instances
4004.036764039 [108942-108942] ../src/vabackend.c:2223 __vaDriverInit_1_0 Selecting Direct backend
4004.043594504 [108942-108942] ../src/direct/nv-driver.c: 267 init_nvdriver Initing nvdriver...
4004.043614921 [108942-108942] ../src/direct/nv-driver.c: 285 init_nvdriver NVIDIA kernel driver version: 555.42.02, major version: 555, minor version: 42
4004.043618421 [108942-108942] ../src/direct/nv-driver.c: 292 init_nvdriver Got dev info: 100 1 2 6
4004.046716108 [108942-108942] ../src/direct/direct-export-buf.c: 27 findGPUIndexFromFd CUDA ERROR 'initialization error' (3)
4004.046723471 [108942-108942] ../src/vabackend.c:2253 __vaDriverInit_1_0 CUDA ERROR 'initialization error' (3)
libva error: /run/opengl-driver/lib/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit
```

I use NixOS with the new 555 beta driver. NVD_BACKEND is set to direct and LIBVA_DRIVER_NAME to nvidia, not sure what other information to provide.