Open bmartin427 opened 9 months ago
For reference here's a session using the direct backend. The first query was before a suspend/resume, the latter two were after.
brad@fx2:~$ NVD_LOG=1 NVD_BACKEND=direct vainfo
libva info: VA-API version 1.14.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
4089.149695354 [3287-3287] ../src/vabackend.c:2171 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
4089.149724484 [3287-3287] ../src/vabackend.c:2180 __vaDriverInit_1_0 Now have 0 (0 max) instances
4089.149746525 [3287-3287] ../src/vabackend.c:2206 __vaDriverInit_1_0 Selecting Direct backend
4089.163510502 [3287-3287] ../src/direct/direct-export-buf.c: 85 direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD128
4089.163532980 [3287-3287] ../src/direct/nv-driver.c: 223 init_nvdriver Initing nvdriver...
4089.163541389 [3287-3287] ../src/direct/nv-driver.c: 228 init_nvdriver Got dev info: 100 1 0 fe
4089.163612291 [3287-3287] ../src/direct/nv-driver.c: 246 init_nvdriver NVIDIA kernel driver version: 535.113.01, major version: 535
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.14 (libva 2.12.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileVC1Simple : VAEntrypointVLD
VAProfileVC1Main : VAEntrypointVLD
VAProfileVC1Advanced : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileHEVCMain12 : VAEntrypointVLD
VAProfileVP9Profile2 : VAEntrypointVLD
4089.308220963 [3287-3287] ../src/vabackend.c:2081 nvTerminate Terminating 0x55933e7e4d40
4089.308325527 [3287-3287] ../src/vabackend.c:2095 nvTerminate Now have 0 (0 max) instances
brad@fx2:~$ NVD_LOG=1 NVD_BACKEND=direct vainfo
libva info: VA-API version 1.14.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
4221.457787068 [3540-3540] ../src/vabackend.c:2171 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
4221.457808648 [3540-3540] ../src/vabackend.c:2180 __vaDriverInit_1_0 Now have 0 (0 max) instances
4221.457820940 [3540-3540] ../src/vabackend.c:2206 __vaDriverInit_1_0 Selecting Direct backend
4221.472699819 [3540-3540] ../src/direct/direct-export-buf.c: 85 direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD128
4221.472724892 [3540-3540] ../src/direct/nv-driver.c: 223 init_nvdriver Initing nvdriver...
4221.472737114 [3540-3540] ../src/direct/nv-driver.c: 228 init_nvdriver Got dev info: 100 1 0 fe
4221.472851581 [3540-3540] ../src/direct/nv-driver.c: 246 init_nvdriver NVIDIA kernel driver version: 535.113.01, major version: 535
4221.474599881 [3540-3540] ../src/vabackend.c:2236 __vaDriverInit_1_0 CUDA ERROR 'unknown error' (999)
libva error: /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit
brad@fx2:~$ NVD_LOG=1 NVD_BACKEND=direct vainfo
libva info: VA-API version 1.14.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
4226.566012274 [3543-3543] ../src/vabackend.c: 138 init CUDA ERROR 'unknown error' (999)
libva info: Found init function __vaDriverInit_1_0
4226.566085396 [3543-3543] ../src/vabackend.c:2171 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
4226.566098805 [3543-3543] ../src/vabackend.c:2180 __vaDriverInit_1_0 Now have 0 (0 max) instances
4226.566110469 [3543-3543] ../src/vabackend.c:2206 __vaDriverInit_1_0 Selecting Direct backend
4226.578729192 [3543-3543] ../src/direct/direct-export-buf.c: 85 direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD128
4226.578750354 [3543-3543] ../src/direct/nv-driver.c: 223 init_nvdriver Initing nvdriver...
4226.578759782 [3543-3543] ../src/direct/nv-driver.c: 228 init_nvdriver Got dev info: 100 1 0 fe
4226.578826339 [3543-3543] ../src/direct/nv-driver.c: 246 init_nvdriver NVIDIA kernel driver version: 535.113.01, major version: 535
4226.578960222 [3543-3543] ../src/direct/direct-export-buf.c: 23 findGPUIndexFromFd CUDA ERROR 'initialization error' (3)
4226.578971746 [3543-3543] ../src/vabackend.c:2236 __vaDriverInit_1_0 CUDA ERROR 'initialization error' (3)
libva error: /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit
I also have the same two dmesg lines as before.
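For anyone who wants to detect this state from a script, here is a minimal sketch. It only greps captured vainfo output for the two failure strings visible in the sessions above; the function name is made up for illustration and is not part of nvidia-vaapi-driver.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the captured vainfo output
# contains either failure marker seen in the logs above.
vainfo_log_is_broken() {
    printf '%s\n' "$1" | grep -q -e "CUDA ERROR" -e "vaInitialize failed"
}

# Example use (assumes vainfo is installed; 2>&1 captures both streams):
#   log=$(NVD_LOG=1 NVD_BACKEND=direct vainfo 2>&1)
#   vainfo_log_is_broken "$log" && echo "VA-API init is failing"
```

This only tells you that initialisation failed, not why.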
I'm seeing something related to this, but in my case Firefox crashes upon resuming. I've just disabled nvidia-vaapi-driver completely and will see if the crashes continue. I've also tried setting up NVIDIA's PreserveVideoMemoryAllocations, but it made gnome-shell impossible to use after resume (which is even worse...)
Unfortunately this is an issue with the NVIDIA driver, and there's not much I can do about it. The driver really doesn't like having any sort of NVDEC context left active over a suspend/resume; it breaks the driver until a reboot is done.
Hmm. If firefox is closed before I suspend, then is there anything else I can do to prevent NVDEC context from being left active? Is there something else I need to explicitly kill, or is it really just that I've ever used it at all?
Known issue of the nvidia driver. After suspend/resume, the nvidia-uvm module is defunct even if not used. The workaround is unloading/reloading it.
Can confirm this. I wrote up a specific "how to" for Pop!_OS users just yesterday, but after resume from suspend HW acceleration in Firefox is broken. Only a reboot fixes it. I haven't tried unloading/reloading but that's not really a solution for the average user.
Question: it's a "known issue" with the NVIDIA driver, but is there any actual confirmation or bug tracking within NVIDIA as a company? Does this bug affect Wayland or only X11 windowing systems? I ask because I'm only moderately knowledgeable about Linux, with nearly ZERO experience with Wayland, so I don't know if Wayland even requires a vaapi layer for hardware acceleration of video codecs.
I'm not sure if there's an actual NVIDIA bug for it. I've bumped the issue[1] in the NVIDIA forums and we'll see if we get a response.
[1] https://forums.developer.nvidia.com/t/xid-31-after-wakeup-from-sleep/139870/6
Having the same issue on a laptop with a secondary nvidia card in a PRIME configuration. Hardware acceleration fails after resume from suspend.
$ NVD_LOG=1 NVD_BACKEND=direct vainfo
libva info: VA-API version 1.20.0
libva error: vaGetDriverNames() failed with unknown libva error
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
135775.283643377 [30120-30120] ../src/vabackend.c: 130 init CUDA ERROR 'unknown error' (999)
libva info: Found init function __vaDriverInit_1_0
135775.283662988 [30120-30120] ../src/vabackend.c:2145 __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
135775.283665133 [30120-30120] ../src/vabackend.c:2154 __vaDriverInit_1_0 Now have 0 (0 max) instances
135775.283667649 [30120-30120] ../src/vabackend.c:2180 __vaDriverInit_1_0 Selecting Direct backend
135775.286633777 [30120-30120] ../src/backend-common.c: 31 isNvidiaDrmFd Invalid driver for DRM device: i915
135775.286665005 [30120-30120] ../src/direct/direct-export-buf.c: 85 direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD129
135775.286668121 [30120-30120] ../src/direct/nv-driver.c: 246 init_nvdriver Initing nvdriver...
135775.286683125 [30120-30120] ../src/direct/nv-driver.c: 264 init_nvdriver NVIDIA kernel driver version: , major version: 0, minor version: 0
135775.286685882 [30120-30120] ../src/direct/nv-driver.c: 271 init_nvdriver Got dev info: 100 1 2 6
135775.286771896 [30120-30120] ../src/direct/direct-export-buf.c: 23 findGPUIndexFromFd CUDA ERROR 'initialization error' (3)
135775.286774654 [30120-30120] ../src/vabackend.c:2210 __vaDriverInit_1_0 CUDA ERROR 'initialization error' (3)
libva error: /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit
Reloading nvidia-uvm solves the issue:
# rmmod nvidia-uvm
# modprobe nvidia-uvm
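The reload above could be automated on resume. The following is a rough sketch, not a tested fix. It assumes a systemd-based distro, where executables placed in /usr/lib/systemd/system-sleep/ are called with "pre" or "post" as the first argument. The function names and the RMMOD/MODPROBE override variables are made up for illustration so the logic can be dry-run without root. Note that the unload will fail if any process still has the module open.

```shell
#!/bin/sh
# Hypothetical system-sleep hook: reload nvidia-uvm after resume.
# systemd passes $1 = "pre" or "post" and $2 = the sleep type.

# Overridable commands (illustrative assumption, handy for dry runs).
: "${RMMOD:=rmmod}"
: "${MODPROBE:=modprobe}"

reload_uvm() {
    # The unload fails if something still uses the module; report it
    # and still attempt the load so the module ends up present.
    "$RMMOD" nvidia-uvm || echo "nvidia-uvm could not be unloaded (still in use?)" >&2
    "$MODPROBE" nvidia-uvm
}

handle_sleep_event() {
    case "${1:-}" in
        post) reload_uvm ;;   # only act after resume
        *) : ;;               # nothing to do before suspend
    esac
}

handle_sleep_event "$@"
```

Running it with RMMOD=echo MODPROBE=echo and the argument "post" prints the module name twice instead of touching the kernel, which is a cheap way to check the hook logic.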
Aren't standby problems related to the stuff discussed in #182? And isn't it all fixed in 545+?
Last time I tried some 535 driver, it refused to decrease cooler speed after some video playback. My laptop sounded like a jet-plane & never stopped unless rebooted.
I'll try 545 this time. Thanks for the suggestion.
I checked version 545.23.08 and it looks like they've fixed both the cooler speed and the hw acceleration after suspend/resume issues.
I think the issue might be closed now.
Looks like I was too quick. The suspend/resume hw acceleration bug is still there in driver 545.23.08. vainfo emits an error and Firefox acceleration is missing after the 3rd or 4th resume from suspend.
This bug is still there in driver 550.78.
I am using Arch Linux, and the instructions here solved my problem. I hope they will be useful to you.
I have acceleration working fine on my media PC, as long as I try it soon after boot. However I suspend this PC in between uses, and acceleration never works following such a cycle until I reboot. Every other GPU function I've tested continues working after the failure: OpenGL, VDPAU, etc are all fine. Hardware is a GeForce GT 1030, OS is Ubuntu 22.04, nvidia driver version is 535.113.01, and nvidia-vaapi-driver version is git 0a924c.
The first time I try running vainfo after a resume, I get:
Also, the following lines appear in dmesg during that first vainfo query:
Subsequent calls to vainfo produce no more dmesg output, and the console output changes somewhat:
I have tried the direct backend instead of egl, and get no different results, aside from some slightly different error text.
I'm not 100% certain the suspend and resume is the cause. I have attempted a quick suspend/resume cycle in order to troubleshoot this problem and have been unable to reproduce it, but it always happens if I leave the machine suspended for a normal amount of time (hours). So possibly something else about the elapsed time is involved.
I have also tried leaving Firefox running during a suspend/resume, thinking that acceleration might continue to function if I didn't have to repeat the initialization process. However, Firefox seems to explode immediately upon resume, so this is not an option.