elFarto / nvidia-vaapi-driver

A VA-API implementation using NVIDIA's NVDEC

nv_alloc_object failed #211

Closed: StableNarwhal closed this issue 1 year ago

StableNarwhal commented 1 year ago

I'm trying to get HW acceleration working on Pop!_OS 22.04 LTS (based on Ubuntu) with a 3080 GPU and nvidia-driver-525 (version 525.105.17).

I've cloned this repo and the open-gpu-kernel-modules repo, ran extract_headers.sh <path/to/open-gpu-kernel-modules>, then ran meson setup build and meson install -C build before rebooting.

My env variables in /etc/environment are:

MOZ_DISABLE_RDD_SANDBOX=1
EGL_PLATFORM=wayland
LIBVA_DRIVER_NAME=nvidia
BROWSER=firefox
MOZ_ENABLE_WAYLAND=1
NVD_BACKEND=direct

But running NVD_LOG=1 LIBVA_DRIVER_NAME=nvidia vainfo gives this output:

libva info: VA-API version 1.14.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
       589.800697345 [9351-9351] ../src/vabackend.c:2171       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 10
       589.800701665 [9351-9351] ../src/vabackend.c:2180       __vaDriverInit_1_0 Now have 0 (0 max) instances
       589.800704295 [9351-9351] ../src/vabackend.c:2206       __vaDriverInit_1_0 Selecting Direct backend
       589.803880028 [9351-9351] ../src/direct/direct-export-buf.c:  85      direct_initExporter Found NVIDIA GPU 0 at /dev/dri/renderD128
       589.803885178 [9351-9351] ../src/direct/nv-driver.c: 217            init_nvdriver Initing nvdriver...
       589.803887828 [9351-9351] ../src/direct/nv-driver.c: 222            init_nvdriver Got dev info: 2b00 1 2 6
       589.803902788 [9351-9351] ../src/direct/nv-driver.c:  33          nv_alloc_object nv_alloc_object failed: -1 0 22
       589.803904788 [9351-9351] ../src/direct/nv-driver.c: 243            init_nvdriver nv_alloc_object NV01_ROOT_CLIENT failed
       589.803906028 [9351-9351] ../src/direct/nv-driver.c: 305            init_nvdriver Got error initing
       589.803918097 [9351-9351] ../src/direct/nv-driver.c:  76            nv_rm_control nv_rm_control failed: -1 0 25
libva error: /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit

I couldn't find any clues why nv_alloc_object failed, any help would be greatly appreciated.
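
Assuming the last number on each failure line is the errno value from the failed ioctl (the log format is an assumption here, not something the thread confirms), the two codes decode on Linux as EINVAL (22, "Invalid argument") and ENOTTY (25, "Inappropriate ioctl for device"). An EINVAL on the very first allocation is consistent with the cause identified later in the thread, a parameter structure of the wrong size. A small sketch for decoding such lines; trailing_errno and errno_name are hypothetical helpers, not part of the driver:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the last number on a failure line such as
   "nv_alloc_object failed: -1 0 22", ASSUMING that number is errno. */
static int trailing_errno(const char *line) {
    const char *last = strrchr(line, ' ');  /* last space before the final field */
    assert(last != NULL);
    return (int)strtol(last + 1, NULL, 10);
}

/* Symbolic names for the two codes seen in this thread (Linux values). */
static const char *errno_name(int err) {
    switch (err) {
    case EINVAL: return "EINVAL";  /* 22: Invalid argument */
    case ENOTTY: return "ENOTTY";  /* 25: Inappropriate ioctl for device */
    default:     return "other";
    }
}
```

For a human-readable message, strerror(trailing_errno(line)) from <string.h> gives the C library's description of the same code.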

philipl commented 1 year ago

I'm seeing this too with the new 535 drivers. Maybe the way nv_alloc_object works has changed. They did say there was no API stability...

elFarto commented 1 year ago

Hopefully this is just a minor change in their API, since I think that is the first object we allocate. I'll take a look at this when I get some time.

elFarto commented 1 year ago

Good news, the fix is pretty easy, but it's going to take a bit more time to make it work across all versions of their API.

elFarto commented 1 year ago

Ok, that didn't take as long as I expected. There's a fix for the issue in master. I haven't directly tested it with the older versions of the driver yet, but it should work.

StableNarwhal commented 1 year ago

Confirmed working on 525.105.17!

libva info: VA-API version 1.14.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.14 (libva 2.12.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileAV1Profile0            : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain12             : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointVLD

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     xxxx    C+G   /usr/lib/firefox/firefox-bin      420MiB |
+-----------------------------------------------------------------------------+

Thank you for the fast fix!

philipl commented 1 year ago

Works with 535.43.02 now. Thanks!

555isfaiz commented 1 year ago

Hi, it seems like this problem still exists with the 530.41.03 drivers. I'm using the latest Arch Linux and a 3080 card. The output from NVD_LOG=1 vainfo is:

Trying display: wayland
      3894.233117195 [34438-34438] ../src/vabackend.c:2171       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 40
      3894.233127765 [34438-34438] ../src/vabackend.c:2180       __vaDriverInit_1_0 Now have 0 (0 max) instances
      3894.233129495 [34438-34438] ../src/vabackend.c:2206       __vaDriverInit_1_0 Selecting Direct backend
      3894.236897372 [34438-34438] ../src/direct/nv-driver.c: 222            init_nvdriver Initing nvdriver...
      3894.236906842 [34438-34438] ../src/direct/nv-driver.c: 227            init_nvdriver Got dev info: 2b00 1 2 6
      3894.236939522 [34438-34438] ../src/direct/nv-driver.c: 245            init_nvdriver NVIDIA kernel driver version: 530.41.03, major version: 530
      3894.236943292 [34438-34438] ../src/direct/nv-driver.c:  59          nv_alloc_object nv_alloc_object failed: -1 0 22
      3894.236945202 [34438-34438] ../src/direct/nv-driver.c: 251            init_nvdriver nv_alloc_object NV01_ROOT_CLIENT failed
      3894.236946972 [34438-34438] ../src/direct/nv-driver.c: 307            init_nvdriver Got error initing
      3894.236967262 [34438-34438] ../src/direct/nv-driver.c: 102            nv_rm_control nv_rm_control failed: -1 0 25
      3894.236971432 [34438-34438] ../src/vabackend.c:2231       __vaDriverInit_1_0 Exporter failed
libva error: /usr/lib/dri/nvidia_drv_video.so init failed
vaInitialize failed with error code 1 (operation failed),exit

Btw, if I install libva-nvidia-driver instead of libva-nvidia-driver-git, vainfo works properly, but nvidia-smi pmon shows the firefox process only as type G, not C+G. I am using Firefox Developer Edition 115.

elFarto commented 1 year ago

@555isfaiz I've found the issue you're having and pushed a fix for it. I had assumed that, because the structure the v530 drivers use is missing only one field compared with v535, it would be 4 bytes smaller. It wasn't: it was 8 bytes smaller due to some alignment issues.
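
The 8-byte difference comes from C struct padding: a struct's size is rounded up to a multiple of its strictest member alignment, so dropping one 4-byte field from a struct with 8-byte alignment can shrink it by 8 bytes rather than 4. A minimal sketch with hypothetical layouts (params_v530, params_v535 and their fields are made-up names, not the real NVIDIA structures), plus a hypothetical helper that picks the size from the kernel driver's major version, in the spirit of the "major version: 530" log line:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical "newer" layout: one 8-byte-aligned member plus three 4-byte fields. */
struct params_v535 {
    _Alignas(8) uint64_t handle;  /* gives the whole struct 8-byte alignment */
    uint32_t a;
    uint32_t b;
    uint32_t extra;               /* the one field the older layout lacks */
};                                /* 8 + 3*4 = 20 bytes, padded up to 24 */

/* Hypothetical "older" layout: identical except for the missing field. */
struct params_v530 {
    _Alignas(8) uint64_t handle;
    uint32_t a;
    uint32_t b;
};                                /* 8 + 2*4 = 16 bytes, no padding needed */

/* The naive expectation (4 bytes smaller) is wrong; the real gap is 8. */
_Static_assert(sizeof(struct params_v535) == 24, "20 bytes rounded up to 24");
_Static_assert(sizeof(struct params_v530) == 16, "already a multiple of 8");

/* Parse the major version from a string such as "530.41.03". */
static int driver_major_version(const char *version) {
    assert(version != NULL);
    return (int)strtol(version, NULL, 10);  /* stops at the first '.' */
}

/* Hypothetical dispatch: choose the struct size the running kernel module expects. */
static size_t alloc_params_size(const char *version) {
    return driver_major_version(version) >= 535 ? sizeof(struct params_v535)
                                                 : sizeof(struct params_v530);
}
```

The design point is that each size comes from sizeof on a matching layout, letting the compiler apply the padding rules, rather than from counting fields by hand.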

555isfaiz commented 1 year ago

Thanks a lot! I just tested it, and vainfo works fine now. But when I play a video in Firefox, there's still no C+G process. I'm pretty sure everything worked fine until I updated something at some point. Unfortunately, I don't remember which update broke hardware acceleration.

elFarto commented 1 year ago

I've released v0.0.10, which contains the fix.