elFarto / nvidia-vaapi-driver

A VA-API implementation using NVIDIA's NVDEC

nvidia-vaapi-driver fails to initialize (CUDA ERROR 802) on a very old PC running Linux Mint 21.2 with a GTX 1050 Ti card #260

Closed Wyacheslaw8 closed 5 months ago

Wyacheslaw8 commented 7 months ago

Hello. I compiled and installed nvidia-vaapi-driver (v0.0.11) successfully, but the driver fails to initialize (CUDA ERROR 802) on a very old PC running Linux Mint 21.2 with a GTX 1050 Ti card.

My results:

NVD_LOG=1 vainfo --display drm --device /dev/dri/renderD128

      1236.335911535 [4264-4264] ../src/vabackend.c: 130                     init CUDA ERROR 'system not yet initialized' (802)

      1236.336062669 [4264-4264] ../src/vabackend.c:2145       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 31
      1236.336076079 [4264-4264] ../src/vabackend.c:2154       __vaDriverInit_1_0 Now have 0 (0 max) instances
      1236.336086974 [4264-4264] ../src/vabackend.c:2180       __vaDriverInit_1_0 Selecting Direct backend
      1236.357610389 [4264-4264] ../src/direct/nv-driver.c: 246            init_nvdriver Initing nvdriver...
      1236.357717943 [4264-4264] ../src/direct/nv-driver.c: 264            init_nvdriver NVIDIA kernel driver version: 535.129.03, major version: 535, minor version: 129
      1236.357730514 [4264-4264] ../src/direct/nv-driver.c: 271            init_nvdriver Got dev info: 100 1 0 fe
      1236.357886398 [4264-4264] ../src/direct/direct-export-buf.c:  23       findGPUIndexFromFd CUDA ERROR 'initialization error' (3)

      1236.357899528 [4264-4264] ../src/vabackend.c:2210       __vaDriverInit_1_0 CUDA ERROR 'initialization error' (3)

libva error: /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so init failed
vaInitialize failed with error code 1 (operation failed),exit
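
For longer logs, the CUDA failures can be pulled out of the NVD_LOG output mechanically. The following is a minimal sketch, assuming every error line has the same shape as the ones above (source file, line number, function name, then `CUDA ERROR '<message>' (<code>)`); the `cuda_errors` helper is a hypothetical name, not part of the driver.

```python
import re

# Assumed line shape, taken from the log above:
#   <time> [<pid>-<tid>] <file>:<line> <function> CUDA ERROR '<message>' (<code>)
CUDA_ERROR = re.compile(
    r"(?P<file>\S+):\s*(?P<line>\d+)\s+(?P<func>\S+)\s+"
    r"CUDA ERROR '(?P<message>[^']*)' \((?P<code>\d+)\)"
)


def cuda_errors(log_text):
    """Return (function, message, code) for each CUDA error line in log_text."""
    found = []
    for line in log_text.splitlines():
        match = CUDA_ERROR.search(line)
        if match:
            found.append(
                (match.group("func"), match.group("message"), int(match.group("code")))
            )
    return found


if __name__ == "__main__":
    import sys

    # Example: NVD_LOG=1 vainfo ... 2>&1 | python3 this_script.py
    for func, message, code in cuda_errors(sys.stdin.read()):
        print(f"{func}: {message} ({code})")
```

Run against the log above, this would report the 802 from `init` first and then the two follow-on `initialization error` (3) entries, which suggests the later errors are consequences of the first.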

inxi -Fxz

 System:
  Kernel: 5.15.0-76-generic x86_64 bits: 64 compiler: gcc v: 11.3.0
    Desktop: Cinnamon 5.8.4 Distro: Linux Mint 21.2 Victoria
    base: Ubuntu 22.04 jammy
Machine:
  Type: Desktop Mobo: ASUSTeK model: A8N-E v: 2.XX
    serial: <superuser required> BIOS: Phoenix v: ASUS A8N-E Revision 1013
    date: 04/07/2006
CPU:
  Info: dual core model: AMD Opteron 185 bits: 64 type: MCP arch: K8 rev.E
    rev: 2 cache: L1: 256 KiB L2: 2 MiB
  Speed (MHz): avg: 1000 min/max: 1000/2600 cores: 1: 1000 2: 1000
    bogomips: 10452
  Flags: ht lm nx pae sse sse2 sse3
Graphics:
  Device-1: NVIDIA GP107 [GeForce GTX 1050 Ti] vendor: Gigabyte
    driver: nvidia v: 535.129.03 bus-ID: 01:00.0
  Display: x11 server: X.Org v: 1.21.1.4 driver: X: loaded: nvidia
    unloaded: fbdev,modesetting,nouveau,vesa gpu: nvidia
    resolution: 1920x1200~60Hz
  OpenGL: renderer: NVIDIA GeForce GTX 1050 Ti/PCIe/SSE2
    v: 4.6.0 NVIDIA 535.129.03 direct render: Yes
Audio:
  Device-1: NVIDIA GP107GL High Definition Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel bus-ID: 01:00.1
  Sound Server-1: ALSA v: k5.15.0-76-generic running: yes
  Sound Server-2: PulseAudio v: 15.99.1 running: yes
  Sound Server-3: PipeWire v: 0.3.48 running: yes
Network:
  Device-1: NVIDIA CK804 Ethernet vendor: ASUSTeK K8N4/A8N Series Mainboard
    type: network bridge driver: forcedeth v: kernel port: e000 bus-ID: 00:0a.0
  IF: enp0s10 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:
  Local Storage: total: 1.05 TiB used: 96.48 GiB (9.0%)
  ID-1: /dev/sda vendor: Samsung model: SSD 850 PRO 1TB size: 953.87 GiB
  ID-2: /dev/sdb type: USB vendor: Samsung model: Flash Drive FIT
    size: 119.51 GiB
Partition:
  ID-1: / size: 91.11 GiB used: 11.77 GiB (12.9%) fs: ext4 dev: /dev/sda5
  ID-2: /boot/efi size: 512.1 MiB used: 6.1 MiB (1.2%) fs: vfat
    dev: /dev/sda3
Swap:
  ID-1: swap-1 type: partition size: 5.59 GiB used: 0 KiB (0.0%)
    dev: /dev/sda6
Sensors:
  System Temperatures: cpu: 34.0 C mobo: 34.0 C gpu: nvidia temp: 38 C
  Fan Speeds (RPM): cpu: 1739 case-1: 1896 gpu: nvidia fan: 0%
  Power: 12v: N/A 5v: N/A 3.3v: 3.25 vbat: N/A
Info:
  Processes: 230 Uptime: 20m Memory: 3.82 GiB used: 2.32 GiB (60.7%)
  Init: systemd runlevel: 5 Compilers: gcc: 11.3.0 Packages: 2449 Shell: Bash
  v: 5.1.16 inxi: 3.3.13

nvidia-smi

Thu Dec 14 15:29:19 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1050 Ti     Off | 00000000:01:00.0  On |                  N/A |
|  0%   37C    P8              N/A / 120W |    374MiB /  4096MiB |     32%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       856      G   /usr/lib/xorg/Xorg                          135MiB |
|    0   N/A  N/A      1379      G   cinnamon                                     69MiB |
|    0   N/A  N/A      3177      G   /usr/lib/firefox/firefox                    166MiB |
+---------------------------------------------------------------------------------------+

lspci -nnk | grep nvidia

    Kernel modules: forcedeth, nvidia_drm, nvidia
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

lsmod | grep nvidia

nvidia_uvm           1511424  0
nvidia_drm             77824  5
nvidia_modeset       1302528  7 nvidia_drm
nvidia              56659968  365 nvidia_uvm,nvidia_modeset
drm_kms_helper        311296  1 nvidia_drm
drm                   622592  9 drm_kms_helper,nvidia,nvidia_drm

modinfo nvidia_modeset

filename:       /lib/modules/5.15.0-76-generic/updates/dkms/nvidia-modeset.ko
version:        535.129.03
supported:      external
license:        NVIDIA
srcversion:     7D5E21BEBB04BB93797F5C7
depends:        nvidia
retpoline:      Y
name:           nvidia_modeset
vermagic:       5.15.0-76-generic SMP mod_unload modversions 
sig_id:         PKCS#7
signer:         localhost.localdomain Secure Boot Module Signature key
sig_hashalgo:   sha512
parm:           output_rounding_fix:bool
parm:           disable_vrr_memclk_switch:bool
parm:           fail_malloc:Fail the Nth call to nvkms_alloc (int)
parm:           malloc_verbose:Report information about malloc calls on module unload (bool)
parm:           config_file:Path to the nvidia-modeset configuration file (default: disabled) (charp)

modinfo nvidia_uvm

filename:       /lib/modules/5.15.0-76-generic/updates/dkms/nvidia-uvm.ko
version:        535.129.03
supported:      external
license:        Dual MIT/GPL
srcversion:     C3E1982134A7F5485AE32F5
depends:        nvidia
retpoline:      Y
name:           nvidia_uvm
vermagic:       5.15.0-76-generic SMP mod_unload modversions 
sig_id:         PKCS#7
signer:         localhost.localdomain Secure Boot Module Signature key
sig_hashalgo:   sha512
parm:           uvm_ats_mode:Set to 0 to disable ATS (Address Translation Services). Any other value is ignored. Has no effect unless the platform supports ATS. (int)
parm:           uvm_perf_prefetch_enable:uint
parm:           uvm_perf_prefetch_threshold:uint
parm:           uvm_perf_prefetch_min_faults:uint
parm:           uvm_perf_thrashing_enable:uint
parm:           uvm_perf_thrashing_threshold:uint
parm:           uvm_perf_thrashing_pin_threshold:uint
parm:           uvm_perf_thrashing_lapse_usec:uint
parm:           uvm_perf_thrashing_nap:uint
parm:           uvm_perf_thrashing_epoch:uint
parm:           uvm_perf_thrashing_pin:uint
parm:           uvm_perf_thrashing_max_resets:uint
parm:           uvm_perf_map_remote_on_native_atomics_fault:uint
parm:           uvm_disable_hmm:Force-disable HMM functionality in the UVM driver. Default: false (HMM is enabled if possible). However, even with uvm_disable_hmm=false, HMM will not be enabled if is not supported in this driver build configuration, or if ATS settings conflict with HMM. (bool)
parm:           uvm_perf_migrate_cpu_preunmap_enable:int
parm:           uvm_perf_migrate_cpu_preunmap_block_order:uint
parm:           uvm_global_oversubscription:Enable (1) or disable (0) global oversubscription support. (int)
parm:           uvm_perf_pma_batch_nonpinned_order:uint
parm:           uvm_cpu_chunk_allocation_sizes:OR'ed value of all CPU chunk allocation sizes. (uint)
parm:           uvm_leak_checker:Enable uvm memory leak checking. 0 = disabled, 1 = count total bytes allocated and freed, 2 = per-allocation origin tracking. (int)
parm:           uvm_force_prefetch_fault_support:uint
parm:           uvm_debug_enable_push_desc:Enable push description tracking (uint)
parm:           uvm_debug_enable_push_acquire_info:Enable push acquire information tracking (uint)
parm:           uvm_page_table_location:Set the location for UVM-allocated page tables. Choices are: vid, sys. (charp)
parm:           uvm_perf_access_counter_mimc_migration_enable:Whether MIMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_momc_migration_enable:Whether MOMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_batch_count:uint
parm:           uvm_perf_access_counter_threshold:Number of remote accesses on a region required to trigger a notification.Valid values: [1, 65535] (uint)
parm:           uvm_perf_reenable_prefetch_faults_lapse_msec:uint
parm:           uvm_perf_fault_batch_count:uint
parm:           uvm_perf_fault_replay_policy:uint
parm:           uvm_perf_fault_replay_update_put_ratio:uint
parm:           uvm_perf_fault_max_batches_per_service:uint
parm:           uvm_perf_fault_max_throttle_per_service:uint
parm:           uvm_perf_fault_coalesce:uint
parm:           uvm_fault_force_sysmem:Force (1) using sysmem storage for pages that faulted. Default: 0. (int)
parm:           uvm_perf_map_remote_on_eviction:int
parm:           uvm_exp_gpu_cache_peermem:Force caching for mappings to peer memory. This is an experimental parameter that may cause correctness issues if used. (uint)
parm:           uvm_exp_gpu_cache_sysmem:Force caching for mappings to system memory. This is an experimental parameter that may cause correctness issues if used. (uint)
parm:           uvm_downgrade_force_membar_sys:Force all TLB invalidation downgrades to use MEMBAR_SYS (uint)
parm:           uvm_channel_num_gpfifo_entries:uint
parm:           uvm_channel_gpfifo_loc:charp
parm:           uvm_channel_gpput_loc:charp
parm:           uvm_channel_pushbuffer_loc:charp
parm:           uvm_enable_va_space_mm:Set to 0 to disable UVM from using mmu_notifiers to create an association between a UVM VA space and a process. This will also disable pageable memory access via either ATS or HMM. (int)
parm:           uvm_enable_debug_procfs:Enable debug procfs entries in /proc/driver/nvidia-uvm (int)
parm:           uvm_peer_copy:Choose the addressing mode for peer copying, options: phys [default] or virt. Valid for Ampere+ GPUs. (charp)
parm:           uvm_debug_prints:Enable uvm debug prints. (int)
parm:           uvm_enable_builtin_tests:Enable the UVM built-in tests. (This is a security risk) (int)
parm:           uvm_release_asserts:Enable uvm asserts included in release builds. (int)
parm:           uvm_release_asserts_dump_stack:dump_stack() on failed UVM release asserts. (int)
parm:           uvm_release_asserts_set_global_error:Set UVM global fatal error on failed release asserts. (int)

modinfo nvidia_drm

filename:       /lib/modules/5.15.0-76-generic/updates/dkms/nvidia-drm.ko
version:        535.129.03
supported:      external
license:        MIT
srcversion:     8B4888C7A7BA5B8AF77707F
alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm,drm_kms_helper,nvidia-modeset
retpoline:      Y
name:           nvidia_drm
vermagic:       5.15.0-76-generic SMP mod_unload modversions 
sig_id:         PKCS#7
signer:         localhost.localdomain Secure Boot Module Signature key
sig_hashalgo:   sha512
parm:           modeset:Enable atomic kernel modesetting (1 = enable, 0 = disable (default)) (bool)

cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-5.15.0-76-generic root=UUID=f617cccc-971c-437e-84c1-feb51a64b92f ro nvidia-drm.modeset=1 quiet splash

PS: My player, Celluloid, works fine; hardware acceleration works for VP9 videos.

elFarto commented 7 months ago

That is a very old PC, but I don't see any reason why it shouldn't work. An 802 error usually means there's something wrong with the driver install, but everything you've posted looks ok.

Honestly I've not found any useful way to debug these 802 issues. You might be able to get more help by posting in the NVIDIA forums saying that cuInit(0) returns an 802 error.
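
To reproduce the failure outside of the VA-API driver before posting, something like the following may help. It is a minimal sketch, assuming the CUDA driver library is installed under the usual Linux soname `libcuda.so.1`; `cu_init_status` is a hypothetical helper name. It only calls `cuInit(0)` and `cuGetErrorString` from the CUDA driver API and reports what comes back.

```python
import ctypes


def cu_init_status(libname="libcuda.so.1"):
    """Call cuInit(0) and return (code, message), or None if libcuda cannot be loaded."""
    try:
        cuda = ctypes.CDLL(libname)
    except OSError:
        return None  # driver library not present on this machine

    # CUresult cuInit(unsigned int Flags)
    cuda.cuInit.argtypes = [ctypes.c_uint]
    cuda.cuInit.restype = ctypes.c_int
    code = cuda.cuInit(0)

    # CUresult cuGetErrorString(CUresult error, const char **pStr)
    cuda.cuGetErrorString.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    cuda.cuGetErrorString.restype = ctypes.c_int
    text = ctypes.c_char_p()
    if cuda.cuGetErrorString(code, ctypes.byref(text)) == 0 and text.value:
        message = text.value.decode()
    else:
        message = "unknown"
    return code, message


if __name__ == "__main__":
    print(cu_init_status())
```

A code of 0 means CUDA initialized; on the machine in this issue the expectation would be 802, matching the first line of the log. If this standalone call also fails, the problem is likely below nvidia-vaapi-driver.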

Wyacheslaw8 commented 6 months ago

I have found the solution. The problem involves the NVIDIA video card driver and the NVIDIA network adapter in the nForce4 chipset. I disabled the NVIDIA network adapter in the BIOS.
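
The `lspci -nnk | grep nvidia` output earlier in this thread lists `nvidia_drm` and `nvidia` as candidate kernel modules for a device that also lists `forcedeth`, the Ethernet driver. Whether that overlap is what triggers the 802 error is not established here, but other nForce users can check for it with a sketch like the one below. The `nvidia_candidates` helper is a hypothetical name, and the device lines in the sample are abbreviated placeholders; only the `Kernel modules:` lines are copied from the thread.

```python
def nvidia_candidates(lspci_nnk_text):
    """Map each PCI device line to its module list when 'nvidia' is a candidate module."""
    hits = {}
    device = None
    for line in lspci_nnk_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device entry.
            device = line.strip()
        elif device and line.strip().startswith("Kernel modules:"):
            modules = [m.strip() for m in line.split(":", 1)[1].split(",")]
            if "nvidia" in modules:
                hits[device] = modules
    return hits


if __name__ == "__main__":
    # Placeholder device lines; module lines as posted in this issue.
    sample = (
        "00:0a.0 (nForce4 Ethernet, placeholder)\n"
        "\tKernel driver in use: forcedeth\n"
        "\tKernel modules: forcedeth, nvidia_drm, nvidia\n"
        "01:00.0 (GTX 1050 Ti, placeholder)\n"
        "\tKernel driver in use: nvidia\n"
        "\tKernel modules: nvidiafb, nouveau, nvidia_drm, nvidia\n"
    )
    for dev, modules in nvidia_candidates(sample).items():
        print(dev, "->", ", ".join(modules))
```

On a real system the input would be the full output of `lspci -nnk`. Any device other than the graphics card that shows up in the result is a candidate for the same BIOS workaround.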

inxi -Fxz

System:
  Kernel: 5.15.0-91-generic x86_64 bits: 64 compiler: gcc v: 11.4.0
    Desktop: Cinnamon 5.8.4 Distro: Linux Mint 21.2 Victoria
    base: Ubuntu 22.04 jammy
Machine:
  Type: Desktop Mobo: ASUSTeK model: A8N-E v: 2.XX
    serial: <superuser required> BIOS: Phoenix v: ASUS A8N-E Revision 1013
    date: 04/07/2006
CPU:
  Info: dual core model: AMD Opteron 185 bits: 64 type: MCP arch: K8 rev.E
    rev: 2 cache: L1: 256 KiB L2: 2 MiB
  Speed (MHz): avg: 1000 min/max: 1000/2600 cores: 1: 1000 2: 1000
    bogomips: 10453
  Flags: ht lm nx pae sse sse2 sse3
Graphics:
  Device-1: NVIDIA GP107 [GeForce GTX 1050 Ti] vendor: Gigabyte
    driver: nvidia v: 535.129.03 bus-ID: 01:00.0
  Display: x11 server: X.Org v: 1.21.1.4 driver: X: loaded: nvidia
    unloaded: fbdev,modesetting,nouveau,vesa gpu: nvidia
    resolution: 1920x1200~60Hz
  OpenGL: renderer: NVIDIA GeForce GTX 1050 Ti/PCIe/SSE2
    v: 4.6.0 NVIDIA 535.129.03 direct render: Yes
Audio:
  Device-1: NVIDIA GP107GL High Definition Audio vendor: Gigabyte
    driver: snd_hda_intel v: kernel bus-ID: 01:00.1
  Sound Server-1: ALSA v: k5.15.0-91-generic running: yes
  Sound Server-2: PulseAudio v: 15.99.1 running: yes
  Sound Server-3: PipeWire v: 0.3.48 running: yes
Network:
  Message: No device data found.
  IF-ID-1: enxfcde56ff0106 state: unknown speed: -1 duplex: half
    mac: <filter>
Bluetooth:
  Device-1: Qualcomm Mobile Router type: USB driver: rndis_host v: kernel
    bus-ID: 1-4.1:3
Drives:
  Local Storage: total: 953.87 GiB used: 100.73 GiB (10.6%)
  ID-1: /dev/sda vendor: Samsung model: SSD 850 PRO 1TB size: 953.87 GiB
Partition:
  ID-1: / size: 91.11 GiB used: 36.02 GiB (39.5%) fs: ext4 dev: /dev/sda5
  ID-2: /boot/efi size: 512.1 MiB used: 6.1 MiB (1.2%) fs: vfat
    dev: /dev/sda3
Swap:
  ID-1: swap-1 type: partition size: 5.59 GiB used: 257 MiB (4.5%)
    dev: /dev/sda6
Sensors:
  System Temperatures: cpu: 35.0 C mobo: 34.0 C gpu: nvidia temp: 42 C
  Fan Speeds (RPM): cpu: 1721 case-1: 1939 gpu: nvidia fan: 0%
  Power: 12v: N/A 5v: N/A 3.3v: 3.25 vbat: N/A
Info:
  Processes: 221 Uptime: 1h 28m Memory: 3.82 GiB used: 2.31 GiB (60.4%)
  Init: systemd runlevel: 5 Compilers: gcc: 11.4.0 Packages: 2459 Shell: Bash
  v: 5.1.16 inxi: 3.3.13

And now nvidia-vaapi-driver is working fine on my old PC.

NVD_LOG=1 vainfo --display drm --device /dev/dri/renderD128

      5590.954726691 [10063-10063] ../src/vabackend.c:2145       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 31
      5590.954803793 [10063-10063] ../src/vabackend.c:2154       __vaDriverInit_1_0 Now have 0 (0 max) instances
      5590.954907154 [10063-10063] ../src/vabackend.c:2180       __vaDriverInit_1_0 Selecting Direct backend
      5590.979429497 [10063-10063] ../src/direct/nv-driver.c: 246            init_nvdriver Initing nvdriver...
      5590.979576717 [10063-10063] ../src/direct/nv-driver.c: 264            init_nvdriver NVIDIA kernel driver version: 535.129.03, major version: 535, minor version: 129
      5590.979607446 [10063-10063] ../src/direct/nv-driver.c: 271            init_nvdriver Got dev info: 100 1 0 fe
vainfo: VA-API version: 1.14 (libva 2.12.0)
vainfo: Driver version: VA-API NVDEC driver [direct backend]
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain12             : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      5591.277208264 [10063-10063] ../src/vabackend.c:2055              nvTerminate Terminating 0x5608206ea440
      5591.277302406 [10063-10063] ../src/vabackend.c:2069              nvTerminate Now have 0 (0 max) instances

Happy New Year!