elFarto / nvidia-vaapi-driver

A VA-API implementation using NVIDIA's NVDEC

this driver breaks hibernation #236

Open fishxz opened 1 year ago

fishxz commented 1 year ago

Hey, as soon as I enable this driver, my Fedora fails to hibernate.

Aug 23 12:46:38 fish logger[15383]: <13>Aug 23 12:46:38 suspend: nvidia-suspend.service
Aug 23 12:46:38 fish kernel: rfkill: input handler enabled
Aug 23 12:46:38 fish gsd-media-keys[1452]: gvc_mixer_card_get_index: assertion 'GVC_IS_MIXER_CARD (card)' failed
Aug 23 12:46:38 fish gsd-media-keys[1452]: Unable to get default source
Aug 23 12:46:38 fish gsd-media-keys[1452]: Unable to get default sink
Aug 23 12:46:38 fish gsd-media-keys[1452]: gvc_mixer_card_get_index: assertion 'GVC_IS_MIXER_CARD (card)' failed
Aug 23 12:46:38 fish kernel: list_add corruption. prev is NULL.
Aug 23 12:46:38 fish kernel: ------------[ cut here ]------------
Aug 23 12:46:38 fish kernel: kernel BUG at lib/list_debug.c:23!
Aug 23 12:46:38 fish kernel: fbcon: Taking over console
Aug 23 12:46:38 fish kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Aug 23 12:46:38 fish kernel: CPU: 5 PID: 15384 Comm: nvidia-sleep.sh Tainted: P           OE      6.4.11-200.fc38.x86_64 #1
Aug 23 12:46:38 fish kernel: Hardware name: Gigabyte Technology Co., Ltd. H97M-D3H/H97M-D3H, BIOS F8b 03/03/2016
Aug 23 12:46:38 fish kernel: RIP: 0010:__list_add_valid+0x42/0xa0
Aug 23 12:46:38 fish kernel: Code: 75 38 4c 8b 02 49 39 c0 75 41 48 39 d7 74 53 4c 39 c7 74 4e b8 01 00 00 00 c3 cc cc cc cc 48 c7 c7 f8 3d 93 b7 e8 1e 44 9a ff <0f> 0b 48 c7 c7 20 3e 93 b7 e8 10 44 9a ff 0f 0b 48 89 c1 48 c7 c7
Aug 23 12:46:38 fish kernel: RSP: 0018:ffffb21b0ed33b78 EFLAGS: 00010046
Aug 23 12:46:38 fish kernel: RAX: 0000000000000022 RBX: ffffb21b0b6d92a8 RCX: 0000000000000000
Aug 23 12:46:38 fish kernel: RDX: 0000000000000000 RSI: ffffa079cdd61540 RDI: ffffa079cdd61540
Aug 23 12:46:38 fish kernel: RBP: ffffb21b0ed33bb0 R08: 0000000000000000 R09: ffffb21b0ed33a20
Aug 23 12:46:38 fish kernel: R10: 0000000000000003 R11: ffffffffb8146508 R12: 0000000000000246
Aug 23 12:46:38 fish kernel: R13: ffffb21b0b6d92b8 R14: 0000000000000000 R15: ffffa0794cb60000
Aug 23 12:46:38 fish kernel: FS:  00007fc763905740(0000) GS:ffffa079cdd40000(0000) knlGS:0000000000000000
Aug 23 12:46:38 fish kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 23 12:46:38 fish kernel: CR2: 000056094276a278 CR3: 00000003569b8004 CR4: 00000000001706e0
Aug 23 12:46:38 fish kernel: Call Trace:
Aug 23 12:46:38 fish kernel:  <TASK>
Aug 23 12:46:38 fish kernel:  ? die+0x36/0x90
Aug 23 12:46:38 fish kernel:  ? do_trap+0xda/0x100
Aug 23 12:46:38 fish kernel:  ? __list_add_valid+0x42/0xa0
Aug 23 12:46:38 fish kernel:  ? do_error_trap+0x6a/0x90
Aug 23 12:46:38 fish kernel:  ? __list_add_valid+0x42/0xa0
Aug 23 12:46:38 fish kernel:  ? exc_invalid_op+0x50/0x70
Aug 23 12:46:38 fish kernel:  ? __list_add_valid+0x42/0xa0
Aug 23 12:46:38 fish kernel:  ? asm_exc_invalid_op+0x1a/0x20
Aug 23 12:46:38 fish kernel:  ? __list_add_valid+0x42/0xa0
Aug 23 12:46:38 fish kernel:  ? __list_add_valid+0x42/0xa0
Aug 23 12:46:38 fish kernel:  _raw_q_schedule+0x3d/0xa0 [nvidia_uvm]
Aug 23 12:46:38 fish kernel:  nv_kthread_q_flush+0x7b/0x140 [nvidia_uvm]
Aug 23 12:46:38 fish kernel:  ? __pfx__q_flush_function+0x10/0x10 [nvidia_uvm]
Aug 23 12:46:38 fish kernel:  uvm_suspend+0x9f/0x190 [nvidia_uvm]
Aug 23 12:46:38 fish kernel:  uvm_suspend_entry.part.0+0x4e/0xa0 [nvidia_uvm]
Aug 23 12:46:38 fish kernel:  nv_uvm_suspend+0x2e/0x50 [nvidia]
Aug 23 12:46:38 fish kernel:  nv_set_system_power_state+0x3bb/0x470 [nvidia]
Aug 23 12:46:38 fish kernel:  nv_procfs_write_suspend+0xe8/0x160 [nvidia]
Aug 23 12:46:38 fish kernel:  proc_reg_write+0x57/0xa0
Aug 23 12:46:38 fish kernel:  vfs_write+0xe5/0x3f0
Aug 23 12:46:38 fish kernel:  ? __handle_mm_fault+0xbb4/0xc90
Aug 23 12:46:38 fish kernel:  ksys_write+0x6f/0xf0
Aug 23 12:46:38 fish kernel:  do_syscall_64+0x5d/0x90
Aug 23 12:46:38 fish kernel:  ? __count_memcg_events+0x42/0x90
Aug 23 12:46:38 fish kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
Aug 23 12:46:38 fish kernel:  ? handle_mm_fault+0x9e/0x350
Aug 23 12:46:39 fish kernel:  ? do_user_addr_fault+0x23a/0x600
Aug 23 12:46:39 fish kernel:  ? exc_page_fault+0x7f/0x180
Aug 23 12:46:39 fish kernel:  entry_SYSCALL_64_after_hwframe+0x77/0xe1
Aug 23 12:46:39 fish kernel: RIP: 0033:0x7fc763a09164
Aug 23 12:46:39 fish kernel: Code: 89 02 48 c7 c0 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 7d b4 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
Aug 23 12:46:39 fish kernel: RSP: 002b:00007ffc30afeef8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Aug 23 12:46:39 fish kernel: RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fc763a09164
Aug 23 12:46:39 fish kernel: RDX: 0000000000000008 RSI: 0000560942769e70 RDI: 0000000000000001
Aug 23 12:46:39 fish kernel: RBP: 00007ffc30afef20 R08: 0000000000000410 R09: 0000000000000001
Aug 23 12:46:39 fish kernel: R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000008
Aug 23 12:46:39 fish kernel: R13: 0000560942769e70 R14: 00007fc763add780 R15: 0000000000000008
Aug 23 12:46:39 fish kernel:  </TASK>
Aug 23 12:46:39 fish kernel: Modules linked in: uinput snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_c>
Aug 23 12:46:39 fish kernel:  parport snd_timer snd joydev soundcore acpi_pad loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 video wmi ip6_tables ip_tables fuse
Aug 23 12:46:39 fish kernel: ---[ end trace 0000000000000000 ]---
Aug 23 12:46:39 fish kernel: RIP: 0010:__list_add_valid+0x42/0xa0
Aug 23 12:46:39 fish kernel: Code: 75 38 4c 8b 02 49 39 c0 75 41 48 39 d7 74 53 4c 39 c7 74 4e b8 01 00 00 00 c3 cc cc cc cc 48 c7 c7 f8 3d 93 b7 e8 1e 44 9a ff <0f> 0b 48 c7 c7 20 3e 93 b7 e8 10 44 9a ff 0f 0b 48 89 c1 48 c7 c7
Aug 23 12:46:39 fish kernel: RSP: 0018:ffffb21b0ed33b78 EFLAGS: 00010046
Aug 23 12:46:39 fish kernel: RAX: 0000000000000022 RBX: ffffb21b0b6d92a8 RCX: 0000000000000000
Aug 23 12:46:39 fish kernel: RDX: 0000000000000000 RSI: ffffa079cdd61540 RDI: ffffa079cdd61540
Aug 23 12:46:39 fish kernel: RBP: ffffb21b0ed33bb0 R08: 0000000000000000 R09: ffffb21b0ed33a20
Aug 23 12:46:39 fish kernel: R10: 0000000000000003 R11: ffffffffb8146508 R12: 0000000000000246
Aug 23 12:46:39 fish kernel: R13: ffffb21b0b6d92b8 R14: 0000000000000000 R15: ffffa0794cb60000
Aug 23 12:46:39 fish kernel: FS:  00007fc763905740(0000) GS:ffffa079cdd40000(0000) knlGS:0000000000000000
Aug 23 12:46:39 fish kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 23 12:46:39 fish kernel: CR2: 000056094276a278 CR3: 00000003569b8004 CR4: 00000000001706e0
Aug 23 12:46:39 fish kernel: note: nvidia-sleep.sh[15384] exited with irqs disabled
Aug 23 12:46:39 fish kernel: note: nvidia-sleep.sh[15384] exited with preempt_count 1
Aug 23 12:46:39 fish kernel: Console: switching to colour frame buffer device 128x48
Aug 23 12:46:39 fish kernel: PM: suspend entry (deep)
Aug 23 12:46:38 fish audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=nvidia-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Aug 23 12:46:38 fish systemd[1]: nvidia-suspend.service: Main process exited, code=killed, status=11/SEGV
Aug 23 12:46:38 fish systemd[1]: nvidia-suspend.service: Failed with result 'signal'.
Aug 23 12:46:38 fish systemd[1]: Failed to start nvidia-suspend.service - NVIDIA system suspend actions.
Aug 23 12:46:38 fish systemd[1]: Starting systemd-suspend.service - System Suspend...
Aug 23 12:46:38 fish systemd-sleep[15397]: Entering sleep state 'suspend'...
Aug 23 12:46:39 fish kernel: Filesystems sync: 0.050 seconds
Aug 23 12:46:39 fish kernel: Freezing user space processes
Aug 23 12:46:39 fish kernel: Freezing user space processes completed (elapsed 0.001 seconds)
Aug 23 12:46:39 fish kernel: OOM killer disabled.
Aug 23 12:46:39 fish kernel: Freezing remaining freezable tasks
Aug 23 12:46:39 fish kernel: Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
Aug 23 12:46:39 fish kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Aug 23 12:46:39 fish kernel: parport_pc 00:06: disabled
Aug 23 12:46:39 fish kernel: serial 00:05: disabled
Aug 23 12:46:39 fish kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Aug 23 12:46:39 fish kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Sup>
Aug 23 12:46:39 fish kernel: sd 0:0:0:0: [sda] Stopping disk
Aug 23 12:46:39 fish kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
Aug 23 12:46:39 fish kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
Aug 23 12:46:39 fish kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
Aug 23 12:46:39 fish kernel: PM: Some devices failed to suspend, or early wake event detected

Settings I used:

env: MOZ_DISABLE_RDD_SANDBOX=1 LIBVA_DRIVERS_PATH=/var/home/fish/.var/app/org.mozilla.firefox/dri/

Firefox: media.ffmpeg.vaapi.enabled=true widget.dmabuf.force-enabled=true

OS: Fedora 38 Silverblue, driver version: 535.98, GPU: GTX 970

ElvenEleven11 commented 1 year ago

By the way, every time I suspend my Ubuntu 22.04.3 machine, the driver fails. After wakeup, vainfo says:

libva info: VA-API version 1.14.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so
libva info: Found init function __vaDriverInit_1_0
libva error: /usr/lib/x86_64-linux-gnu/dri/nvidia_drv_video.so init failed
libva info: va_openDriver() returns 1
vaInitialize failed with error code 1 (operation failed),exit

Reboot helps.
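
For anyone who wants to check for this state after resume without reading the output by eye, here is a minimal Python sketch. It assumes `vainfo` is on the PATH and that the failure shows up with the same strings as in the output above; the function names are made up for illustration.

```python
import shutil
import subprocess

# Strings taken from the vainfo output quoted above; other libva versions
# may word the failure differently (assumption).
FAILURE_MARKERS = ("init failed", "vaInitialize failed")


def driver_init_failed(output):
    """Return True if the vainfo output contains a known failure marker."""
    return any(marker in output for marker in FAILURE_MARKERS)


def run_vainfo():
    """Run vainfo and return stdout plus stderr, or None if it is not installed."""
    if shutil.which("vainfo") is None:
        return None
    result = subprocess.run(
        ["vainfo"], capture_output=True, text=True, errors="replace", check=False
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    output = run_vainfo()
    if output is None:
        print("vainfo not found on PATH")
    elif driver_init_failed(output):
        print("VA-API driver failed to initialise")
    else:
        print("no init failure reported")
```

Running it once before suspend and once after wakeup should show whether the driver only breaks after the sleep cycle.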

rkoot commented 1 year ago

I notice this line:

Aug 23 12:46:39 fish kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Sup

Maybe that's biting you?

elFarto commented 11 months ago

Sorry, I completely missed this issue.

I'm not entirely sure how this driver would cause a hibernate error, unless maybe you're playing a video while attempting to hibernate. Video decoding on NVIDIA doesn't survive a sleep/hibernate cycle well, as it loses all the video memory allocations and has no way to recover them.
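
One way to test the "video playing while hibernating" theory is to check which processes have the VA-API driver library mapped just before hibernating. The following is only a sketch: it assumes the library is named `nvidia_drv_video.so` (as in the vainfo output earlier in this thread) and scans `/proc/<pid>/maps`, so it only sees processes you have permission to read. The function names are hypothetical.

```python
from pathlib import Path

# Library name as it appears in the vainfo output in this thread.
DRIVER_NAME = "nvidia_drv_video.so"


def maps_mentions_driver(maps_text, driver=DRIVER_NAME):
    """Return True if any mapping line in a /proc/<pid>/maps dump ends with the driver name."""
    return any(line.rstrip().endswith(driver) for line in maps_text.splitlines())


def processes_using_driver(proc_root=Path("/proc")):
    """Return (pid, command name) pairs for readable processes that map the driver."""
    found = []
    if not proc_root.is_dir():
        return found
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            maps_text = (entry / "maps").read_text(errors="replace")
            if maps_mentions_driver(maps_text):
                comm = (entry / "comm").read_text(errors="replace").strip()
                found.append((int(entry.name), comm))
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
    return found


if __name__ == "__main__":
    for pid, comm in processes_using_driver():
        print(pid, comm)
```

If this prints nothing right before a failed hibernate, an active decode session is probably not the trigger.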

You could try the PreserveVideoMemoryAllocations option as rkoot suggested.
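
Note that the NVRM line in the original log says the parameter is already set on that machine, and the call trace shows `nvidia-sleep.sh` crashing inside `nv_procfs_write_suspend`, so it is worth confirming the current value before changing anything. A minimal sketch for that, assuming the NVIDIA kernel module exposes its parameters as `Name: value` lines in `/proc/driver/nvidia/params` (the path and the helper names here are assumptions, not something confirmed in this thread):

```python
from pathlib import Path

# Assumed location of the NVIDIA module parameter dump.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def parse_params(text):
    """Parse 'Name: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params


def preserve_enabled(params):
    """Return True if PreserveVideoMemoryAllocations is reported as 1."""
    return params.get("PreserveVideoMemoryAllocations") == "1"


if __name__ == "__main__":
    if PARAMS_PATH.exists():
        params = parse_params(PARAMS_PATH.read_text(errors="replace"))
        print("PreserveVideoMemoryAllocations enabled:", preserve_enabled(params))
    else:
        print(f"{PARAMS_PATH} not found; is the nvidia module loaded?")
```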

mirh commented 7 months ago

This is the same as #253