ValveSoftware / gamescope

SteamOS session compositing window manager

AMDGPU driver crash when using gamescope + steam remote play + VAAPI #779

Open stephensrmmartin opened 1 year ago

stephensrmmartin commented 1 year ago

This is a very niche issue and took some time to narrow down.

Kernel: 6.1.9-zen1-2-zen
Mesa: 22.3.4-1
Gamescope: 3.11.51
GPU: AMD 6700xt
DE: Plasma/KDE
Window protocol: Wayland (Plasma Wayland Session)

Ok! With that out of the way. If I:

1) Launch Steam in gamescope:

gamescope -e -w 1400 -h 900 -W 1400 -H 900 -- steam -pipewire -tenfoot -nointro

or

gamescope -e -w 1400 -h 900 -W 1400 -H 900 -- steam -pipewire -pipewire-dmabuf -tenfoot -nointro

2) Have AMD HW acceleration enabled in Steam (VAAPI)

3) Connect using steam remote play from any device

Then:

4) There is an AMD driver page fault, which renders the system unusable. The driver crashes, the card resets, and all monitor output ceases to function. The system is still reachable over SSH, but cannot be rebooted normally due to some hang. Instead, it has to be rebooted with a REISUB hard reboot.

The dmesg:

Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32787, for process steam pid 3378 thread steam:cs0 pid 3412)
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800100cfc000 from client 0x12 (VMC)
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: VCN0 (0x2b)
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x1
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32787, for process steam pid 3378 thread steam:cs0 pid 3412)
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:   in page starting at address 0x0000800100cfb000 from client 0x12 (VMC)
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          Faulty UTCL2 client ID: unknown (0x0)
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          MORE_FAULTS: 0x0
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          WALKER_ERROR: 0x0
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          PERMISSION_FAULTS: 0x0
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          MAPPING_ERROR: 0x0
Feb 09 11:02:37 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu:          RW: 0x0
Feb 09 11:02:47 hwkiller-desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_enc_0.0 timeout, signaled seq=191, emitted seq=192
Feb 09 11:02:47 hwkiller-desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process steam pid 3378 thread steam:cs0 pid 3412
Feb 09 11:02:47 hwkiller-desktop kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
Feb 09 11:02:47 hwkiller-desktop kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002

Note that this only happens when 1) VAAPI is enabled (when disabled, remote play is fine) AND 2) gamescope is used (when streaming with VAAPI enabled and without gamescope, remote play "works" [well, there's the black screen issue for BPM, but that is not new]).
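
The MMVM_L2_PROTECTION_FAULT_STATUS values in the dmesg output above can be cross-checked by hand. The following is a minimal sketch, not the driver's own decoder: the bit layout in `FIELDS` is an assumption inferred from the decoded fields the kernel prints next to each raw value in this thread, and it should be verified against the amdgpu register headers for the specific ASIC before relying on it. The names `FIELDS` and `decode_fault_status` are illustrative.

```python
# ASSUMPTION: (shift, width) pairs inferred from the decoded values that
# appear alongside the raw register in the logs of this thread. Check the
# amdgpu register headers for your GPU generation before trusting them.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # printed by the kernel as "Faulty UTCL2 client ID"
    "RW": (18, 1),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw fault-status value into its (assumed) bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw values copied from the first page fault reported in this issue.
    for raw in (0x00105631, 0x00000000):
        decoded = decode_fault_status(raw)
        print(f"{raw:#010x}: " + ", ".join(f"{k}={v:#x}" for k, v in decoded.items()))
```

Under this assumed layout, the first fault decodes to a client ID of 0x2b with PERMISSION_FAULTS 0x3, which agrees with the "VCN0 (0x2b)" line the kernel printed, so the video encode block is the faulting client rather than the graphics rings.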

sajeeshsidharthan commented 1 year ago

Can you check whether below patch resolves the issue?

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21823

MatthiasGrandl commented 1 year ago

For me, VAAPI remote play works fine on mesa-git (-pipewire and -pipewire-dmabuf are not needed).

It only works in nested sessions, however. In embedded sessions, gamescope/Steam crashes as soon as I start remote play, regardless of whether I use software or hardware acceleration. This is probably completely unrelated to this ticket, though; I just found it while searching for my issue.

Saroumane commented 1 year ago

I think I can reproduce the issue.

Host:

Client 1: Steam Link app on Nvidia Shield, streaming the Big Picture Mode interface => No problem

Client 2: Steam Deck on the Stable channel (which uses gamescope?), streaming a single game. GPU crash (on host) with:

kernel: amdgpu 0000:28:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:3 pasid:32777, for process steam pid 7247 thread steam:cs0 pid 7325)
kernel: amdgpu 0000:28:00.0: amdgpu:   in page starting at address 0x000080010055b000 from client 0x12 (VMC)
kernel: amdgpu 0000:28:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00302431
kernel: amdgpu 0000:28:00.0: amdgpu:          Faulty UTCL2 client ID: VCN (0x12)
kernel: amdgpu 0000:28:00.0: amdgpu:          MORE_FAULTS: 0x1
kernel: amdgpu 0000:28:00.0: amdgpu:          WALKER_ERROR: 0x0
kernel: amdgpu 0000:28:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
kernel: amdgpu 0000:28:00.0: amdgpu:          MAPPING_ERROR: 0x0
kernel: amdgpu 0000:28:00.0: amdgpu:          RW: 0x0
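
To compare reports like the two in this thread, the page-fault header lines can be reduced to a few fields. This is a rough sketch that assumes the message format seen in the logs quoted here (it is not a stable kernel interface and may differ between kernel versions); `parse_page_faults`, `FAULT_RE`, and `ADDR_RE` are hypothetical names used only for illustration.

```python
import re

# ASSUMPTION: patterns written against the amdgpu messages quoted in this
# thread; other kernel versions may word these lines differently.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] page fault \(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), for process (?P<process>\S+) "
    r"pid (?P<pid>\d+) thread (?P<thread>\S+) pid (?P<tid>\d+)\)"
)
ADDR_RE = re.compile(r"in page starting at address (?P<address>0x[0-9a-fA-F]+) from client")

INT_FIELDS = ("src_id", "ring", "vmid", "pasid", "pid", "tid")


def parse_page_faults(lines):
    """Return one dict per page-fault header, with the faulting address if present."""
    faults = []
    for line in lines:
        header = FAULT_RE.search(line)
        if header:
            fault = header.groupdict()
            for key in INT_FIELDS:
                fault[key] = int(fault[key])
            faults.append(fault)
            continue
        addr = ADDR_RE.search(line)
        # Attach the address line to the most recent header that lacks one.
        if addr and faults and "address" not in faults[-1]:
            faults[-1]["address"] = addr.group("address")
    return faults


if __name__ == "__main__":
    sample = [
        "kernel: amdgpu 0000:28:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:0 vmid:3 pasid:32777, for process steam pid 7247 thread steam:cs0 pid 7325)",
        "kernel: amdgpu 0000:28:00.0: amdgpu:   in page starting at address 0x000080010055b000 from client 0x12 (VMC)",
    ]
    for fault in parse_page_faults(sample):
        print(fault)
```

Feeding it saved kernel log lines (for example, the output of `journalctl -k`) makes it easier to see that both reports here fault in the `steam` process on its `steam:cs0` thread.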

Saroumane commented 1 year ago

Well, I certainly do reproduce the crash with the Steam Deck as client and "Enable hardware encoding on AMD GPU" on the Linux Wayland host. (I won't try again because it's too dangerous: the GPU did not auto-recover/reset the second time.)

Interestingly, even though the host has an AMD GPU and no Intel iGPU, I can (so far without a crash) use the Steam Deck as client with VAAPI H264 activated. The trick is to check "Enable hardware encoding on Intel iGPU" on the host. I guess that my OS/GPU implements both backends (AMD and Intel).

Meanwhile, the safest bet for me is to stream the whole Big Picture Mode with the Steam Link app (and AMD hardware encoding), instead of a single game with the Steam Deck.

Samsagax commented 1 year ago

Came across this today and the effect is scary indeed. The system locks up.

I can reproduce the issue consistently (even with zen kernel 6.4.1) after using a patch to fix gamescope streaming.

Samsagax commented 1 year ago

New findings:

I speculate it is related to the way the client exposes the stream in different situations. It might be worth reporting upstream to the AMD kernel devs.

Samsagax commented 1 year ago

Reported here: https://gitlab.freedesktop.org/drm/amd/-/issues/2681

DistantThunder commented 8 months ago

Hello,

I can now launch a stream in these conditions:

But now my problem is that VAAPI encode doesn't seem to work. The output on the client is a garbled, colorful mess, and on the host the following message repeats multiple times:

EE ../mesa-24.0.2/src/gallium/drivers/radeonsi/radeon_vcn_enc_1_2.c:1224 radeon_enc_encode_params UVD - DCC surfaces not supported.

Could this be something that could be worked out in Gamescope? I'm using it to upscale the game to 4k to be able to stream it to a 4K client while my own monitor is only 2K.

Additional info:

vainfo: Driver version: Mesa Gallium driver 24.0.2-arch1.1 for AMD Radeon RX 7900 XTX (radeonsi, navi31, LLVM 16.0.6, DRM 3.57, 6.7.6-zen1-2-zen)
VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointEncSlice
VAProfileAV1Profile0 : VAEntrypointEncSlice

For some reason, Steam proposes HEVC on the client but doesn't seem capable of using it on the host. Is it related to gamescope?
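
One way to check what the host driver itself advertises, independently of Steam or gamescope, is to list the profiles that expose an encode entrypoint in the vainfo output. The sketch below assumes the `VAProfile… : VAEntrypoint…` pairing seen in the output quoted above; `encode_profiles` is a hypothetical helper name, and the result says nothing about which codec Steam actually selects.

```python
import re

# Matches "VAProfileX : VAEntrypointY" pairs, whether they appear one per
# line or flattened onto a single line.
PAIR_RE = re.compile(r"(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)")


def encode_profiles(vainfo_text: str) -> list[str]:
    """Return profiles that advertise an encode entrypoint, in order of appearance."""
    profiles = []
    for profile, entrypoint in PAIR_RE.findall(vainfo_text):
        # VAEntrypointEncSlice (and the low-power variant) indicate encode support.
        if entrypoint.startswith("VAEntrypointEncSlice") and profile not in profiles:
            profiles.append(profile)
    return profiles


if __name__ == "__main__":
    # Profile lines copied from the vainfo output posted in this comment.
    sample = (
        "VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice "
        "VAProfileH264Main : VAEntrypointEncSlice "
        "VAProfileH264High : VAEntrypointEncSlice "
        "VAProfileHEVCMain : VAEntrypointEncSlice "
        "VAProfileHEVCMain10 : VAEntrypointEncSlice "
        "VAProfileAV1Profile0 : VAEntrypointEncSlice"
    )
    print(encode_profiles(sample))
```

On the output quoted here, the HEVC profiles are in the list, so the driver does advertise HEVC encode; whether Steam uses it on the host is a separate question.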

kisak-valve commented 8 months ago

Hello @DistantThunder, as far as I know, DCC should be a driver internal detail. You should mention the encoding corruption to your video driver vendor.

DistantThunder commented 8 months ago

Hello @kisak-valve, thanks again for your dedication, as this is the weekend.

I'd tend to agree with you if it weren't for the fact that VAAPI encode works correctly when gamescope is not involved (again, this is at the game level, not Steam; when launching Steam itself in a gamescope 4K session, it just crashes when attempting to stream).

I did try with RADV_DEBUG=nodcc but to no avail.