flathub / com.valvesoftware.Steam

https://flathub.org/apps/details/com.valvesoftware.Steam

Steam hangs the whole system while launching after upgrade to mesa-19.2.0 #452

Closed mradermaxlol closed 5 years ago

mradermaxlol commented 5 years ago

I think this may be related to the mesa-19.2.0 upgrade, because yesterday (before the update landed in the Arch repos) everything was fine.

Launching Steam causes the whole system to become barely responsive and then freeze, requiring a hard reboot (SysRq doesn't seem to help at all). As far as I understand, something's messing up the amdgpu driver (linux-zen 5.3.1) and eventually the system hangs. I've managed to run Steam from the command line and Ctrl-C it, and I found lots of warnings like these in my system log:

Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address 0x0000000105700000 from 27
Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Sep 26 19:31:44 mashedpotato kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
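
Since the excerpt above repeats the same three-line fault many times, a small script can make it easier to see which process and address are involved. This is only a minimal sketch: the regular expressions are an assumption based on the lines quoted in this report, not on the amdgpu source, and `summarize_faults` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns derived only from the log excerpt in this report (assumption);
# other kernel versions may format these messages differently.
FAULT_RE = re.compile(
    r"amdgpu (?P<pci>[0-9a-f:.]+): \[(?P<hub>\w+)\] retry page fault .*"
    r"for process (?P<proc>\S+) pid (?P<pid>\d+)"
)
ADDR_RE = re.compile(r"in page starting at address (?P<addr>0x[0-9a-fA-F]+)")


def summarize_faults(lines):
    """Count retry page faults per (process, pid) and per faulting address."""
    by_process = Counter()
    by_address = Counter()
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            by_process[(fault.group("proc"), int(fault.group("pid")))] += 1
            continue
        addr = ADDR_RE.search(line)
        if addr:
            by_address[addr.group("addr")] += 1
    return by_process, by_address


# One fault group copied from the log above.
sample = [
    "Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: [gfxhub] retry page fault "
    "(src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 1007 thread Xorg:cs0 pid 1009)",
    "Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0:   in page starting at address "
    "0x0000000105700000 from 27",
    "Sep 26 19:31:40 mashedpotato kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031",
]

by_process, by_address = summarize_faults(sample)
for (proc, pid), count in by_process.items():
    print(f"{proc} (pid {pid}): {count} retry page fault(s)")
for addr, count in by_address.items():
    print(f"{addr}: {count} fault(s)")
```

In practice the lines would come from a saved copy of the system log rather than the inline `sample` list.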

nanonyme commented 5 years ago

This app does not use Mesa from the host. We don't have 19.2 yet.

nanonyme commented 5 years ago

Might be a kernel regression.

mradermaxlol commented 5 years ago

It can't be a kernel regression because I used it just fine on 5.3/5.3.1.

mradermaxlol commented 5 years ago

This is my pacman.log snippet indicating updated packages:

[2019-09-26 17:59] [ALPM] upgraded amd-ucode (20190815.07b925b-1 -> 20190923.417a9c6-1)
[2019-09-26 17:59] [ALPM] upgraded appstream (0.12.9-1 -> 0.12.9-2)
[2019-09-26 17:59] [ALPM] upgraded mesa (19.1.7-1 -> 19.2.0-1)
[2019-09-26 17:59] [ALPM] upgraded libglvnd (1.1.1-1 -> 1.2.0-1)
[2019-09-26 17:59] [ALPM] upgraded libwacom (1.0-1 -> 1.1-1)
[2019-09-26 17:59] [ALPM] upgraded appstream-qt (0.12.9-1 -> 0.12.9-2)
[2019-09-26 17:59] [ALPM] upgraded lib32-e2fsprogs (1.45.3-1 -> 1.45.4-1)
[2019-09-26 17:59] [ALPM] upgraded lib32-mesa (19.1.7-1 -> 19.2.0-1)
[2019-09-26 17:59] [ALPM] upgraded lib32-libglvnd (1.1.1-1 -> 1.2.0-1)
[2019-09-26 17:59] [ALPM] upgraded lib32-libva-mesa-driver (19.1.7-1 -> 19.2.0-1)
[2019-09-26 17:59] [ALPM] upgraded lib32-mesa-vdpau (19.1.7-1 -> 19.2.0-1)
[2019-09-26 17:59] [ALPM] upgraded lib32-vulkan-radeon (19.1.7-1 -> 19.2.0-1)
[2019-09-26 17:59] [ALPM] upgraded liblouis (3.10.0-1 -> 3.11.0-1)
[2019-09-26 17:59] [ALPM] upgraded libpwquality (1.4.0-2 -> 1.4.1-1)
[2019-09-26 17:59] [ALPM] upgraded libva-mesa-driver (19.1.7-1 -> 19.2.0-1)
[2019-09-26 18:00] [ALPM] upgraded linux-firmware (20190815.07b925b-1 -> 20190923.417a9c6-1)
[2019-09-26 18:00] [ALPM] upgraded mesa-vdpau (19.1.7-1 -> 19.2.0-1)
[2019-09-26 18:00] [ALPM] upgraded mobile-broadband-provider-info (20190116-1 -> 20190618-1)
[2019-09-26 18:00] [ALPM] upgraded opencl-mesa (19.1.7-1 -> 19.2.0-1)
[2019-09-26 18:00] [ALPM] upgraded pango (1:1.44.6-1 -> 1:1.44.6+2-1)
[2019-09-26 18:00] [ALPM] upgraded phonon-qt5 (4.11.0-1 -> 4.11.1-1)
[2019-09-26 18:00] [ALPM] upgraded unrar (1:5.8.1-1 -> 1:5.8.2-1)
[2019-09-26 18:00] [ALPM] upgraded vulkan-mesa-layer (19.1.7-1 -> 19.2.0-1)
[2019-09-26 18:00] [ALPM] upgraded vulkan-radeon (19.1.7-1 -> 19.2.0-1)

My best guess is that it's something with mesa or with libglvnd.
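
To narrow a snippet like this down to the graphics-related packages, the log can be filtered by package name. The following is a rough sketch: the keyword list is my own assumption about which packages matter here, and `graphics_upgrades` is a hypothetical helper, not part of pacman.

```python
import re

# Matches "[timestamp] [ALPM] upgraded <pkg> (<old> -> <new>)" as seen above.
UPGRADE_RE = re.compile(
    r"\[(?P<when>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)"
)

# Assumed set of name fragments for graphics-stack packages; adjust as needed.
KEYWORDS = ("mesa", "libglvnd", "vulkan", "linux-firmware", "amd-ucode")


def graphics_upgrades(lines, keywords=KEYWORDS):
    """Return (package, old_version, new_version) for matching upgrades."""
    found = []
    for line in lines:
        match = UPGRADE_RE.search(line)
        if match and any(key in match.group("pkg") for key in keywords):
            found.append((match.group("pkg"), match.group("old"), match.group("new")))
    return found


# A few lines copied from the snippet above.
sample = [
    "[2019-09-26 17:59] [ALPM] upgraded mesa (19.1.7-1 -> 19.2.0-1)",
    "[2019-09-26 17:59] [ALPM] upgraded libglvnd (1.1.1-1 -> 1.2.0-1)",
    "[2019-09-26 17:59] [ALPM] upgraded libwacom (1.0-1 -> 1.1-1)",
    "[2019-09-26 18:00] [ALPM] upgraded unrar (1:5.8.1-1 -> 1:5.8.2-1)",
]

for pkg, old, new in graphics_upgrades(sample):
    print(f"{pkg}: {old} -> {new}")
```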

mradermaxlol commented 5 years ago

Just checked - non-flatpak Steam works just fine. The Flatpak version hangs as described above before any window is drawn (other than the "Unpacking runtime" infobox).

nanonyme commented 5 years ago

We don't use your system Mesa or libglvnd. There's a Mesa and libglvnd inside the Flatpak sandbox, which talks to your kernel through libdrm. It might of course be a side-effect of your host libraries tickling your kernel in the wrong way, resulting in hangs with our older Mesa running inside the sandbox. I was hoping the freedesktop-sdk release with newer Mesa had happened already, but there's some CI slowness currently.

mradermaxlol commented 5 years ago

Yep, I know that flatpak uses its bundled mesa & libdrm. I think you're right that it's a (rather unwanted) side-effect of newer host libraries. I'll try to test Steam once the freedesktop runtime update is out; for now I'm better off using the non-flatpak version. Thanks for the support :)

nanonyme commented 5 years ago

You could potentially file a bug at https://bugs.freedesktop.org against the Mesa driver if you want support from relevant developers.

mradermaxlol commented 5 years ago

Good idea, but if the runtime update is not far away, then I'd better test against it first and submit stuff second.

nanonyme commented 5 years ago

Publishing is ongoing: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/pipelines/84962880

nanonyme commented 5 years ago

Please test again when time allows.

mradermaxlol commented 5 years ago

Just tested it - with the updated runtime Steam seems to launch properly and it doesn't hang the system anymore.

Now I can say that this is most likely a very nasty side-effect of different mesa versions (host/runtime): I have, by accident, opened flatpak'd VSCode-OSS today (it uses the 18.XX runtime IIRC) and I got a system hang in the exact same way as here. This is not good at all, as those older runtimes probably aren't going to get any lib updates, and some apps are just stuck with those older runtimes. I think I'm going to report this to the freedesktop-sdk GitLab; closing the issue as Steam is fine now.

nanonyme commented 5 years ago

The release notes did mention a SPIR-V related system hang fix, so I was kind of hopeful. It's quite possible 18.08 won't get more Mesa releases now that 19.08 is out. 19.08 has a less risky (considering ABI stability) way of updating Mesa, so this situation shouldn't reappear in a year. Make sure to lobby your favourite apps to move to 19.08.
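
To find out which installed apps are still on an older runtime branch, one possible approach is to ask flatpak for each app's runtime and compare the branch against 19.08. This is a sketch under assumptions: it relies on `flatpak info --show-runtime <app-id>` printing a ref of the form `name/arch/branch` (for example `org.freedesktop.Platform/x86_64/18.08`), and the helper names are hypothetical.

```python
import subprocess


def runtime_branch(runtime_ref):
    """Return the branch part of a runtime ref shaped like 'name/arch/branch'."""
    parts = runtime_ref.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected runtime ref: {runtime_ref!r}")
    return parts[2]


def is_older_than(branch, minimum="19.08"):
    """Compare YY.MM-style branches numerically (assumes e.g. '18.08')."""
    def as_tuple(value):
        return tuple(int(part) for part in value.split("."))
    return as_tuple(branch) < as_tuple(minimum)


def query_runtime(app_id):
    """Ask flatpak which runtime an installed app uses (needs flatpak on PATH)."""
    result = subprocess.run(
        ["flatpak", "info", "--show-runtime", app_id],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Offline example with an assumed ref string; no flatpak call is made here.
example_ref = "org.freedesktop.Platform/x86_64/18.08"
branch = runtime_branch(example_ref)
print(branch, "is older than 19.08:", is_older_than(branch))
```

The comparison is only meaningful for runtimes that use the freedesktop-sdk YY.MM branch scheme; runtimes with their own version numbering would need a different check.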