ValveSoftware / steam-for-linux

Issue tracking for the Steam for Linux beta client

Exiting steam crashes X server - nvidia #9576

Closed af7567 closed 7 months ago

af7567 commented 1 year ago

sysinfo: https://gist.github.com/af7567/efcb1eb16cd7b58ad2f4c24487b34866

Since the latest or maybe the previous update, the X server sometimes crashes when I exit the steam client. I saw in a recent changelog that nvidia hardware acceleration was enabled by default, so it could be related, but my steam settings show hardware acceleration is turned off (screenshot: screen-2023-06-10-00-07-03).

The X server logs show:

[  4840.966] (EE) Backtrace:
[  4840.967] (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x139) [0x5a3679]
[  4840.968] (EE) 1: /lib64/libc.so.6 (__sigaction+0x40) [0x7fa5709f5e20]
[  4840.968] (EE) unw_get_proc_name failed: no unwind info found [-10]
[  4840.968] (EE) 2: /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so (?+0x0) [0x7fa56f0cb6c0]
[  4840.968] (EE) 
[  4840.968] (EE) Segmentation fault at address 0x10
[  4840.968] (EE) 
Fatal server error:
[  4840.968] (EE) Caught signal 11 (Segmentation fault). Server aborting
[  4840.968] (EE) 
[  4840.968] (EE) 

So it looks like something to do with the nvidia drivers, but it only happens when exiting the current steam beta.

If I start up steam and then exit straight away it won't crash, but if I start steam and click around to look at the store and community tabs before exiting then it will crash X.
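For anyone comparing their own crash against the backtrace above, here is a minimal Python sketch that pulls the numbered frames out of an Xorg log and flags the ones inside an nvidia module. The regular expression and the function names (`extract_frames`, `frames_in_nvidia_modules`) are my own assumptions based only on the log format shown in this report, and the "path contains nvidia" check is a naive heuristic, not a diagnosis.

```python
import re

# Matches Xorg backtrace frames in the format seen in this report, e.g.
# [  4840.968] (EE) 2: /path/to/lib.so (?+0x0) [0x7fa56f0cb6c0]
FRAME_RE = re.compile(
    r"\(EE\)\s+(\d+):\s+(\S+)\s+\((.*?)\)\s+\[(0x[0-9a-fA-F]+)\]"
)


def extract_frames(log_text):
    """Return (index, module, symbol, address) for each backtrace frame."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            index, module, symbol, address = match.groups()
            frames.append((int(index), module, symbol, address))
    return frames


def frames_in_nvidia_modules(frames):
    """Keep frames whose module path mentions 'nvidia' (naive heuristic)."""
    return [frame for frame in frames if "nvidia" in frame[1].lower()]


# Sample copied from the log excerpt in this report.
SAMPLE = """\
[  4840.966] (EE) Backtrace:
[  4840.967] (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x139) [0x5a3679]
[  4840.968] (EE) 1: /lib64/libc.so.6 (__sigaction+0x40) [0x7fa5709f5e20]
[  4840.968] (EE) unw_get_proc_name failed: no unwind info found [-10]
[  4840.968] (EE) 2: /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so (?+0x0) [0x7fa56f0cb6c0]
"""

if __name__ == "__main__":
    for frame in frames_in_nvidia_modules(extract_frames(SAMPLE)):
        print(frame)
```

To use it on a real log, read the file contents into a string and pass that to `extract_frames`; the log location differs between distributions, so no path is assumed here.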

kisak-valve commented 1 year ago

Hello @af7567, in general, no OpenGL application including Steam should be capable of taking down the user session. This issue should also be mentioned to your video driver vendor.

af7567 commented 1 year ago

@kisak-valve Yes, that is what I thought. I have only had the crash when exiting from recent steam beta though so I opened the issue here to see if anyone else had the same problem. Also wondering if hardware acceleration is getting enabled now even when disabled in the settings.

It could be that the two settings I am looking at have nothing to do with the nvidia hardware acceleration that was mentioned in recent patch notes.

Big picture mode does work very smoothly now for me though :)

af7567 commented 1 year ago

I have also reported this on the nvidia forums at https://forums.developer.nvidia.com/t/x-server-nvidia-driver-crash-when-steam-beta-client-1686278536-exits-slackware64-current-nvidia-535-43-02/256252

Since steam beta 1686379854 it hasn't been as easy to reproduce the problem so I thought it might be related to "Fixed visual artifacts in the menus on Nvidia GPUs.". But it does still happen sometimes.

lostgoat commented 1 year ago

Yeah, we recently shipped a workaround for these crashes. It is possible the workaround isn't triggering for all cases.

If you get repro steps for this crash let me know. Or any info on what you were doing before the crash happened.

af7567 commented 1 year ago

@lostgoat I'm afraid I can't find any way to reproduce it reliably. It used to happen all the time before the workarounds, though, so it is much better now.

I have done some clicking around in different places before exiting and the quickest way to a crash is to start steam, wait for the friends list to show, then exit steam. This only crashes about 10% of the time or less though.

I have never been able to get it to crash when the friends list was closed so could be related to that, but also could just be a coincidence.

Here is a little video of what I did to get it to crash, the video stops when the X server dies. Running steam beta 1686630523

https://github.com/ValveSoftware/steam-for-linux/assets/12093847/61d9a4ce-84cf-4d83-bf62-07e621bc05b8

awused commented 1 year ago

I've run into this twice now. While it might be coincidence, both of the times it crashed were unusually long sessions. My Steam sessions on Linux tend to be short because you still have never fixed the idle inhibit issues on X from half a decade ago. https://github.com/ValveSoftware/steam-for-linux/issues/5607

amrit1711 commented 1 year ago

Thanks for reporting this issue. We have filed bug 4158370 for tracking purposes. We also have a local repro, which will help the team debug the issue. We will keep you updated on the progress.

af7567 commented 1 year ago

I have been unable to get it to crash since the 1686880776 release. The patch notes show "Fixed a rare crash when rendering an invalid texture", so maybe that is what the problem was?

af7567 commented 1 year ago

Unfortunately I had the crash again today when exiting steam. I don't think I did anything different to trigger it, but I did just update the beta to 1687306661 and the crash occurred the first time exiting since then.

edit: Just tested starting and exiting steam 10 times with the friends list open and closed:
Friends list open: 4/10 crashed
Friends list closed: 0/10 crashed

So it feels like something in 1687306661 brought the problem back again.
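As a rough sanity check on the 4/10 versus 0/10 counts, the sketch below computes a one-sided Fisher exact probability with only the standard library. It assumes 10 runs in each condition and that runs are independent, neither of which the report guarantees; the function name `fisher_one_sided` is just illustrative.

```python
from math import comb


def fisher_one_sided(crashes_a, runs_a, crashes_b, runs_b):
    """Chance that group A gets at least `crashes_a` of all observed crashes
    if crashing were unrelated to the group (hypergeometric tail)."""
    total_runs = runs_a + runs_b
    total_crashes = crashes_a + crashes_b
    denominator = comb(total_runs, total_crashes)
    probability = 0.0
    # Sum over every outcome at least as extreme as the observed one.
    for k in range(crashes_a, min(runs_a, total_crashes) + 1):
        probability += (
            comb(runs_a, k) * comb(runs_b, total_crashes - k) / denominator
        )
    return probability


if __name__ == "__main__":
    # Friends list open: 4 crashes in 10 runs; closed: 0 crashes in 10 runs.
    print(round(fisher_one_sided(4, 10, 0, 10), 4))
```

With these counts the value is 210/4845, a little over 4%, so the friends list correlation is suggestive but 20 runs is still a small sample.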

awused commented 1 year ago

Yeah I ran into the crash again today as well after an update. Friends list was open. I'm on Fedora, not Slackware.

Qubitium commented 1 year ago

On Ubuntu 22.04 with an Nvidia 4090, driver 535.86.10, and the latest beta Steam client, exiting a Steam streaming session still crashes the X session of the host stream client.

helmchenlord commented 1 year ago

same here nvidia 3070 / driver 535.98, kernel 6.1.46, KDE-Plasma-5.27.7, steam stable client on gentoo

Qubitium commented 1 year ago

Disabling dynamic resolution toggle in the remote play settings appears to resolve this issue for me.

SandFrog commented 1 year ago

Hey, I'm running into the same issue. If I leave steam running for long periods of time (typically when I set my computer to hibernate overnight) it crashes some of the time when quitting steam. I'm running Ubuntu 23.04, with a gtx 1070 using driver version 535.113.01, Steam version 1696019606.

helmchenlord commented 1 year ago

I was able to capture a log of the crashing event. nvidia 3070 / driver 535.113.01, kernel 6.1.57, KDE-Plasma-5.27.8, steam stable client on gentoo

Okt 30 16:48:36 yme systemd[1]: Started systemd-timedated.service.
Okt 30 16:48:37 yme systemd[674]: app-firefox-e088c211239048b6b6e52e7c7d9685e0.scope: Consumed 2min 19.703s CPU time.
Okt 30 16:49:06 yme systemd[1]: systemd-timedated.service: Deactivated successfully.
Okt 30 16:49:42 yme kwin_x11[763]: kwin_core: XCB error: 9 (BadDrawable), sequence: 30459, resource id: 17170378, major code: 142 (DAMAGE), minor code: 1 (Create)
Okt 30 16:49:42 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 30477, resource id: 17170378, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:49:42 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 30480, resource id: 17170378, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:49:42 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 30492, resource id: 17170378, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:49:42 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 30495, resource id: 17170378, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:49:42 yme kwin_x11[763]: kwin_core: XCB error: 150 (BadDamage), sequence: 30650, resource id: 18313540, major code: 142 (DAMAGE), minor code: 3 (Subtract)
Okt 30 16:49:42 yme kwin_x11[763]: kwin_core: Failed to focus 0x420002f (error 3)
Okt 30 16:50:36 yme kwin_x11[763]: kwin_core: XCB error: 9 (BadDrawable), sequence: 5145, resource id: 17176007, major code: 142 (DAMAGE), minor code: 1 (Create)
Okt 30 16:50:36 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 5166, resource id: 17176007, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:50:36 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 5169, resource id: 17176007, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:50:36 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 5181, resource id: 17176007, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:50:36 yme kwin_x11[763]: kwin_core: XCB error: 3 (BadWindow), sequence: 5184, resource id: 17176007, major code: 129 (SHAPE), minor code: 3 (Combine)
Okt 30 16:50:36 yme kwin_x11[763]: kwin_core: XCB error: 150 (BadDamage), sequence: 5438, resource id: 18313539, major code: 142 (DAMAGE), minor code: 3 (Subtract)
Okt 30 16:50:37 yme systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Okt 30 16:50:37 yme systemd[1]: Started systemd-coredump@0-331308-0.service.
Okt 30 16:50:37 yme systemd-coredump[331310]: elfutils disabled, parsing ELF objects not supported
Okt 30 16:50:37 yme systemd-coredump[331310]: Process 579 (X) of user 0 dumped core.
Okt 30 16:50:37 yme systemd[1]: systemd-coredump@0-331308-0.service: Deactivated successfully.
Okt 30 16:50:37 yme kernel: X (579) used greatest stack depth: 10008 bytes left
Okt 30 16:50:37 yme kwin_x11[763]: X connection to :0 broken (explicit kill or server shutdown).
Okt 30 16:50:37 yme kded5[796]: X connection to :0 broken (explicit kill or server shutdown).
Okt 30 16:50:37 yme python3.11[1025]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme ksmserver[1128]: gkrellm: Fatal IO error 104 (Die Verbindung wurde vom Kommunikationspartner zurückgesetzt) on X server :0.
Okt 30 16:50:37 yme at-spi-bus-launcher[1098]: X connection to :0 broken (explicit kill or server shutdown).
Okt 30 16:50:37 yme kded5[762]: X connection to :0 broken (explicit kill or server shutdown).
Okt 30 16:50:37 yme konsole[1173]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme kalendarac[1054]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme ksysguard[1166]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme pipewire[892]: mod.x11-bell: X11 I/O error handler called on display :0
Okt 30 16:50:37 yme pipewire[892]: mod.x11-bell: X11 display (:0) has encountered a fatal I/O error
Okt 30 16:50:37 yme polkit-kde-authentication-agent-1[848]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme kwalletd5[688]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme kactivitymanagerd[845]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme kscreen_backend_launcher[871]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme ksmserver[760]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme kglobalaccel5[775]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme gmenudbusmenuproxy[847]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme org_kde_powerdevil[849]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme kded5[762]: The X11 connection broke: I/O error (code 1)
Okt 30 16:50:37 yme kwin_x11[763]: The X11 connection broke: I/O error (code 1)
Okt 30 16:50:37 yme kaccess[1049]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme xdg-desktop-portal-kde[850]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme systemd[674]: plasma-gmenudbusmenuproxy.service: Main process exited, code=exited, status=1/FAILURE
Okt 30 16:50:37 yme systemd[674]: plasma-gmenudbusmenuproxy.service: Failed with result 'exit-code'.
Okt 30 16:50:37 yme systemd[674]: plasma-kscreen.service: Main process exited, code=exited, status=1/FAILURE
Okt 30 16:50:37 yme systemd[674]: plasma-kscreen.service: Failed with result 'exit-code'.
Okt 30 16:50:37 yme systemd[674]: app-gkrellm\x2dgkrellm\x2d2-a88198187660438995efff472ad8fbb0.scope: Consumed 14min 36.094s CPU time.
Okt 30 16:50:37 yme systemd[674]: Stopped target plasma-workspace-x11.target.
Okt 30 16:50:37 yme systemd[674]: Stopped target plasma-workspace.target.
Okt 30 16:50:37 yme systemd[674]: Stopped target xdg-desktop-autostart.target.
Okt 30 16:50:37 yme systemd[674]: Stopping app-geoclue\x2ddemo\x2dagent@autostart.service...
Okt 30 16:50:37 yme kdeconnectd[1035]: The X11 connection broke (error 1). Did the X11 server die?
Okt 30 16:50:37 yme systemd[674]: Stopping app-hplip\x2dsystray@autostart.service...
Okt 30 16:50:37 yme systemd[674]: Stopping app-kaccess@autostart.service...
Okt 30 16:50:37 yme systemd[674]: Stopping app-org.kde.kalendarac@autostart.service...
Okt 30 16:50:37 yme systemd[674]: Stopping app-org.kde.kdeconnect.daemon@autostart.service...
Okt 30 16:50:37 yme systemd[674]: Stopping at-spi-dbus-bus.service...
Okt 30 16:50:37 yme polkitd[808]: Unregistered Authentication Agent for unix-session:2 (system bus name :1.35, object path /org/kde/PolicyKit1/AuthenticationAgent, locale de_DE.utf8) (disconnected from bus)
Okt 30 16:50:37 yme systemd[674]: Stopped plasma-gmenudbusmenuproxy.service.
Okt 30 16:50:37 yme systemd[674]: Stopping plasma-kglobalaccel.service...
Okt 30 16:50:37 yme systemd[674]: Stopping plasma-kwin_x11.service...
Okt 30 16:50:37 yme systemd[674]: Stopping plasma-polkit-agent.service...
Okt 30 16:50:37 yme systemd[674]: Stopping plasma-powerdevil.service...
Okt 30 16:50:37 yme systemd[674]: Stopping plasma-xdg-desktop-portal-kde.service...
Okt 30 16:50:37 yme systemd[674]: Stopping xdg-desktop-portal.service...
Okt 30 16:50:37 yme systemd[674]: Stopping xdg-document-portal.service...
Okt 30 16:50:37 yme systemd[674]: Stopping xdg-permission-store.service...
Okt 30 16:50:37 yme systemd[674]: Stopped app-geoclue\x2ddemo\x2dagent@autostart.service.
Okt 30 16:50:37 yme systemd[674]: Stopped plasma-kglobalaccel.service.
Okt 30 16:50:37 yme systemd[674]: Stopped xdg-permission-store.service.
Okt 30 16:50:37 yme systemd[674]: plasma-kactivitymanagerd.service: Main process exited, code=exited, status=1/FAILURE
Okt 30 16:50:37 yme systemd[674]: plasma-kactivitymanagerd.service: Failed with result 'exit-code'.
Okt 30 16:50:37 yme systemd[674]: Stopped plasma-kactivitymanagerd.service.
Okt 30 16:50:37 yme systemd[674]: plasma-kactivitymanagerd.service: Consumed 1.425s CPU time.
Okt 30 16:50:37 yme systemd[674]: Stopped xdg-desktop-portal.service.
Okt 30 16:50:37 yme systemd[1]: run-user-1000-doc.mount: Deactivated successfully.
Okt 30 16:50:37 yme systemd[674]: plasma-powerdevil.service: Main process exited, code=exited, status=1/FAILURE
Okt 30 16:50:37 yme systemd[674]: plasma-powerdevil.service: Failed with result 'exit-code'.
Okt 30 16:50:37 yme systemd[674]: Stopped plasma-powerdevil.service.
Okt 30 16:50:37 yme systemd[674]: plasma-powerdevil.service: Consumed 1.189s CPU time.
Okt 30 16:50:37 yme systemd[674]: app-pavucontrol-caeb5647141f4c5ea428bbd53ba97928.scope: Consumed 3min 11.591s CPU time.
Okt 30 16:50:37 yme systemd[674]: Stopped at-spi-dbus-bus.service.
Okt 30 16:50:37 yme systemd[674]: at-spi-dbus-bus.service: Consumed 3.366s CPU time.
Okt 30 16:50:37 yme systemd[674]: Stopped xdg-document-portal.service.
Okt 30 16:50:37 yme systemd[674]: Stopped plasma-polkit-agent.service.
Okt 30 16:50:37 yme startplasma-x11[689]: org.kde.startup: "kdeinit5_shutdown" () exited with code 255

ghost commented 10 months ago

Same issue here.

croco3008 commented 10 months ago

I have the same problem. A crash is almost guaranteed when the computer resumes from sleep with the steam client left open. Nvidia driver 535.129.03, hybrid graphics, Intel/RTX A1000.

amrit1711 commented 10 months ago

I ran a few cycles of rtcwake to suspend and wake the system automatically on hybrid graphics with an RTX 2060 and driver 535.129.03, but I did not observe any crash while exiting steam client version 1702079146. Could you please update the steam client and see if that fixes the issue? If not, please share the repro frequency and exact repro steps, along with an nvidia bug report captured in the repro state.
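For anyone who wants to try the same kind of suspend/wake cycling, here is a small Python sketch around `rtcwake`. The options used (`-m mem` to suspend to RAM, `-s` for the wake delay in seconds) are standard util-linux `rtcwake` options, but the cycle count, delays, and function names are assumptions of mine, not the exact procedure used in the comment above. It defaults to a dry run that only returns the commands, because actually suspending needs root and will put the machine to sleep.

```python
import subprocess
import time


def rtcwake_command(sleep_seconds, mode="mem"):
    """Build an rtcwake command that suspends and wakes after sleep_seconds."""
    return ["rtcwake", "-m", mode, "-s", str(sleep_seconds)]


def run_cycles(cycles, sleep_seconds=30, settle_seconds=20, dry_run=True):
    """Run (or, in a dry run, only collect) a series of suspend/wake cycles."""
    commands = []
    for _ in range(cycles):
        command = rtcwake_command(sleep_seconds)
        commands.append(command)
        if not dry_run:
            # Blocks until the machine has resumed; requires root.
            subprocess.run(command, check=True)
            # Give the desktop session time to settle before the next cycle.
            time.sleep(settle_seconds)
    return commands


if __name__ == "__main__":
    for command in run_cycles(3):
        print(" ".join(command))
```

After the cycles finish, exit steam by hand and check whether the X server survived; the script deliberately does not try to automate that part.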

croco3008 commented 10 months ago

Very strange. I couldn't reproduce the issue either. Maybe some system update has fixed it. Ubuntu 22.04.


ri0t commented 8 months ago

Having this very bad and annoying occasional issue right now with Steam stable on:
NixOS: 23.05
nvidia driver: 535.86.05
steam: 1709920887

Mar 11 15:48:26 vortex xserver-wrapper[967943]: (EE) Backtrace:
Mar 11 15:48:26 vortex xserver-wrapper[967943]: (EE) 0: /nix/store/d83iv8ylgzgkrvjl2z20rbdvvjv040hs-xorg-server-21.1.9/bin/X (OsSigHandler+0x2>
Mar 11 15:48:26 vortex xserver-wrapper[967943]: (EE) 1: /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib/libc.so.6 (__sigaction+0>
Mar 11 15:48:26 vortex xserver-wrapper[967943]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Mar 11 15:48:26 vortex xserver-wrapper[967943]: (EE) 2: /nix/store/ll5dy9pid4l07h5apw2480g7wk0a75m6-nvidia-x11-535.86.05-6.1.62-bin/lib/xorg/m>
Mar 11 15:48:26 vortex xserver-wrapper[967943]: (EE)
Mar 11 15:48:26 vortex xserver-wrapper[967943]: (EE) Segmentation fault at address 0x10

lostgoat commented 8 months ago

@ri0t steam introduced a workaround for this issue a while ago. It didn't fully fix it, but it made the issue less likely to occur.

Nvidia released a proper fix for this issue in the 545 series drivers (or anything newer).
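A quick way to check whether a machine already has a driver from the 545 series or newer is sketched below in Python. It asks `nvidia-smi` for the driver version (the `--query-gpu=driver_version --format=csv,noheader` query is a standard `nvidia-smi` option) and compares the major number. The threshold constant and function names are illustrative assumptions, and the check only looks at the major version, since no exact 545 point release is named here.

```python
import subprocess

# Assumption: any driver with major version 545 or higher contains the fix,
# following the "545 series drivers (or anything newer)" statement.
FIXED_MAJOR = 545


def driver_major(version):
    """Return the major component of a version string such as '535.86.05'."""
    return int(version.strip().split(".")[0])


def has_driver_fix(version):
    """True if the driver version is in the 545 series or newer."""
    return driver_major(version) >= FIXED_MAJOR


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU it lists."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    version = installed_driver_version()
    status = "includes" if has_driver_fix(version) else "predates"
    print(f"Driver {version} {status} the 545 series fix")
```

On the 535.x drivers reported earlier in this thread, `has_driver_fix` returns False, which matches the crashes people were still seeing there.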

af7567 commented 8 months ago

> Nvidia released a proper fix for this issue in the 545 series drivers (or anything newer).

That explains why I couldn't get it to crash earlier when I tried :) I am using nvidia 550 now.

I have got used to making sure I close my friends list before closing steam; this prevented the crash for me with the older nvidia drivers.

kisak-valve commented 7 months ago

Closing per the last comment.