NixOS / nixpkgs


Pantheon session crash after lock or suspend with multi-monitors #60082

Closed davidak closed 4 years ago

davidak commented 5 years ago

Issue description

I have this problem on a Lenovo Thinkpad L470, but not on a Thinkpad X230 or a custom gaming PC. I had issues with X11 sessions on this device before, see https://github.com/NixOS/nixpkgs/issues/44344.

See the log full of errors: https://gist.github.com/davidak/d2de7f9ef515755db76f3b61b102b664

And video: https://peertube.social/videos/watch/2c9d8be7-3adb-48ae-a874-799fd9804dd2

cc @worldofpeace

Steps to reproduce

  1. lock or suspend the session
  2. when locked, the screen turns on again after 2 seconds
  3. you log in again, the screen goes black briefly and the login screen is shown again
  4. i can't type in the password field, only click on "login" without a password, which is obviously rejected
  5. then i can type in a password again and get a new session. my former session has crashed

Technical details

worldofpeace commented 5 years ago

Steps 3-5 look exactly like what a crash in a session would look like, also assuming you're using the default gtk LightDM greeter.

But these logs look like X died and took everything with it. Can you get a stack trace from coredumpctl or maybe something from the Xorg logs?

davidak commented 5 years ago

> assuming you're using the default gtk LightDM greeter.

correct

> Can you get a stack trace from coredumpctl

i'll try to get one

> or maybe something from the Xorg logs?

journalctl -u display-manager.service
...
-- Reboot --
Apr 23 11:18:46 ethmoid systemd[1]: Starting X11 Server...
Apr 23 11:18:46 ethmoid systemd[1]: Started X11 Server.
Apr 23 11:18:48 ethmoid lightdm[1354]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:18:55 ethmoid lightdm[1393]: pam_unix(lightdm:session): session opened for user davidak by (uid=0)
Apr 23 11:27:02 ethmoid lightdm[2863]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:30:42 ethmoid lightdm[2992]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:31:54 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 11:31:54 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 11:32:01 ethmoid lightdm[3036]: pam_unix(lightdm:session): session opened for user davidak by (uid=0)
Apr 23 11:32:35 ethmoid lightdm[4062]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:32:47 ethmoid lightdm[4188]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:33:11 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 11:33:11 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 11:33:16 ethmoid lightdm[4232]: pam_unix(lightdm:session): session opened for user davidak by (uid=0)
Apr 23 11:35:46 ethmoid lightdm[4965]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:35:56 ethmoid lightdm[5117]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:36:05 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 11:36:05 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 11:36:09 ethmoid lightdm[5143]: pam_unix(lightdm:session): session opened for user davidak by (uid=0)
Apr 23 11:37:11 ethmoid lightdm[5630]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:37:27 ethmoid lightdm[5712]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 11:37:39 ethmoid lightdm[5756]: pam_unix(lightdm:session): session opened for user davidak by (uid=0)
Apr 23 12:07:17 ethmoid lightdm[8872]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 12:07:31 ethmoid lightdm[8964]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 12:07:54 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 12:07:54 ethmoid lightdm[1127]: session_get_is_authenticated: assertion 'session != NULL' failed
Apr 23 12:08:05 ethmoid lightdm[8990]: pam_unix(lightdm:session): session opened for user davidak by (uid=0)
Apr 23 13:02:29 ethmoid lightdm[10693]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 13:02:37 ethmoid lightdm[10849]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
Apr 23 13:02:56 ethmoid lightdm[10875]: pam_unix(lightdm:session): session opened for user davidak by (uid=0)
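
For a quick overview of a log like the one above, the lines can be tallied instead of read one by one. This is a minimal sketch, assuming the journal text is piped in on stdin; `summarize_lightdm_log` is a hypothetical helper name, not part of LightDM or systemd.

```shell
# Hypothetical helper: tally LightDM events in a journal read from stdin.
summarize_lightdm_log() {
    awk '
        # Assertion emitted by the lightdm daemon, as seen in the log above
        /session_get_is_authenticated: assertion/ { failed++ }
        # PAM session opened for the greeter (login screen)
        /pam_unix\(lightdm-greeter:session\): session opened/ { greeter++ }
        # PAM session opened for a real user login
        /pam_unix\(lightdm:session\): session opened/ { user++ }
        END { printf "greeter=%d user=%d assertion_failures=%d\n", greeter, user, failed }
    '
}
```

It could be fed with `journalctl -u display-manager.service -b | summarize_lightdm_log`. In the excerpt above, each failed unlock shows up as two greeter sessions followed by a new user session, so a user count that grows with every lock suggests the old session is being replaced rather than resumed.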

davidak commented 5 years ago

I tried to enable core dumps, but the documentation is not clear on how to do it. https://github.com/NixOS/nixpkgs/issues/60088

I don't get coredumps after applying the changes and rebooting.

But when locked, the system doesn't react to keyboard input, mouse or touchpad interaction. Only pressing the power button does something: it shuts the device down.

Suspending still makes the session crash. But no core dump.
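
Whether "no core dump" means that nothing crashed or that dumps are simply not being collected depends on where the kernel sends them. As a rough sketch, the value of `/proc/sys/kernel/core_pattern` can be classified like this; `core_pattern_kind` is a hypothetical helper name, and the example store path in the comments is made up.

```shell
# Hypothetical helper: classify a kernel core_pattern value (see core(5)).
# A leading '|' means the kernel pipes each dump to a helper program,
# for example systemd-coredump; anything else is treated as a file name template.
core_pattern_kind() {
    case $1 in
        '|'*) echo "piped" ;;
        '')   echo "empty" ;;
        *)    echo "file" ;;
    esac
}

# Possible usage on a live system:
#   core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"
#   ulimit -c          # core size limit of the current shell
#   coredumpctl list   # only lists dumps captured by systemd-coredump
```

If the result is "file", `coredumpctl` will not list anything even when a process did dump core, so an empty list alone does not rule out a crash.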

worldofpeace commented 5 years ago

> But when locked, the system doesn't react to keyboard input, mouse or touchpad interaction. Only pressing the power button does something: it shuts the device down.

Have you tried switching to a tty with the shortcut?

Also, when having issues with locking and suspending, it could be two independent issues, so I'd need to know how exactly the session is locked, how it was suspended, how it's unlocked, and how you resumed.

worldofpeace commented 5 years ago

Edit: saw the video (felt the frustration :smile:), so I'm going with debugging locking specifically: the case where it's locked with the shortcut and the screen wakes by itself, with no keyboard input needed.

That's a multi-monitor setup if I'm not mistaken?

davidak commented 5 years ago

> That's a multi-monitor setup if I'm not mistaken?

yes. it's a thinkpad on a docking station with 2 external displays.
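
When reporting a multi-monitor problem it can help to attach the exact set of connected outputs. This is a minimal sketch, assuming the text of `xrandr --query` is piped in on stdin; `connected_outputs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of outputs that xrandr reports as connected.
# Expects `xrandr --query` output on stdin; the second field of an output line
# is either "connected" or "disconnected".
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}

# Possible usage inside the X session:
#   xrandr --query | connected_outputs
```

Running it once docked and once undocked gives a compact before/after record to paste next to the logs.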

this strange thinkpad also has a dedicated radeon GPU, but it has worse performance than the integrated intel gpu when open source drivers are used, so it's ok that it is not used. but that might also cause trouble...

Apr 23 16:30:49 ethmoid kernel: [drm] ring test on 0 succeeded in 1 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ring test on 1 succeeded in 1 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ring test on 2 succeeded in 1 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ring test on 3 succeeded in 4 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ring test on 4 succeeded in 4 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ib test on ring 0 succeeded in 0 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ib test on ring 1 succeeded in 0 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ib test on ring 2 succeeded in 0 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ib test on ring 3 succeeded in 0 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] ib test on ring 4 succeeded in 0 usecs
Apr 23 16:30:49 ethmoid kernel: [drm] Radeon Display Connectors
Apr 23 16:30:49 ethmoid kernel: iwlwifi 0000:05:00.0 wlp5s0: renamed from wlan0
Apr 23 16:30:49 ethmoid kernel: [drm] Initialized radeon 2.50.0 20080528 for 0000:02:00.0 on minor 1
Apr 23 16:30:50 ethmoid kernel: [drm] amdgpu kernel modesetting enabled.
Apr 23 16:30:50 ethmoid kernel: vga_switcheroo: detected switching method \_SB_.PCI0.GFX0.ATPX handle
Apr 23 16:30:50 ethmoid kernel: ATPX version 1, functions 0x00000033
Apr 23 16:30:50 ethmoid kernel: ATPX Hybrid Graphics

> Have you tried switching to a tty with the shortcut?

that works. then i can also go back and log in again, to a new session.

> how exactly a session is locked

shortcut or menu in upper right

> how it was suspended

menu

> how it's unlocked

the only way is to go to a tty and back to tty 7, then you can log in

> how you resumed

when i hit ENTER, the login screen appears. again i log in 3 times to get a new session. moving and clicking the mouse does nothing

worldofpeace commented 5 years ago

:sparkles: I can reproduce steps 3-5 with a multi-monitor setup.

I tested that with a laptop with Intel UHD Graphics 620. Strangely enough though, if you use the pantheon greeter you cannot unlock a session with multi-monitors. Which in a way prevents the session from being killed altogether.

davidak commented 5 years ago

I have an Intel Corporation HD Graphics 620 (rev 02) too.

davidak commented 5 years ago

I can confirm that unlock and resume works without external displays.

When trying to unlock, the display still remains dark. I have to go to a tty and back to tty 7, multiple times.

Same when booted without external displays.

Now I saw the "This session is locked" message when I went back to tty 7; some seconds later I got the login screen.

worldofpeace commented 5 years ago

> I can confirm that unlock and resume works without external displays.
>
> When trying to unlock, the display still remains dark. I have to go to a tty and back to tty 7, multiple times.
>
> Same when booted without external displays.
>
> Now I saw the "This session is locked" message when I went back to tty 7; some seconds later I got the login screen.

That sounds exactly like https://github.com/the-cavalry/light-locker/issues/138 so an issue with light-locker (probably a regression from GNOME 3.30?).

Edit: sounds like light-locker can't switch back to the greeter properly

davidak commented 5 years ago

> so an issue with light-locker (probably a regression from GNOME 3.30?).

in gnome on NixOS 18.09 i had no issues with locking (with GDM! with lightdm, locking probably still doesn't work). but i had them with Xfce earlier, that's why i switched to gnome... https://github.com/NixOS/nixpkgs/issues/38493 (Xfce uses xflock4)

> That sounds exactly like the-cavalry/light-locker#138

yes, except

> I get a message that the screen is locked and I get redirected to unlock in a few seconds but that never happens

i got the login screen in about 2 seconds

davidak commented 5 years ago

I can now say that there are no core dumps.

Did you have some success with debugging? Can i help in some way?

worldofpeace commented 5 years ago

I'm looking for time to debug, as it's going to be interesting and involved. From there I'm suspecting that we'll need the co-operation of upstream to actually fix the issue.

And you are, and have been, very helpful btw

davidak commented 5 years ago

Next step is to get a backtrace with gdb of the gnome-session-binary --session=pantheon process.

https://wiki.gnome.org/Projects/GnomeShell/Debugging

davidak commented 5 years ago

I tried the section "Obtaining a stack and JS trace using GDB for an already running gnome-shell":

sudo gdb -p $(pgrep -U $USER gnome-session) -batch -ex "set logging on" -ex continue -ex "bt full" -ex "call gjs_dumpstack()" -ex quit

but i only got:

[davidak@ethmoid:~]$ cat gdb.txt 
The program is not being run.
No stack.
No symbol table is loaded.  Use the "file" command.
The program is not being run.
No stack.
No symbol table is loaded.  Use the "file" command.
The program is not being run.
No stack.
No symbol table is loaded.  Use the "file" command.
The program is not being run.
No stack.
No symbol table is loaded.  Use the "file" command.
[Inferior 1 (process 3422) detached]
Exception ignored in: <gdb.GdbOutputFile object at 0x7fb7aa883630>
Traceback (most recent call last):
  File "/nix/store/czlqn7qg45hxxmdy5s4sk22ggvb13pys-gdb-8.2.1/share/gdb/python/gdb/__init__.py", line 43, in flush
    def flush(self):
KeyboardInterrupt

(I locked the screen and tried to unlock, but the screen stayed dark no matter what I did. After hitting the power button the system shut down.)

davidak commented 5 years ago

Starting GNOME Shell under gdb

ssh into desktop with same user
nix run nixpkgs.gdb
vim xenv.sh

gnome_session=$(pgrep -u $USER gnome-session)
eval export $(sed 's/\o000/\n/g;' < /proc/$gnome_session/environ | grep DISPLAY)
eval export $(sed 's/\o000/\n/g;' < /proc/$gnome_session/environ | grep XAUTHORITY)
eval export $(sed 's/\o000/\n/g;' < /proc/$gnome_session/environ | grep DBUS_SESSION_BUS_ADDRESS)

. xenv.sh
gdb -p $(pidof gnome-session-binary)
on desktop computer, lock screen
try to get login screen by hitting ENTER and click mouse
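
The `grep DISPLAY` lines in xenv.sh would also pick up any other variable whose name or value contains `DISPLAY` (for example `WAYLAND_DISPLAY`, if set). A slightly stricter sketch of the same idea, assuming a NUL-separated environ file as found under `/proc`; `environ_value` is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of exactly one variable from a
# NUL-separated environ file.
#   $1: path such as /proc/<pid>/environ
#   $2: variable name, matched at the start of an entry
environ_value() {
    tr '\0' '\n' < "$1" | sed -n "s/^$2=//p" | head -n 1
}

# Possible usage, mirroring xenv.sh:
#   pid=$(pgrep -o -u "$USER" gnome-session)   # -o: oldest matching process
#   export DISPLAY="$(environ_value "/proc/$pid/environ" DISPLAY)"
#   export XAUTHORITY="$(environ_value "/proc/$pid/environ" XAUTHORITY)"
```

This avoids `eval`, so a value containing spaces or shell metacharacters is exported as-is instead of being re-parsed by the shell.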

(gdb) t a a bt

Thread 4 (Thread 0x7fb364c7e700 (LWP 2217)):
#0  0x00007fb368346501 in poll () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#1  0x00007fb3685b92de in g_main_context_iterate.isra () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#2  0x00007fb3685b95ac in g_main_context_iteration () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#3  0x00007fb366952d2d in dconf_gdbus_worker_thread () from /nix/store/b1g88az9msk5jnq64zyy415kcwyv3bjm-dconf-0.30.1-lib/lib/gio/modules/libdconfsettings.so
#4  0x00007fb3685ec295 in g_thread_proxy () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#5  0x00007fb368176ef7 in start_thread () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libpthread.so.0
#6  0x00007fb36835022f in clone () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6

Thread 3 (Thread 0x7fb365cd0700 (LWP 2197)):
#0  0x00007fb368346501 in poll () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#1  0x00007fb3685b92de in g_main_context_iterate.isra () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#2  0x00007fb3685b98e2 in g_main_loop_run () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#3  0x00007fb36880e3f6 in gdbus_shared_thread_func () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libgio-2.0.so.0
#4  0x00007fb3685ec295 in g_thread_proxy () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#5  0x00007fb368176ef7 in start_thread () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libpthread.so.0
#6  0x00007fb36835022f in clone () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6

Thread 2 (Thread 0x7fb3664d1700 (LWP 2196)):
#0  0x00007fb368346501 in poll () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#1  0x00007fb3685b92de in g_main_context_iterate.isra () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#2  0x00007fb3685b95ac in g_main_context_iteration () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#3  0x00007fb3685b95f1 in glib_worker_main () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#4  0x00007fb3685ec295 in g_thread_proxy () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#5  0x00007fb368176ef7 in start_thread () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libpthread.so.0
#6  0x00007fb36835022f in clone () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6

Thread 1 (Thread 0x7fb36695b340 (LWP 2133)):
#0  0x00007fb368346501 in poll () from /nix/store/681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27/lib/libc.so.6
#1  0x00007fb3685b92de in g_main_context_iterate.isra () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#2  0x00007fb3685b98e2 in g_main_loop_run () from /nix/store/vfbzpdwyiy0ygmc92af0zgj3m0s6shaw-glib-2.58.2/lib/libglib-2.0.so.0
#3  0x000000000040bf32 in main ()
(gdb) call gjs_dumpstack ()
No symbol table is loaded.  Use the "file" command.

@worldofpeace This output is probably not helpful :/
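
One way to read the trace above: all four threads have `poll ()` as frame #0 underneath GLib main-loop functions, which is what idle main loops look like, so gnome-session-binary appears to be waiting rather than crashed at that moment. For longer traces, a small summary can make this visible at a glance. This is a sketch, assuming saved `thread apply all bt` output on stdin; `summarize_backtrace` is a hypothetical helper name.

```shell
# Hypothetical helper: summarise gdb "thread apply all bt" output from stdin.
# Counts threads and how many of them have poll() as their innermost frame (#0).
summarize_backtrace() {
    awk '
        /^Thread [0-9]+ / { threads++ }
        /^#0 / && / in poll \(\)/ { polling++ }
        END { printf "threads=%d in_poll=%d\n", threads, polling }
    '
}

# Possible usage:
#   gdb -p <pid> -batch -ex "thread apply all bt" | summarize_backtrace
```

If every thread is in `poll ()`, the interesting state is more likely in another process, such as the X server, the locker or the display manager.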

davidak commented 5 years ago

This also happens when switching users.

davidak commented 5 years ago

This is still reproducible.

Locked at Oct 22 16:50, tried to unlock a few seconds later, then the session crashed, then logged in successfully.

See syslog.txt
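
To narrow a large log down to the minutes around the lock, the lines can be filtered by timestamp. This is a minimal sketch, assuming classic syslog-style lines ("Mon DD HH:MM:SS host ..."); `log_window` is a hypothetical helper name.

```shell
# Hypothetical helper: print syslog-style lines from stdin whose timestamp
# falls within [from, to] on the given day.
#   $1: day as "Mon DD" (e.g. "Oct 22"), $2/$3: zero-padded HH:MM:SS bounds.
# Times are compared as strings, which works because of the zero padding.
log_window() {
    awk -v day="$1" -v from="$2" -v to="$3" \
        '($1 " " $2) == day && $3 >= from && $3 <= to'
}

# Possible usage:
#   log_window "Oct 22" "16:50:00" "16:55:00" < syslog.txt
# On a systemd machine, journalctl can do the same on the same day:
#   journalctl --since "16:50" --until "16:55"
```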

davidak commented 4 years ago

I'm using my Thinkpad X230 with 1 external display connected via VGA right now and don't have this problem.

So the cause might be the other hardware (it has hybrid graphics) or 2 displays (using docking station.)

I will try to test with a desktop computer and more than 1 display.

worldofpeace commented 4 years ago

@davidak Pantheon has made some moves to use latest mutter, ref https://github.com/NixOS/nixpkgs/pull/73906.

worldofpeace commented 4 years ago

I tried this out with https://github.com/NixOS/nixpkgs/pull/73906, and I can't reproduce this issue anymore. But switchboard-plug-display needs to be updated for latest mutter.

davidak commented 4 years ago

Great. I'm currently working from home and don't have a docking station or multiple monitors here, so I will not be able to test it for some time...

But I put it on the list of tasks to do at the office.

worldofpeace commented 4 years ago

> Great. I'm currently working from home and don't have a docking station or multiple monitors here, so I will not be able to test it for some time...
>
> But I put it on the list of tasks to do at the office.

Thanks. It will be merged to nixos-unstable once staging-next gets merged https://github.com/NixOS/nixpkgs/pull/83618.

stale[bot] commented 4 years ago

Hello, I'm a bot and I thank you in the name of the community for opening this issue.

To help our human contributors focus on the most-relevant reports, I check up on old issues to see if they're still relevant. This issue has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

The community would appreciate your effort in checking if the issue is still valid. If it isn't, please close it.

If the issue persists, and you'd like to remove the stale label, you simply need to leave a comment. Your comment can be as simple as "still important to me". If you'd like it to get more attention, you can ask for help by searching for maintainers and people that previously touched related code and @ mention them in a comment. You can use Git blame or GitHub's web interface on the relevant files to find them.

Lastly, you can always ask for help at our Discourse Forum or at #nixos' IRC channel.

worldofpeace commented 4 years ago

I don't see this anymore. Closing