NixOS / nixpkgs


gnome-shell crashes on login #234265

Open blitz opened 1 year ago

blitz commented 1 year ago

Describe the bug

On my system with Gnome, I enter my password in GDM and then I see a blank screen and end up at GDM again. In the background gnome-session has crashed.

I would say there is an 80% chance this happens directly after boot. The second login attempt is usually fine.

My system is a Lenovo L14 (AMD) with integrated Radeon graphics running NixOS 23.05, but this has been happening for a long time already. It's not a Gnome 44.1 regression.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Select user and type password in GDM.
  2. See GDM re-appear

Expected behavior

Logging in to the session works reliably.

Additional context

There is a complete system log with

  services.xserver.displayManager.gdm.debug = true;
  services.xserver.desktopManager.gnome.debug = true;

here: gnome-session-issue.log

Look for "segfault" to find the interesting part. The coredump produced this backtrace:

Core was generated by `/nix/store/z3f6qcbkk6px6x6hnr8mxkkpv47xr06n-gnome-shell-44.1/bin/gnome-shell'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fddfa60d7dc in JS::HeapObjectPostWriteBarrier(JSObject**, JSObject*, JSObject*) ()
   from /nix/store/19v08nmkidbzg1adzpycnd30xbkyhf8c-spidermonkey-102.8.0/lib/libmozjs-102.so
[Current thread is 1 (Thread 0x7fddf8351e00 (LWP 1694))]
(gdb) bt full
#0  0x00007fddfa60d7dc in JS::HeapObjectPostWriteBarrier(JSObject**, JSObject*, JSObject*) ()
   from /nix/store/19v08nmkidbzg1adzpycnd30xbkyhf8c-spidermonkey-102.8.0/lib/libmozjs-102.so
No symbol table info available.
#1  0x00007fddfbf51541 in GjsContextPrivate::~GjsContextPrivate() [clone .localalias] ()
   from /nix/store/y7p15pygbi1npvj7rb9hq2f3kd2id9cg-gjs-1.76.0/lib/libgjs.so.0
No symbol table info available.
#2  0x00007fddfbf4ffbf in gjs_context_finalize(_GObject*) () from /nix/store/y7p15pygbi1npvj7rb9hq2f3kd2id9cg-gjs-1.76.0/lib/libgjs.so.0
No symbol table info available.
#3  0x00007fddfc1d0530 in g_object_unref () from /nix/store/n0bf4ddl69nk0lm6awh834syxqh0d3ss-glib-2.76.2/lib/libgobject-2.0.so.0
No symbol table info available.
#4  0x00000000004039fd in main ()

The backtrace is pretty similar in this bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973740
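
For anyone who wants to find the crash in a log like the one attached above, here is a minimal sketch of the "look for segfault" step. The function name find_segfaults and the default of three context lines are my own choices for illustration, not something taken from the report:

```shell
# Print every line that mentions "segfault" (case-insensitive), prefixed with
# its line number, plus a few lines of surrounding context.
# Usage: find_segfaults LOGFILE [CONTEXT_LINES]
# find_segfaults is a hypothetical helper name.
find_segfaults() {
    log_file=$1
    context=${2:-3}   # lines of context before and after each match
    grep -n -i -C "$context" 'segfault' "$log_file"
}
```

It can be pointed at a saved log, for example `find_segfaults gnome-session-issue.log`, or at a journal dump created with `journalctl -b > boot.log`.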

Notify maintainers

@NixOS/gnome

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.3.3, NixOS, 23.05 (Stoat), 23.05.20230525.3e01645`
 - multi-user?: `no`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.3`
 - channels(root): `""`
 - channels(julian): `""`
 - nixpkgs: `not found`

jtojnar commented 1 year ago

Could you please try to install debug symbols for GNOME Shell?

Could you please get the output of bt full in coredumpctl gdb $pid with the following installed (where pid is from the output of coredumpctl)?

environment.enableDebugInfo = true;
environment.systemPackages = [
  # Explicitly install to get debug symbols.
  pkgs.gnome.gnome-shell
  pkgs.gnome.mutter
  pkgs.gnome.gnome-session
  pkgs.glib
  pkgs.gjs
  pkgs.spidermonkey_102
];

Do not forget to log in again after switching to that configuration, or run coredumpctl with the NIX_DEBUG_INFO_DIRS=/run/current-system/sw/lib/debug environment variable set, since nixos-rebuild switch cannot update environment variables for running programs.
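
As a rough sketch of the second option (setting the variable by hand instead of logging in again), something like the following should work. The helper name backtrace_with_symbols is made up for illustration, and 1694 is only an example PID:

```shell
# Open a core dump in gdb with the NixOS debug-info directory set explicitly,
# so this also works from a shell that was started before the rebuild.
# backtrace_with_symbols is a hypothetical helper name.
backtrace_with_symbols() {
    pid=$1   # PID column from the output of `coredumpctl list`
    NIX_DEBUG_INFO_DIRS=/run/current-system/sw/lib/debug \
        coredumpctl gdb "$pid"
}

# Usage (interactive):
#   coredumpctl list gnome-shell
#   backtrace_with_symbols 1694
#   (gdb) set pagination off
#   (gdb) bt full
```

Turning pagination off in gdb keeps the pager prompt out of the pasted backtrace.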

blitz commented 1 year ago

This yielded the following:

(gdb) bt full
#0  0x00007f227740d7dc in JS::HeapObjectPostWriteBarrier(JSObject**, JSObject*, JSObject*) ()
   from /nix/store/19v08nmkidbzg1adzpycnd30xbkyhf8c-spidermonkey-102.8.0/lib/libmozjs-102.so
No symbol table info available.
#1  0x00007f2278d51541 in js::BarrierMethods<JSObject*, void>::postWriteBarrier (next=0x0, 
    prev=<optimized out>, vp=0x22b8fb0)
    at /nix/store/kq4468pg4jrwg5vzsvr4kg7bhmknvwfp-spidermonkey-102.8.0-dev/include/mozjs-102/js/RootingAPI.h:795
No locals.
#2  JS::Heap<JSObject*>::postWriteBarrier (next=<optimized out>, 
    prev=@0x22b8fb0: 0x2685adb724c0, this=0x22b8fb0)
    at /nix/store/kq4468pg4jrwg5vzsvr4kg7bhmknvwfp-spidermonkey-102.8.0-dev/include/mozjs-102/js/RootingAPI.h:376
No locals.
#3  JS::Heap<JSObject*>::~Heap (this=0x22b8fb0, __in_chrg=<optimized out>)
    at /nix/store/kq4468pg4jrwg5vzsvr4kg7bhmknvwfp-spidermonkey-102.8.0-dev/include/mozjs-102/js/RootingAPI.h:338
No locals.
#4  mozilla::detail::VectorImpl<JS::Heap<JSObject*>, 0ul, js::SystemAllocPolicy, false>::destroy (aEnd=0x22b9010, aBegin=<optimized out>)
    at /nix/store/kq4468pg4jrwg5vzsvr4kg7bhmknvwfp-spidermonkey-102.8.0-dev/include/mozjs-102/mozilla/Vector.h:65
        p = 0x22b8fb0
#5  mozilla::Vector<JS::Heap<JSObject*>, 0ul, js::SystemAllocPolicy>::~Vector (
    this=0xd932a8, __in_chrg=<optimized out>)
    at /nix/store/kq4468pg4jrwg5vzsvr4kg7bhmknvwfp-spidermonkey-102.8.0-dev/include/mozjs-102/mozilla/Vector.h:901
        g = <optimized out>
#6  JS::GCVector<JS::Heap<JSObject*>, 0ul, js::SystemAllocPolicy>::~GCVector (this=0xd932a8, 
    __in_chrg=<optimized out>)
    at /nix/store/kq4468pg4jrwg5vzsvr4kg7bhmknvwfp-spidermonkey-102.8.0-dev/include/mozjs-102/js/GCVector.h:43
No locals.
#7  GjsContextPrivate::~GjsContextPrivate (this=0xd93230, __in_chrg=<optimized out>)
    at ../gjs/context.cpp:487
        _pp = <optimized out>
        _ptr = <optimized out>
        _pp = <optimized out>
        _ptr = <optimized out>
        _pp = <optimized out>
        _ptr = <optimized out>
#8  0x00007f2278d4ffbf in gjs_context_finalize (object=0xd933b0) at ../gjs/context.cpp:500
        gjs = <optimized out>
#9  0x00007f2278fd0530 in g_object_unref (_object=0xd933b0) at ../gobject/gobject.c:3938
        weak_locations = <optimized out>
        nqueue = 0x2261900
        object = 0xd933b0
        old_ref = <optimized out>
        __func__ = "g_object_unref"
        retry_atomic_decrement1 = <optimized out>
#10 0x00007f2279228ba9 in _shell_global_destroy_gjs_context (self=<optimized out>)
    at ../src/shell-global.c:750
        _pp = <optimized out>
        _ptr = <optimized out>
#11 0x00000000004039fd in main (argc=<optimized out>, argv=<optimized out>)
    at ../src/main.c:674
        context = 0x8d9ef0
        error = 0x0
        ecode = 0

blitz commented 1 year ago

Could it help to run this in valgrind? If so, is there a straightforward way to do that?

jtojnar commented 1 year ago

No straightforward way, will require something like https://github.com/NixOS/nixpkgs/issues/226355#issuecomment-1510488966 (including the valgrind patch for debug symbols).

It is weird that the backtrace differs from the previous one, and that the crash is only triggered by cleanup code, which means the shell termination must have been caused by something that happened earlier.

blitz commented 1 year ago

the shell termination must have been caused by something that happened earlier.

Yes.

Regarding valgrind, I'll try that later today.

blitz commented 1 year ago

With valgrind enabled it doesn't crash. I've tried many times and the login just worked. When I rolled back the changes it crashed again on the first try. Might be a race condition of sorts.

blitz commented 1 year ago

So waiting at the login prompt for a while also seems to work around the problem. So my laptop is too fast? :smile: Any debugging ideas are welcome. Otherwise, I'm going to dig into the libmozjs code that actually crashes, but I give this a low chance of success.

bjornfor commented 1 year ago

So waiting at the login prompt for a while also seems to work around the problem.

I have had the same issue for over a year. I think I can make Gnome crash every time by always typing my password quickly: it seems to require a few seconds between the password input field appearing and me pressing Enter.

I wonder if this is related to https://github.com/NixOS/nixpkgs/issues/103746. That issue can also be worked around with a 5s sleep.

Autoradiowecker commented 10 months ago

Same issue here.

sjhaleprogrammer commented 9 months ago

I also have this problem on the first login after unplugging my charger.

GNOME 44.5

unlux commented 1 month ago

Same here: multiple failed logins, and sometimes a blank grey screen with a working cursor after login (I presume the main session isn't running, because the mouse doesn't have the flat profile applied). When that happens:

  1. I press Ctrl+Alt+F2 to go to tty2 (any other tty works too).
  2. There, systemctl status gdm outputs that the process
    1. Then I run pkill gdm (sometimes it works without sudo and sometimes it doesn't).
      • GDM starts up properly and works as expected; login succeeds on the first try.
    2. If you press Ctrl+Alt+F1 (for tty1), you can see the GNOME desktop (with your wallpaper and all customizations).
      • You cannot see the mouse pointer, and no input of any kind has an effect; the desktop stays unresponsive.
      • However, the time in the panel keeps updating, and if you have the Vitals extension, you can see it update too. You still cannot do anything.
      • Once you get here, you cannot switch to another tty, no matter how many Ctrl+Alt+F-number combinations you press.
      • The only means of recovery is a hard reboot by long-pressing the power button.

I hope this helps with the diagnosis; I would love to help in any way.