NixOS / nixpkgs

Nix Packages collection & NixOS

Pantheon: Random crashes since upgrading to 23.11 #274999

Open OPNA2608 opened 8 months ago

OPNA2608 commented 8 months ago

Describe the bug

After upgrading from 23.05 to 23.11, the rate of Pantheon crashes has gone from basically-never to once every few days. Some days I don't have any, other days it's 2-3 crashes.

I unfortunately can't pinpoint any particular situation that reproduces these crashes though. Here are some situations where I remember a crash happening afterwards:

...but I cannot force a crash by aggressively spamming these actions.

Steps To Reproduce

Unsure, beyond "Seemingly normal Pantheon usage on 23.11".

Expected behavior

No crashes.

Screenshots

n/a, just the regular Pantheon "An error occurred, please log out" screen after it happens.

Additional context

I'm not 100% certain this is really a Pantheon issue. Pantheon itself seems to look & behave fine for hours without any graphics issues, but the apps that I remember using before crashes are all GPU-accelerated.

I've been busy with bisecting my way through our git history for the last week or so, to find the cause of severe graphics issues & corruption under Miriway after upgrading from 23.05 to 23.11. Is it possible this is really a GPU driver issue? For now, all I can say about this issue is that gala is crashing sometimes.

If it turns out to be hardware/GPU-specific, I'm using a Radeon RX 5700 XT.

Any advice on how to debug the crashes would be appreciated, in case this is completely unreproducible on your end.

Notify maintainers

Pantheon maintainers for now: @davidak @bobby285271

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.66-xanmod1, NixOS, 23.11 (Tapir), 23.11.1779.cf28ee258fd5`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.1`
 - channels(bt1cn): `"unstable"`
 - channels(root): `"nixos-23.11"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

OPNA2608 commented 8 months ago

Some situations I have now spotted, after further use. I'm not sure if they're actually new issues caused by the bump, or really 23.05 issues re-spotted after messing with my settings / trying out other packages again.

OPNA2608 commented 8 months ago

In addition to the above, the crashes from the OP came back yesterday. After getting 6 crashes in a row doing the same things post-login during a pair programming meetup, I had to give up on my graphical setup. I'll just list what I did after logging in:

Any suggestion for what I could do to provide more details would still be appreciated, because I know this isn't super helpful so far. Syslogs don't seem to have any details as-is, just a message that gala crashes along with the normal assertion failures.

bobby285271 commented 8 months ago

Honestly, I don't think I can actually help fix such an issue, since I don't know much other than packaging, though Pantheon runs fine for me so far. Some random things I can think of:

  1. Did systemctl status bamfdaemon.service --user report a failure?
  2. Is there any Pantheon stuff in coredumpctl? If so, follow https://discourse.nixos.org/t/how-to-investigate-gnome-crashing/19726/2 to get a backtrace.
    • For projects written in Vala, you will need to override the package with VALAFLAGS = "-g"; to get correct line numbers.
  3. If you see criticals, you can set G_DEBUG=fatal-criticals to trigger a core dump whenever one appears, then go back to step 2.

After getting a backtrace with debug symbols, if you see lines pointing to mutter's source files, you can probably check which later commit touches those lines, try backporting it to the mutter that Pantheon uses, and see how it goes.
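
For reference, steps 1 to 3 above could be scripted roughly as follows. This is only a sketch: the function names and the DRY_RUN switch are hypothetical helpers made up for illustration, and the name a crash is recorded under is an assumption (on NixOS a wrapped binary may show up under a different process name than plain gala, so the match is left as an argument).

```shell
#!/usr/bin/env bash
# Sketch of the three checks from the list above. The helper names and the
# DRY_RUN switch are hypothetical, not part of any Pantheon or NixOS tooling.

# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: show the state of the user-level bamfdaemon unit.
check_bamf() {
  run systemctl status bamfdaemon.service --user
}

# Step 2: list recorded core dumps, optionally filtered by a match such as
# a process name or PID, then open the newest matching dump in a debugger.
list_dumps() {
  run coredumpctl list "$@"
}

debug_latest_dump() {
  run coredumpctl debug "$@"
}

# Step 3: run a command with GLib criticals treated as fatal, so that a
# critical aborts the process and leaves a core dump for step 2.
with_fatal_criticals() {
  run env G_DEBUG=fatal-criticals "$@"
}
```

Running e.g. DRY_RUN=1 list_dumps gala only prints the command that would be executed, which is handy for checking the match before touching a live session.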