elementary / notifications

Gtk Notifications Server
https://elementary.io
GNU General Public License v3.0

Notifications originating on another workspace cause crashes #139

Closed wout closed 1 year ago

wout commented 3 years ago

Prerequisites

Describe the bug

I've been using Odin Beta 2 on my laptop for a week now and have noticed at least five crashes on workspaces where the windows are in full or split-screen mode. This mostly happens when a notification arrives while I'm moving between workspaces.

The crash sometimes results in all windows from all workspaces moving to the current workspace. But other times it results in all full and split-screen windows being over-scaled (from the looks of it, by the height of the main top panel).

At the moment of the crash, everything freezes for a second or two and most importantly, the notification doesn't appear on the top right but on the top left of the screen.

To switch workspaces, I use all available methods:

To Reproduce

Steps to reproduce the behavior:

  1. While receiving a notification...
  2. Switch to a full or split-screen workspace

Note: it's hard to reproduce.

Screenshots or screen recordings

I'll try to take a screenshot the next time it happens.

Logs

I'll report those when it happens again.

Platform Information

image image

Additional context

Using a single 4k monitor in HiDPI mode (3840 x 2160 @ 60 Hz).

janxkoci commented 2 years ago

Updates and workaround

I was watching out for this bug lately and so far it looks like I can only trigger the bug with Telegram, at least with the apps I have installed (and I don't have many apps yet as I'm trying to keep my fresh new OS - well fresh :smile: ). I will try to install Slack and see how it will do (I use it in a browser for now). Or maybe Spotify. Edit: Slack is fine, as Jeremy mentions above. Spotify does not switch workspace after clicking the notification, so it cannot trigger the crash either.

In the meantime I found a way to use Telegram and enjoy its notifications while preventing the crashes. As the crashes are related to switching workspaces after clicking a Telegram notification, the workaround is quite obvious: set Telegram to show up on all workspaces. This way, there is no workspace switch after clicking the notification, so there is also no crash. The downside is that you will have Telegram on every workspace, potentially getting in the way of your other work.

The above workaround may be useful to users who want(ed) to move away from elementary OS due to this bug, such as @wout or @sekunho. Hope it helps.

Edit

I was just looking above at apps people use as sources of notifications, and I see @jeremypw is using Slack, so we can rule that one out. So far these apps were mentioned as (potential) culprits:

Confirmed:

Maybe:

wout commented 2 years ago

@janxkoci Thanks for your help with this. I am very busy at the moment so I have no time to test thoroughly. But I can confirm that the crashes happen at the same frequency and are just as severe on both my laptop and my desktop, even though they are two completely different configurations. I'll come back here when I have some gdb output and more findings.

janxkoci commented 2 years ago

I may try to get something out of gdb or coredumpctl later too.

janxkoci commented 2 years ago

I'm thinking about what @jeremypw said:

> I noted that on clicking on the notification that Terminal "bounced" in the dock but the workspace did not switch to focus the Terminal window so your step 3 was not reproduced exactly. Not sure if this is due to some setting being different or because it was a Telegram notification.

I also noticed that I fail to trigger the bug if there is no workspace switch upon clicking the notification.

My question is: Is this something the app devs can control? I mean does the notification feature allow devs to choose if their app should be focused or if just the icon should bounce in the dock? If you don't know yourself, @jeremypw, could you please ping somebody who might know?

I'm wondering whether some devs inadvertently misuse the notifications in a way that leads to a crash, and whether this could perhaps be avoided in the Notifications code. But it also depends on whether this is by design or not.

janxkoci commented 2 years ago

One more update: I was just able to trigger the bug with Slack, which I installed with flatpak. Same setup as described above - I texted my boss and waited for a reply, then switched to a different workspace. When the reply arrived, I clicked the notification, which took me to the Slack app (i.e. switching the workspace in the process). After that I switched to a different workspace again and changed the brightness to bring up a notification and bam - I got a crash.

janxkoci commented 2 years ago

Another follow-up

Today I was able to reproduce the crash with a Skype notification. I've updated the list of confirmed trigger apps above. I also previously updated the symptoms section, which I forgot to mention before.

Also, I got one crash yesterday where I got kicked out of the session and had to log in again. It was after 6+ days of uptime, so it's possible I had some crashes during that session in previous days and this was the escalation mentioned by others in this thread, but I'm not sure. I don't really get that many of the crashes, probably because I don't keep the chat apps "always-on", but only run them when I need them (they are "always-on" on my phone anyway).

wout commented 2 years ago

So, today I've been running wingpanel with gdb and I got a crash:

[Thread 0x7fff68ff9700 (LWP 306215) exited]
[New Thread 0x7fff68ff9700 (LWP 360919)]

(io.elementary.wingpanel:105416): GLib-CRITICAL **: 15:18:29.409: g_key_file_get_boolean: assertion 'key_file != NULL' failed
[Detaching after fork from child process 360939]
[New Thread 0x7fff4b7fe700 (LWP 360941)]
[Thread 0x7fff68ff9700 (LWP 360919) exited]
[Thread 0x7fff4b7fe700 (LWP 360941) exited]

Wingpanel did start again and this is the backtrace:

$ (gdb) bt
#0  0x00007ffff71289cf in __GI___poll (fds=0x555555cc6790, nfds=11, timeout=465) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffff7ebc36e in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff7ebc4a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff7d6bfe5 in g_application_run () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
#4  0x000055555555dfd1 in main ()

jeremypw commented 2 years ago

The backtrace of the main thread does not look very interesting. Might be worth running thread apply all backtrace (or t a a bt for short) to get a backtrace of all the threads. Hopefully one of them will contain the g_key_file_get_boolean error with more info. You will need to have installed the -dev files for wingpanel and its dependencies to get human-readable symbols and source code line numbers.
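
A minimal sketch of collecting that all-thread backtrace non-interactively, assuming gdb is installed and that attaching to the already running panel process is acceptable (attaching may need elevated privileges, depending on the system's ptrace settings). The helper name `gdb_backtrace_argv` is hypothetical; `-p`, `-batch` and `-ex` are standard gdb options. The example PID is the one shown in the `(io.elementary.wingpanel:105416)` log prefix above and will differ on every run.

```python
import shlex
from typing import List


def gdb_backtrace_argv(pid: int) -> List[str]:
    """Build a gdb command line that attaches to `pid` and prints every thread's backtrace."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "gdb",
        "-p", str(pid),                       # attach to the running process
        "-batch",                             # run the -ex commands, then exit
        "-ex", "thread apply all backtrace",  # same as typing `t a a bt` at the prompt
    ]


# Illustrative PID only; substitute the PID of the running panel.
print(shlex.join(gdb_backtrace_argv(105416)))
```

The printed command can be pasted into a terminal and its output redirected to a file for attaching to the issue.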

wout commented 2 years ago

That's what I thought. There's hardly any information in there.

Could you elaborate a bit more about the -dev files? Should I build wingpanel from source, or...? I have hardly any experience with Linux development, so I'm just following instructions here. :)

jeremypw commented 2 years ago

@wout That's OK - thanks for trying! I am usually building from source so I have to install the development files anyway, which I do with sudo apt build-dep <name of package>. So for this repo the command is sudo apt build-dep wingpanel-indicator-notifications. The package name is not always the same as the repo name or executable name unfortunately ... for wingpanel itself you need sudo apt build-dep io.elementary.wingpanel.

jeremypw commented 2 years ago

I do not think you have to build from source - gdb will use the installed development files anyway. Things are a bit fluid with wingpanel and gala at the moment so best stick to the stable version ;-)

wout commented 2 years ago

Thanks. That's done! It's now running in gdb:

$ (gdb) run
Starting program: /usr/bin/io.elementary.wingpanel 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5e38700 (LWP 770173)]
[New Thread 0x7ffff5637700 (LWP 770174)]
[New Thread 0x7ffff4e36700 (LWP 770175)]
[New Thread 0x7fffed58e700 (LWP 770176)]
[New Thread 0x7fffeca9e700 (LWP 770177)]
[New Thread 0x7fffc1185700 (LWP 770180)]

(io.elementary.wingpanel:770169): GLib-GObject-WARNING **: 20:15:52.309: invalid (NULL) pointer instance

(io.elementary.wingpanel:770169): GLib-GObject-CRITICAL **: 20:15:52.309: g_signal_connect_object: assertion 'G_TYPE_CHECK_INSTANCE (instance)' failed

(io.elementary.wingpanel:770169): libnm-CRITICAL **: 20:15:52.309: ((libnm-core/nm-connection.c:193)): assertion '<dropped>' failed
[Thread 0x7ffff5637700 (LWP 770174) exited]

I've been in the same login session for more than a day now, and it has already crashed a few times. So now we'll have to wait until it crashes again. I'll report back when that happens.

wout commented 2 years ago

Just had another crash, but it doesn't look like there's a whole lot of info in there.

[Thread 0x7fff727fc700 (LWP 784909) exited]
[Detaching after fork from child process 784969]
[New Thread 0x7fff727fc700 (LWP 784971)]
[Thread 0x7fff4d0fa700 (LWP 784947) exited]
Gtk-Message: 20:27:13.475: Failed to load module "canberra-gtk-module"
Gtk-Message: 20:27:13.475: Failed to load module "canberra-gtk-module"
Gtk-Message: 20:27:15.022: Failed to load module "canberra-gtk-module"
Gtk-Message: 20:27:15.022: Failed to load module "canberra-gtk-module"
[Thread 0x7fff727fc700 (LWP 784971) exited]

(io.elementary.wingpanel:770169): io.elementary.wingpanel.power-CRITICAL **: 08:01:26.466: Device.vala:190: Updating the upower device parameters failed: GDBus.Error:org.freedesktop.DBus.Error.UnknownMethod: No such interface “org.freedesktop.UPower.Device” on object at path /org/freedesktop/UPower/devices/mouse_hid_CC2107201USJ2Y1A4_battery
[New Thread 0x7fff727fc700 (LWP 1622825)]
[Thread 0x7fff727fc700 (LWP 1622825) exited]
^C--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "io.elementary.w" received signal SIGINT, Interrupt.
0x00007ffff71289cf in __GI___poll (fds=0x555555f51b90, nfds=12, timeout=272) at ../sysdeps/unix/sysv/linux/poll.c:29
29  ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  0x00007ffff71289cf in __GI___poll (fds=0x555555f51b90, nfds=12, timeout=272) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffff7ebc36e in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff7ebc4a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff7d6bfe5 in g_application_run () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
#4  0x000055555555dfd1 in main ()
(gdb) t a a bt

Thread 38 (Thread 0x7fff705f7700 (LWP 784922)):
#0  futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7fff480222dc) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fff480211d0, cond=0x7fff480222b0) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7fff480222b0, mutex=0x7fff480211d0) at pthread_cond_wait.c:638
#3  0x00007fffbb75688a in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#4  0x00007fffbb74e726 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#5  0x00007fffbb74cc9e in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#6  0x00007fffbb7558d1 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#7  0x00007fffbb74f2e0 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#8  0x00007fffbba3e187 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#9  0x00007fffbb755247 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#10 0x00007fffbba0523f in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#11 0x00007fffbba3353d in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#12 0x00007fffbba13aa6 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#13 0x00007ffff6f82609 in start_thread (arg=<optimised out>) at pthread_create.c:477
#14 0x00007ffff7135163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 37 (Thread 0x7fff707f8700 (LWP 784921)):
#0  futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7fff480222dc) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fff480211d0, cond=0x7fff480222b0) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7fff480222b0, mutex=0x7fff480211d0) at pthread_cond_wait.c:638
#3  0x00007fffbb75688a in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#4  0x00007fffbb74e726 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#5  0x00007fffbb74cc9e in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#6  0x00007fffbb7558d1 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#7  0x00007fffbb74f2e0 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#8  0x00007fffbba3e187 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#9  0x00007fffbb755247 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#10 0x00007fffbba0523f in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#11 0x00007fffbba3353d in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#12 0x00007fffbba13aa6 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#13 0x00007ffff6f82609 in start_thread (arg=<optimised out>) at pthread_create.c:477
#14 0x00007ffff7135163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 36 (Thread 0x7fff983f6700 (LWP 784920)):
#0  futex_wait_cancelable (private=<optimised out>, expected=0, futex_word=0x7fff480222dc) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fff480211d0, cond=0x7fff480222b0) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7fff480222b0, mutex=0x7fff480211d0) at pthread_cond_wait.c:638
#3  0x00007fffbb75688a in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#4  0x00007fffbb74e726 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#5  0x00007fffbb74cc9e in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#6  0x00007fffbb7558d1 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#7  0x00007fffbb74f2e0 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#8  0x00007fffbba3e187 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#9  0x00007fffbb755247 in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2
#10 0x00007fffbba0523f in  () at /lib/x86_64-linux-gnu/librsvg-2.so.2

jeremypw commented 2 years ago

> ^C--Type <RET> for more, q to quit, c to continue without paging--
> Thread 1 "io.elementary.w" received signal SIGINT, Interrupt.

Hmm, did you press Ctrl + C?

It is possible that when the display gets "messed up" wingpanel hasn't actually crashed (you would usually see SIGSEGV, Segmentation fault in the gdb output) - it is just not being displayed properly :shrug:

I'll try installing the Slack flatpak to see whether I can reproduce the crash.

jeremypw commented 2 years ago

@janxkoci Thanks for the additional info. The original report did not seem to mention clicking on the notification although later comments do so I am not sure whether these are all the same bug. Anyway, apps can send one or more action names with the notification (or else the fallback action which is to launch the app occurs). Afaict from the code, whether the app is focused (with workspace switch) or just bounces depends on the window manager not the wingpanel indicator plugin or the app.

janxkoci commented 2 years ago

@jeremypw Thank you for taking time to look into this.

> The original report did not seem to mention clicking on the notification although later comments do so I am not sure whether these are all the same bug.

This is true, however I think the importance of clicking could easily be missed, as the crash occurs during the next notification (even if it's just volume or brightness change) and not the first notification that actually triggers the bug, as I described in my steps to reproduce.

I tried to reproduce the bug without clicking a notification, just switching a workspace manually while a notification from Telegram is displayed, but I failed to trigger the crash this way. I also failed to trigger it by clicking a notification that does not cause a workspace switch.

Note also that the original reports mention maximized or half-maximized windows, while I can easily trigger the crash without any such windows, anytime I want (as I described above). I think since the crash occurs separated from its trigger, it can be hard to figure out what is important.

> Anyway, apps can send one or more action names with the notification (or else the fallback action which is to launch the app occurs). Afaict from the code, whether the app is focused (with workspace switch) or just bounces depends on the window manager not the wingpanel indicator plugin or the app.

Hmm, so why do some apps do one thing and other apps another thing? How does the WM decide which behaviour should occur?

BTW, I remember sometimes having crashes even when I did not use Telegram or any of the other apps mentioned above all day long. I suspect there is some app I use regularly that I missed. I always thought it could be one of my email clients, but I could not reproduce it at the time. But I now realize why that could be.

I use Geary now (from flathub), but clicking its notifications does nothing (does not even open its window if it's not open already). Before, I also used Mail, but I had to abandon it because of a few issues. So at the moment I don't have any account set up in Mail and I turned off its autostart. But I remember it does open its window after clicking notification of a new email. So I think if it should be possible to trigger the bug with Mail, specific conditions should be met first:

I will try to set up one account in Mail and see whether I can trigger it like this.

wout commented 2 years ago

Thanks for your input @janxkoci.

> I tried to reproduce the bug without clicking a notification, just switching a workspace manually while a notification from Telegram is displayed, but I failed to trigger the crash this way.

I just would like to add that I never click on notifications. So even if @janxkoci was able to produce the crash that way, clicking on it is definitely not the main trigger. I do notice that the crashes occur more often if I interact with my mouse or touchpad (e.g. scrolling through a web page). But I also noticed them without any interaction at all.

Just noticed this issue is becoming very long. :joy: I wonder if it would be fixed by just switching to elementary OS 7.

jeremypw commented 2 years ago

> Hmm, so why do some apps do one thing and other apps another thing? How does the WM decide which behaviour should occur?

It may be that some apps implement a specific action whereas for others the fallback action occurs.

jeremypw commented 2 years ago

Just tested the Slack flatpak on OS6.1 and can confirm that when a notification arrives when the Slack app is on another workspace, the workspace briefly switches to show the Slack app and the notification appears on the top/left of the screen. Almost immediately the notification disappears and the workspace switches back. Clicking on the notification entry in the indicator did not cause the Slack app to show.

This sounds like this previously reported bug: https://github.com/elementary/notifications/issues/194

This is not what I would call a "crash" however, although it is certainly undesirable. I'll check whether it also occurs in OS7.

janxkoci commented 2 years ago

> It may be that some apps implement a specific action whereas for others the fallback action occurs.

I see. I guess it makes sense for chat apps to implement something like this. I noticed that stock elementary apps generally bump the icon in the dock and give it red glow - I assume that is the fallback action.

> This is not what I would call a "crash" however, although it is certainly undesirable.

I don't know, it shares several symptoms with crashes I've seen in other WMs in the past (like the black rectangles in place of shadows). I could make a video with my phone maybe.

janxkoci commented 2 years ago

Just an update: today I got several crashes when using Slack in Epiphany web browser. I've updated the list above to include Epiphany.

wout commented 2 years ago

@jeremypw

> Just tested the Slack flatpak on OS6.1 and can confirm that when a notification arrives when the Slack app is on another workspace, the workspace briefly switches to show the Slack app and the notification appears on the top/left of the screen. Almost immediately the notification disappears and the workspace switches back. Clicking on the notification entry in the indicator did not cause the Slack app to show.

It looks more like the workspace where the notification originates is shown on top of the current workspace. You get a bunch of windows stacked together. My laptop is getting pretty old, and its graphics have always been sub-par (Intel UHD 620). The recovery after a notification takes significantly longer than on my desktop, which has an RX 580. Sometimes longer than a second, so I can see the clash (or crash) of windows clearly.

janxkoci commented 2 years ago

@wout I suspect that all workspaces are projected onto your current one. Or rather, as Gala crashes, the concept of workspaces temporarily stops applying, so all your windows just show up in your current view - there are no workspaces at all for a second or two. Then I guess Gala restarts and resumes its previous state (at least the window placement on each workspace, while some other info gets lost, like the size of maximized windows and the positions of some other windows).

wout commented 2 years ago

@janxkoci Yes, you are right. :) It just happened again, and I saw all the windows stacked on top of each other.

janxkoci commented 2 years ago

Moreover, I've been wondering why it's usually text editors like Sublime Text or VSCodium that get slightly misplaced after the crash, and I now realize why that might be: their window decorations (i.e. the titlebar) are controlled by Gala, as they use old-school server-side decorations, while Files, Epiphany and the like use client-side decorations (i.e. a headerbar), which are controlled by the apps themselves (and are thus outside Gala's control).

This means that during the crash, the server-side window decorations probably also disappear for the duration of the crash. I will watch out for that.

wout commented 2 years ago

@janxkoci I can confirm that's the case. I believe Postman is also using such a title bar, and it wasn't present when it crashed just a moment ago.

jeremypw commented 2 years ago

Just to note that the problem I observed with the Slack app on OS6.1 does not occur on OS7. Clicking on a notification takes you to the workspace where the Slack app is without issue.

wout commented 2 years ago

That's great news. Can't wait to make the switch.

jeremypw commented 2 years ago

There are currently some new visual glitches in OS7 but they will no doubt be fixed before release.

wout commented 2 years ago

Just noticed this "Notifications Demo" task is running again after 12 days of uptime, and it's getting bigger and bigger:

image

It's not something I opened on purpose.

jeremypw commented 2 years ago

On OS7, I could only get the elementary notifications demo to run by deliberately typing io.elementary.notifications.demo into a terminal (it does not appear in the applications menu). Notifications sent by this app (in response to user action) are clearly labelled in the wingpanel drop-down. The process stops when the demo window is closed. I could not find any other core elementary code that launches this app. Is it in the "Startup" list (System Settings/Applications/Startup)?

Can you confirm the executable name of the demo process on your machine? Does it have a corresponding dock entry or window?

wout commented 2 years ago

It's not in the startup applications:

image

Here's some more info:

image

It's the process with id 2567:

image

jeremypw commented 2 years ago

Process 2567 is the notifications server (which is normally running) not the demo. You could try running io.elementary.notifications.demo in a terminal to see it create a different process. Maybe a bug in the System Monitor app you are using?

wout commented 2 years ago

You're right, a new process starts when I open the demo app. Probably a bug in Monitor.

But still, isn't memory consumption excessive? It keeps on growing, which points to a memory leak I think. Or am I wrong?

image

jeremypw commented 2 years ago

It does seem excessive. It starts off using 5.5 MB on my machine, but after sending 2000 test notifications it was still only 65 MB. You can try clearing the notifications; that will not necessarily reduce the memory usage immediately, but it might stop it from increasing further.

wout commented 2 years ago

Just did that; it's now at 382.5 MB. I'll keep an eye on it.

jeremypw commented 2 years ago

Noticed that after sending the 2000 notifications the wingpanel was using 265 MB. I just tried clearing the 2000 notifications, which did not decrease memory use. I then sent some more - there was virtually no increase in memory, so it looks like the memory is not being reclaimed when notifications are dismissed but is being reused for new notifications. Anyway, this is off topic for this issue; we should really open another.

wout commented 2 years ago

Yeah, sorry about that. I thought it may have been related.

GranPC commented 1 year ago

Can reproduce on daily builds. watch -n0.5 notify-send hello causes the crash rather quickly.
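
A bounded variant of that reproduction loop, sketched in Python: it sends a fixed number of notifications and then stops, instead of running until interrupted like `watch`. It assumes `notify-send` is on the PATH; the function name `spam_notifications` and its parameters are hypothetical.

```python
import shutil
import subprocess
import time


def spam_notifications(count: int, interval: float = 0.5, message: str = "hello") -> int:
    """Send `count` notifications, `interval` seconds apart; return how many were sent."""
    notify_send = shutil.which("notify-send")
    if notify_send is None:
        return 0  # notify-send is not installed, nothing to do
    sent = 0
    for _ in range(count):
        # Equivalent to one tick of `watch -n0.5 notify-send hello`.
        result = subprocess.run([notify_send, message], check=False)
        if result.returncode == 0:
            sent += 1
        time.sleep(interval)
    return sent


# Example (not run here): spam_notifications(20) sends 20 notifications over about 10 seconds.
```

Switching workspaces while the loop runs matches the conditions described earlier in this thread.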

What's most irritating (and trust me, there are a lot of irritating things about this) is that if it happens in quick succession (to be exact, twice in a minute), gnome-session then gives up on restarting gala, throws up the fail whale screen, and even if you restart gala, the fail whale still covers everything. If you kill the process, gnome-session terminates itself. Everything I just described is hardcoded, so there's no way to turn off this behavior, which makes debugging this whole thing about as exciting as a visit to the dentist. I'm not sure who thought that porting the BSoD to Linux was a good idea, and I am going to do my best to not find out.

Anyway, in case anyone wants to try and debug this too, my workaround for now has been to use xdotool to hide the fail whale and then restart gala. I'm going to try and figure out what's going on here ASAP; if anyone who has worked on Gala has any pointers re: where to start looking (beyond getting a full backtrace) please let me know.

wout commented 1 year ago

Great to see movement on this issue. I've resorted to turning off notifications altogether. In the meantime, I bought a new computer, did a fresh installation, and it had this issue from day one.

janxkoci commented 1 year ago

@GranPC Sorry to hear that, but elementary OS does not use gnome-session - are you using gala on some other OS?

Since I'm here, you reminded me I wanted to post an update.

I've managed to find a native app that caused me a lot of crashes - Fondo (no account needed :partying_face: ).

The app lets you select a wallpaper from Unsplash.com and set it as your background, sending a notification after every success. It's natural to try a bunch of wallpapers and switch to an empty desktop to check them out. With the incoming notifications and frequent workspace switching, the crash is just around the corner, trust me.

I hope this can help with reproducing the issue and debugging.

GranPC commented 1 year ago

Are you sure? I just installed a fresh VM with 6.0 and...

root@StandardPCQ35ICH920098b39e57a:/home/jesus# ps aux | grep session
root         841  0.0  0.2 160328  8168 ?        Sl   18:10   0:00 lightdm --session-child 12 21
jesus        864  0.0  0.3 596044 15608 ?        Ssl  18:11   0:00 /usr/libexec/gnome-session-binary --systemd --builtin --session=pantheon
root@StandardPCQ35ICH920098b39e57a:/home/jesus# strings /usr/libexec/gnome-session-binary | grep whale
We failed, but the fail whale is dead. Sorry....
whale
Show the fail whale dialog for testing

But, to answer your question... I actually installed Ubuntu 22.10, stripped a bunch of the stuff it came with, and installed the elementary OS session. Which actually depends on Ubuntu 22.04, so getting the dependencies right was... not fun. I assumed that's why it was crashing, but I decided to check the issue tracker and... well, here we are.

janxkoci commented 1 year ago

Okay, that can be the same issue or not, hard to say with that many mods :sweat_smile:

Can you try to reproduce with the Fondo app, as I described?

GranPC commented 1 year ago

It's definitely the same issue: I also noticed it was related to workspaces, the symptoms are identical, the logs look the same, and the watch method reproduces it on my clean elementary OS 6 VM. Just grabbing debug symbols now and rebuilding Gala to debug it.

GranPC commented 1 year ago

It seems like the issue is that, if a notification gets closed while you are in the process of switching workspaces, WindowManager.destroy is not called, so the notification stack never knows the window got destroyed, and tries to continue to do stuff with it next time a notification is received, leading to the crash. Once I figure out why WindowManager.destroy is not being called (shouldn't be much longer) I will publish a patch.

GranPC commented 1 year ago

Ha, found the root cause: https://github.com/GNOME/mutter/blob/3.36.9/src/compositor/meta-window-actor.c#L882

I'm not sure how to fix this "properly" and I'm not keen on forking Mutter too. We might be doing something wrong, but I'm just going to make the notification stack drop windows that are no more. This will fix the crash.

You may still see notifications stuck on the top left. I believe this is what happens when a notification is received while you are switching workspaces; Mutter does not notify us of this, either, so we can't apply our cool positioning logic. You could feasibly fix this by iterating through all the windows instead of maintaining a local list, but I'm running this on a laptop and I want power efficiency.

https://github.com/GranPC/gala/tree/jesus/notification-crash-hack

There is also a version available for Horus (in case someone's running 7 and also frustrated): https://github.com/GranPC/gala/tree/jesus/notification-crash-hack-horus

Please enjoy. I hope my research will help someone fix this properly.
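
The stale-window problem and the "drop windows that are no more" workaround described above can be illustrated with a small model. This is a sketch in Python, not Gala's actual Vala code: the `Window` and `NotificationStack` classes, their methods and the `destroyed` flag are all hypothetical stand-ins, and the real crash is a use of a destroyed object rather than a Python exception.

```python
from typing import List


class Window:
    """Stand-in for a notification window owned by the compositor (hypothetical)."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.destroyed = False
        self.y = 0

    def move_to(self, y: int) -> None:
        if self.destroyed:
            # Models touching a window that no longer exists.
            raise RuntimeError(f"use of destroyed window {self.title!r}")
        self.y = y


class NotificationStack:
    """Keeps a local list of notification windows and restacks them on each arrival."""

    def __init__(self, prune_dead: bool) -> None:
        self.windows: List[Window] = []
        self.prune_dead = prune_dead

    def show(self, window: Window) -> None:
        if self.prune_dead:
            # Workaround: forget windows that were destroyed without us being told.
            self.windows = [w for w in self.windows if not w.destroyed]
        self.windows.insert(0, window)
        for index, w in enumerate(self.windows):
            w.move_to(index * 100)


def simulate(prune_dead: bool) -> str:
    stack = NotificationStack(prune_dead)
    first = Window("first")
    stack.show(first)
    # The window goes away during a workspace switch, but the stack is never notified.
    first.destroyed = True
    try:
        stack.show(Window("second"))
    except RuntimeError:
        return "crash"
    return "ok"


print(simulate(prune_dead=False))  # crash
print(simulate(prune_dead=True))   # ok
```

Pruning on every arrival keeps the local list cheap to maintain, which is the trade-off mentioned above against iterating through all windows.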

GranPC commented 1 year ago

"Fixed" in elementary/gala#1497 - @lenemter, close?

lenemter commented 1 year ago

Fixed in https://github.com/elementary/gala/pull/1497. Thanks @GranPC.

janxkoci commented 1 year ago

That's great news! Is the fix gonna get into OS 6.1 or only to OS 7?

janxkoci commented 1 year ago

I just got a crash with an email notification, so I guess the fix did not land in OS 6 yet :confused: