dunst-project / dunst

Lightweight and customizable notification daemon
https://dunst-project.org

Dunst crashes and dumps core when no X server is available #1095

Open pmattern opened 2 years ago

pmattern commented 2 years ago

If Dunst is launched in a context where it can't access an X server or a Wayland compositor, it crashes and dumps core right away.

This can happen, for example, in a virtual terminal session while X isn't running, or when Dunst is launched by root on a system where X is run by a regular user.
Obviously, neither is an expected use case, but the former in particular can easily happen by accident.
So all in all I think it would be good if Dunst handled this situation a bit more gracefully.

Steps to reproduce:

1ef38e5 on Arch Linux x86_64 per AUR package dunst-git.

LeCodingWolfie commented 2 years ago

I also had the same issue, though I am using the release version of the dunst package, and Wayland as my main desktop environment with Hyprland as the WM. What's more, running a dunst-related command (the dunstify example from the Wiki introduction) results in:

Unable to send notification: Error calling StartServiceByName for org.freedesktop.Notifications: Timeout was reached

Trying to restart the daemon shows a similar error and dump.

Anyway, while looking for issues with that headline, I found one that bebehei marked as "duplicate"; it can be solved by running:

systemctl --user import-environment DISPLAY

In fact, I then ran into the issue that a dunst process was still running while I was trying to restart the dunst daemon:

CRITICAL: [dbus_cb_name_lost:1044] Cannot acquire 'org.freedesktop.Notifications': Name is acquired by 'dunst' with PID '96228'.

After killing that process, I could restart the daemon and run the dunstify example.

Now it only shows this warning for me (which, by the way, has been reported before in connection with Wayland support):

Xlib: extension "MIT-SCREEN-SAVER" missing on display ":0".

Using this similar solution may also help to solve the problem.

By the way, running dunst doesn't output anything. I don't know whether that is an actual issue; it might just start the daemon if it isn't already running.

antaz commented 1 year ago

importing $WAYLAND_DISPLAY=wayland-0 fixed the crash for me.
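
The two fixes mentioned so far (importing DISPLAY, importing WAYLAND_DISPLAY) could be combined into one step at session startup. What follows is only a minimal sketch, assuming a systemd user session: the function names `set_display_vars` and `import_display_vars` are made up for illustration, while `systemctl --user import-environment` and `dbus-update-activation-environment` are the commands already used in this thread.

```shell
#!/bin/sh
# Print the names of the display-related variables that are set and
# non-empty in the current environment, separated by spaces.
set_display_vars() {
    vars=""
    if [ -n "${DISPLAY:-}" ]; then
        vars="$vars DISPLAY"
    fi
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        vars="$vars WAYLAND_DISPLAY"
    fi
    # Strip the leading space, if any.
    printf '%s\n' "${vars# }"
}

# Hypothetical helper: hand those variables to the systemd user manager
# and to the D-Bus activation environment, so that a dunst started by
# either of them sees the same display as the session itself.
import_display_vars() {
    vars=$(set_display_vars)
    if [ -z "$vars" ]; then
        echo "neither DISPLAY nor WAYLAND_DISPLAY is set" >&2
        return 1
    fi
    # $vars is intentionally unquoted: each name must be its own argument.
    systemctl --user import-environment $vars
    dbus-update-activation-environment $vars
}
```

Calling `import_display_vars` once from the window manager or compositor startup, before anything sends a notification, would be the intended use.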

nPHYN1T3 commented 1 year ago

This can also be noted as "wrong XScreen." I find Dunst will core dump and hang something terrible if the calling script/application is on another XScreen which is super frustrating.

May 23 19:49:46 $HOSTNAME systemd[18642]: Starting Dunst notification daemon...
May 23 19:49:46 $HOSTNAME dunst[3074577]: WARNING: Cannot open X11 display.
May 23 19:49:46 $HOSTNAME dunst[3074577]: ERROR: [  get_x11_output:0065] Couldn't initialize X11 output. Aborting...
May 23 19:49:46 $HOSTNAME kernel: traps: dunst[3074577] trap int3 ip:7efd5feea9a5 sp:7fff048ff950 error:0 in libglib-2.0.so.0.7600.2[7efd5feaa000+9d000]
May 23 19:49:46 $HOSTNAME systemd[1]: Started Process Core Dump (PID 3074582/UID 0).
May 23 19:49:46 $HOSTNAME systemd-coredump[3074583]: [🡕] Process 3074577 (dunst) of user 1000 dumped core.

                                                  Stack trace of thread 3074577:
                                                  #0  0x00007efd5feea9a5 g_logv (libglib-2.0.so.0 + 0x5e9a5)
                                                  #1  0x00007efd5feeac64 g_log (libglib-2.0.so.0 + 0x5ec64)
                                                  #2  0x0000555e67d7a39d n/a (dunst + 0x1439d)
                                                  #3  0x0000555e67d738c1 n/a (dunst + 0xd8c1)
                                                  #4  0x00007efd5fca9850 n/a (libc.so.6 + 0x23850)
                                                  #5  0x00007efd5fca990a __libc_start_main (libc.so.6 + 0x2390a)
                                                  #6  0x0000555e67d73ce5 n/a (dunst + 0xdce5)

                                                  Stack trace of thread 3074578:
                                                  #0  0x00007efd5fd892ed syscall (libc.so.6 + 0x1032ed)
                                                  #1  0x00007efd5ff3c7b5 g_cond_wait (libglib-2.0.so.0 + 0xb07b5)
                                                  #2  0x00007efd5feb0fb4 n/a (libglib-2.0.so.0 + 0x24fb4)
                                                  #3  0x00007efd5ff17f9e n/a (libglib-2.0.so.0 + 0x8bf9e)
                                                  #4  0x00007efd5ff13315 n/a (libglib-2.0.so.0 + 0x87315)
                                                  #5  0x00007efd5fd0d44b n/a (libc.so.6 + 0x8744b)
                                                  #6  0x00007efd5fd90e40 n/a (libc.so.6 + 0x10ae40)

                                                  Stack trace of thread 3074581:
                                                  #0  0x00007efd5fd83c0f __poll (libc.so.6 + 0xfdc0f)
                                                  #1  0x00007efd5ff4317f n/a (libglib-2.0.so.0 + 0xb717f)
                                                  #2  0x00007efd5fee5c7f g_main_loop_run (libglib-2.0.so.0 + 0x59c7f)
                                                  #3  0x00007efd60148d3c n/a (libgio-2.0.so.0 + 0x10ed3c)
                                                  #4  0x00007efd5ff13315 n/a (libglib-2.0.so.0 + 0x87315)
                                                  #5  0x00007efd5fd0d44b n/a (libc.so.6 + 0x8744b)
                                                  #6  0x00007efd5fd90e40 n/a (libc.so.6 + 0x10ae40)

                                                  Stack trace of thread 3074580:
                                                  #0  0x00007efd5fd83c0f __poll (libc.so.6 + 0xfdc0f)
                                                  #1  0x00007efd5ff4317f n/a (libglib-2.0.so.0 + 0xb717f)
                                                  #2  0x00007efd5fee5c7f g_main_loop_run (libglib-2.0.so.0 + 0x59c7f)
                                                  #3  0x00007efd601386bc g_dbus_connection_send_message_with_reply_sync (libgio-2.0.so.0 + 0xfe6bc)
                                                  #4  0x00007efd60144837 n/a (libgio-2.0.so.0 + 0x10a837)
                                                  #5  0x00007efd60144aa7 g_dbus_connection_call_sync (libgio-2.0.so.0 + 0x10aaa7)
                                                  #6  0x00007efd60139150 n/a (libgio-2.0.so.0 + 0xff150)
                                                  #7  0x00007efd6007bbd3 n/a (libgio-2.0.so.0 + 0x41bd3)
                                                  #8  0x00007efd600e621c n/a (libgio-2.0.so.0 + 0xac21c)
                                                  #9  0x00007efd5ff189a3 n/a (libglib-2.0.so.0 + 0x8c9a3)
                                                  #10 0x00007efd5ff13315 n/a (libglib-2.0.so.0 + 0x87315)
                                                  #11 0x00007efd5fd0d44b n/a (libc.so.6 + 0x8744b)
                                                  #12 0x00007efd5fd90e40 n/a (libc.so.6 + 0x10ae40)

                                                  Stack trace of thread 3074579:
                                                  #0  0x00007efd5fd83c0f __poll (libc.so.6 + 0xfdc0f)
                                                  #1  0x00007efd5ff4317f n/a (libglib-2.0.so.0 + 0xb717f)
                                                  #2  0x00007efd5fee51a2 g_main_context_iteration (libglib-2.0.so.0 + 0x591a2)
                                                  #3  0x00007efd5fee51f2 n/a (libglib-2.0.so.0 + 0x591f2)
                                                  #4  0x00007efd5ff13315 n/a (libglib-2.0.so.0 + 0x87315)
                                                  #5  0x00007efd5fd0d44b n/a (libc.so.6 + 0x8744b)
                                                  #6  0x00007efd5fd90e40 n/a (libc.so.6 + 0x10ae40)
                                                  ELF object binary architecture: AMD x86-64
May 23 19:49:46 $HOSTNAME systemd[18642]: dunst.service: Main process exited, code=dumped, status=5/TRAP
May 23 19:49:46 $HOSTNAME systemd[18642]: dunst.service: Failed with result 'core-dump'.
May 23 19:49:46 $HOSTNAME systemd[1]: systemd-coredump@4-3074582-0.service: Deactivated successfully.
May 23 19:49:46 $HOSTNAME systemd[18642]: Failed to start Dunst notification daemon.

bynect commented 9 months ago

The core dump is a direct consequence of calling LOG_E(). My question now is: how should we handle this gracefully? Exit with a message? Or wait for an X screen to be set up?

nPHYN1T3 commented 9 months ago

Not sure what you mean by "wait for an XScreen to be set up" when there are already many. In my case it's not because I'm running Wayland; it's because dunst assumes it runs on XScreen 0.0 but is then called from 0.1 or 0.2. My use case is slowly being killed by the "Wayland future proofing", but Wayland's design (at present) breaks my hardware config by design.

However, I'd be more likely to vote for exiting with a message on start. Waiting for some requirement gives the false illusion that things are functioning, and the user may continue on assuming things will work.

bynect commented 9 months ago

> Not sure what you mean by "wait for an XScreen to be set up" when there are already many. In my case it's not because I'm running Wayland; it's because dunst assumes it runs on XScreen 0.0 but is then called from 0.1 or 0.2. My use case is slowly being killed by the "Wayland future proofing", but Wayland's design (at present) breaks my hardware config by design.

Sorry, I meant waiting for an X screen to appear if none is found.

> However, I'd be more likely to vote for exiting with a message on start. Waiting for some requirement gives the false illusion that things are functioning, and the user may continue on assuming things will work.

Yes, you are totally right, and I think this is the right course of action.

Maybe the solution could be as easy as replacing the LOG_E with DIE (which exits instead of aborting).
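
From the outside, the difference between exiting and aborting shows up in the exit status. A rough sketch for classifying the status a shell reports after running dunst follows. `describe_status` is a hypothetical helper, and it relies on the usual shell convention that a process killed by signal N is reported as 128 + N. Signal 5 is SIGTRAP, which matches the status=5/TRAP in the systemd log earlier in this thread; signal 6 is SIGABRT.

```shell
#!/bin/sh
# Classify an exit status as seen in "$?" after a command finishes.
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # Shell convention: 128 + signal number for a signal death.
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error code $status"
    fi
}
```

Running `dunst; describe_status $?` without a display would then be expected to report a signal death for the current behaviour and a plain error code once the error path exits instead.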

nPHYN1T3 commented 9 months ago

I'm guessing from this Dunst will simply not work for me going forward then since it will by design bail rather than just run on the XScreen it's being called from?

bynect commented 9 months ago

If it works now, it will work in the future. I am only talking about using exit instead of abort when reporting the error, so the behavior is not changed

nPHYN1T3 commented 9 months ago

It doesn't work now which is why I have my above report. Well it works but ONLY on one XScreen. If anything calls it from another XScreen it just dies.

May 23 19:49:46 $HOSTNAME dunst[3074577]: WARNING: Cannot open X11 display.
May 23 19:49:46 $HOSTNAME dunst[3074577]: ERROR: [  get_x11_output:0065] Couldn't initialize X11 output. Aborting...
May 23 19:49:46 $HOSTNAME kernel: traps: dunst[3074577] trap int3 ip:7efd5feea9a5 sp:7fff048ff950 error:0 in libglib-2.0.so.0.7600.2[7efd5feaa000+9d000]
May 23 19:49:46 $HOSTNAME systemd[1]: Started Process Core Dump (PID 3074582/UID 0).
May 23 19:49:46 $HOSTNAME systemd-coredump[3074583]: [🡕] Process 3074577 (dunst) of user 1000 dumped core.

***Wait, I semi retract this! Kinda...maybe...let me explain.

I just did a test since this bug report is pretty old. Seems something has "half" fixed it. So if it's called with an explicitly defined XScreen it no longer crashes (which is good) but dunst can still only run on and notify on one XScreen.

bynect commented 9 months ago

> It doesn't work now which is why I have my above report. Well it works but ONLY on one XScreen. If anything calls it from another XScreen it just dies.

This is the code that opens the display (using Xlib), and the only place where initialization could fail the way you see:

        if (!(xctx.dpy = XOpenDisplay(NULL))) {
                LOG_W("Cannot open X11 display.");
                return false;
        }

This is an excerpt from the Xlib documentation:

On a POSIX-conformant system, if the display_name is NULL, it defaults to the value of the DISPLAY environment variable

So dunst simply tries to open the default display by calling XOpenDisplay with NULL, which opens the display named by the DISPLAY environment variable. In theory, setting DISPLAY correctly should solve this.

I saw the edit. Dunst running on only one display is, at the moment, by design, because we open a single X connection/display. Running on multiple displays would mean changing a lot of the X11 code. So if you want to switch between displays, I think you have to kill dunst and start it again with the correct DISPLAY set. That is annoying, but dunst can be reloaded simply by killing it.
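
Restarting dunst on another screen needs the right DISPLAY value first. As a small sketch, the helper below derives it from the current one, relying on the standard X display name form `[host]:display[.screen]`. The name `display_for_screen` is invented for this example and is not part of dunst.

```shell
#!/bin/sh
# Build the display name for another screen of the same X display.
# Usage: display_for_screen CURRENT_DISPLAY SCREEN_NUMBER
display_for_screen() {
    current=$1
    screen=$2
    case $current in
        *:*) ;;
        *) echo "not an X display name: $current" >&2; return 1 ;;
    esac
    host=${current%:*}       # text before the last ':' (often empty)
    number=${current##*:}    # "0" or "0.1"
    number=${number%%.*}     # drop an existing screen suffix
    printf '%s:%s.%s\n' "$host" "$number" "$screen"
}
```

With that, something like `pkill -x dunst; DISPLAY=$(display_for_screen "$DISPLAY" 1) dunst &` would restart dunst on screen 1 of the current display.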

nPHYN1T3 commented 9 months ago

Gotcha, I'm just glad it doesn't crash when called from a different XScreen any longer. Back when I chimed in on this bug report if I launched dunst on say XScreen 0.1 but then an application on XScreen 0.2 did a notification blam, crash. Testing today I see calls actually work. I had removed a lot of notification stuff from my scripts and tried to silence applications that used it because it just killed the dunst server all the time and only allowed things I knew wouldn't cause the crash.

bynect commented 9 months ago

Well, I'm glad you solved it in the end. If I can add multiple-server support during the refactoring of the x11 backend, I might do so, who knows 😁

nPHYN1T3 commented 9 months ago

Heh, well, I know that I and a few others would probably dig that, but it seems nuts like me are a dying breed. Maybe XCB would make that easier (if the documentation doesn't make it harder, hah)? Those running multi-GPU, that is. Or perhaps I should say that most multi-GPU setups these days use one GPU for output and the others for ML/CUDA workloads, so they don't care whether it's X or Wayland. For those like me who run several screens per GPU, Wayland is a death knell unless they fundamentally change how multi-GPU works.

bynect commented 9 months ago

Yes, using multiple xorg sessions on the same gpu is not something you hear about frequently. I am kinda curious: would you mind sharing why you do it and what advantages it has?

Also, now that you mention xcb, I actually worked with it on another project. So maybe instead of rewriting the xlib backend I can make an xcb backend 🤔

nPHYN1T3 commented 9 months ago

I'm not running multiple xorg instances/seats, just a single xorg instance with each GPU given their own XScreen.

If I try to TL;DR the "advantage" of this, giving each GPU/"device" their own XScreen ensures every GPU gets 100% of their performance/resources and applications can be denoted to work on that single GPU and its screens.

The mid-length version is that quite literally every other configuration option is terrible for multi-GPU. GPUs sit at around 50% when idle with Xinerama or xrandr provider setups: high heat, wasted power, worthless performance. Wayland actually makes things the worst. (Around May 2023 there was some news about this being addressed, but updates have been sparse.) All the other configs revolve around lumping things together, which ends in a "lowest common denominator" issue for unmatched GPUs, and even with matched GPUs there is horrendous overhead.

rickalex21 commented 7 months ago

Same here:

-❯ echo $WAYLAND_DISPLAY
wayland-1

Linux nixos 6.6.26 1-NixOS SMP PREEMPT_DYNAMIC Wed Apr 10 14:36:08 UTC 2024 x86_64 GNU/Linux

-❯ dunst --version
Dunst - A customizable and lightweight notification-daemon 1.10.0

bynect commented 7 months ago

Wait, are you trying to run dunst on wayland and it crashes because there is no X server??

rickalex21 commented 7 months ago

@bynect Yea I was getting the same errors as mentioned above. It's working fine now. I'm fairly new to nixos so I'm not sure how dunst is starting considering it's not enabled as a user service in my configuration.nix, I have no scripts that start it, and nothing in .config/systemd/user.

bynect commented 7 months ago

I thought that this was fixed before, I will check again

rickalex21 commented 7 months ago

UPDATE: So the fix is to add Environment=DISPLAY=:0 in the Service section. This was also mentioned here https://github.com/NixOS/nixpkgs/issues/154318. I am not sure if it's an issue related to dunst.

The only issue now is I get this when running:

journalctl --user -xeu dunst.service
Xlib:  extension "MIT-SCREEN-SAVER" missing on display ":0".

More than likely dunst would have to be built without X11 support if you're using wayland.

Here is the full working unit file for NixOS using a window manager.

[Unit]
Description=Dunst notification daemon
Documentation=man:dunst(1)
PartOf=default.target

[Service]
# 04/17/2024 DISPLAY added for nixos. https://github.com/dunst-project/dunst/issues/1095#issuecomment-2061554769
Environment=DISPLAY=:0
Type=dbus
BusName=org.freedesktop.Notifications
ExecStart=/nix/store/f40mb54130x37xfkiqcilz0ylhd1klm4-dunst-1.10.0/bin/dunst

[Install]
WantedBy=default.target

You may also need to do this in your window manager startup:

dbus-update-activation-environment DISPLAY WAYLAND_DISPLAY

bynect commented 7 months ago

Dunst should be able to understand that it is running on wayland by looking at the WAYLAND_DISPLAY even if it was built with xorg support.

rickalex21 commented 7 months ago

@bynect I'm getting wayland-1 is there anything you can suggest to fix this?

bynect commented 7 months ago

I am not sure what you mean(?)

Anyhow, dunst checks the WAYLAND_DISPLAY env var and, if it is defined, tries to use the Wayland output (if it was compiled in).

It should not even try to use xorg on a wayland environment (unless force_xwayland is set to true).
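
The selection rule described here can be approximated in a few lines of shell, which may help predict what a given environment will lead to. This is only a sketch of the rule as stated in this comment, not dunst's actual code: `predict_backend` is a hypothetical name, and it assumes a build with both the Wayland and X11 outputs compiled in.

```shell
#!/bin/sh
# Arguments: the value of WAYLAND_DISPLAY, the value of DISPLAY, and the
# force_xwayland setting ("true" or "false", defaulting to "false").
predict_backend() {
    wayland_display=$1
    x_display=$2
    force_xwayland=${3:-false}
    if [ -n "$wayland_display" ] && [ "$force_xwayland" != "true" ]; then
        echo "wayland"
    elif [ -n "$x_display" ]; then
        echo "x11"
    else
        # Neither display is usable: the situation this issue is about.
        echo "none"
    fi
}
```

For example, `predict_backend "$WAYLAND_DISPLAY" "$DISPLAY"` run inside the same environment as the service shows which branch that environment would select under these assumptions.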

rickalex21 commented 7 months ago

> I am not sure what you mean(?)

I meant the output of echo $WAYLAND_DISPLAY is wayland-1.

I think it's probably not a big deal since it's a screen saver issue, my main problem was resolved.

Apr 17 11:36:38 nixos dunst[171679]: Xlib:  extension "MIT-SCREEN-SAVER" missing on display ":0".

I'm not familiar with the nixos build of dunst but this link may be of interest. This is what comes by default and what I have: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/by-name/du/dunst/package.nix

bynect commented 7 months ago

I think that you should not export DISPLAY which is xorg specific when using wayland. I am not too sure what codepath dunst is taking at this point but it should not use Xorg things at all. This message is coming from some library trying to use xorg when there is no xorg.

What happens if you pass only WAYLAND_DISPLAY?

rickalex21 commented 7 months ago

@bynect I think the solution would be to build dunst without XORG support but not sure I'm up for that right now since I'm fairly new to nixos.

I don't know what you mean by: What happens if you pass only WAYLAND_DISPLAY?

I started getting errors again, but I just run systemctl --user restart dunst.service and they go away. The only thing that remains is:

Apr 18 14:29:35 nixos systemd[1406]: Starting Dunst notification daemon...
Apr 18 14:29:35 nixos dunst[5176]: Xlib:  extension "MIT-SCREEN-SAVER" missing on display ":0".
Apr 18 14:29:35 nixos systemd[1406]: Started Dunst notification daemon.

Which is not a big deal.

bynect commented 7 months ago

Building dunst without Xorg support is a new feature (added in the latest release). The solution should be something else, since dunst should work with both compiled.

Anyway, the DISPLAY env var is for Xorg and WAYLAND_DISPLAY is for wayland. Having both may lead to problems, especially if one is not used. This is why I asked what happens if you don't pass the DISPLAY var at all.

> Apr 18 14:29:35 nixos systemd[1406]: Starting Dunst notification daemon...
> Apr 18 14:29:35 nixos dunst[5176]: Xlib:  extension "MIT-SCREEN-SAVER" missing on display ":0".
> Apr 18 14:29:35 nixos systemd[1406]: Started Dunst notification daemon.

I suppose that that warning is due to the fact that despite passing the DISPLAY var no Xorg server is running.

But I am not too sure how nixos manages environments.

rickalex21 commented 7 months ago

@bynect I took it off from my river init but it does not make a difference. That was the only location where I had DISPLAY at. I only have a single source of truth in ~/.config/fish and it's not there. I looked at env, I think a program I added in nixos is creating it somewhere...

rickalex21 commented 15 hours ago

UPDATE: I did get rid of the errors with Environment="DISPLAY=:0"

However, now I am getting this:

Unable to send notification: Error calling StartServiceByName for org.freedesktop.Notifications: Process org.freedesktop.Notifications received signal 5

Funny thing is that running "dunst" directly from the terminal works, most likely because I have all my variables set in the shell.
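
One way to check that suspicion is to look at what the systemd user manager actually has in its environment. The sketch below assumes the `NAME=value` line format printed by `systemctl --user show-environment`; the helper name `missing_display_vars` is made up for this example.

```shell
#!/bin/sh
# Read "NAME=value" lines on stdin and print each display variable
# that does not appear among them, one per line.
missing_display_vars() {
    env_dump=$(cat)
    for name in DISPLAY WAYLAND_DISPLAY; do
        if ! printf '%s\n' "$env_dump" | grep -q "^$name="; then
            echo "$name"
        fi
    done
}
```

Running `systemctl --user show-environment | missing_display_vars` lists the variables the service cannot see; empty output means both are present there.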

FIX: I ended up starting dunst manually instead of with systemd. Perhaps there's something wrong with my config? I am using nixos by the way.

[Unit]
Description=Dunst notification daemon
Documentation=man:dunst(1)
PartOf=default.target

[Service]
Environment="DISPLAY=:0"
Type=dbus
BusName=org.freedesktop.Notifications
ExecStart=/run/current-system/sw/bin/dunst

[Install]
WantedBy=default.target