NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
17.58k stars 13.73k forks source link

display-manager.service restarts after nixos-rebuild switch #18390

Closed rasendubi closed 7 years ago

rasendubi commented 8 years ago

Issue description

display-manager.service restarts after nixos-rebuild switch. This doesn't happen immediately but rather some time after, when I'm AFK. I'm experiencing this for more than a month.

Last time this happened after upgrading from c3d8b3f679cba37c9c820934b2adbfbb64a03431 to 202d9e29c3aeea336f9caf22285de92d120d9b7c. The reload happened in ~7 hours after nixos-rebuild and ~5 hours was away.

Sep 07 11:00:00 Larry slim[1006]: pam_unix(slim:session): session closed for user rasen
Sep 07 11:00:01 Larry audit[26410]: ANOM_ABEND auid=1000 uid=1000 gid=100 ses=1 pid=26410 comm="emacs" exe="/nix/store/d53n3y65jj1vwjp2wcac3gn7x7z2f2wc-emacs-24.5/bin/emacs-24.5" sig=6
Sep 07 11:00:01 Larry kernel: audit: type=1701 audit(1473235201.011:9521): auid=1000 uid=1000 gid=100 ses=1 pid=26410 comm="emacs" exe="/nix/store/d53n3y65jj1vwjp2wcac3gn7x7z2f2wc-emacs-24.5/bin/emacs-24.5" sig=6
Sep 07 11:00:01 Larry audit[1138]: ANOM_ABEND auid=1000 uid=1000 gid=100 ses=1 pid=1138 comm="xss-lock" exe="/nix/store/fpgqs8d4dizrq3j93k4vppzcglhhkykw-xss-lock-git-2014-03-02/bin/xss-lock" sig=5
Sep 07 11:00:01 Larry kernel: traps: xss-lock[1138] trap int3 ip:7f507870f023 sp:7ffe6a4768f0 error:0
Sep 07 11:00:01 Larry kernel: audit: type=1701 audit(1473235201.044:9522): auid=1000 uid=1000 gid=100 ses=1 pid=1138 comm="xss-lock" exe="/nix/store/fpgqs8d4dizrq3j93k4vppzcglhhkykw-xss-lock-git-2014-03-02/bin/xss-lock" sig=5
Sep 07 11:00:01 Larry audit[4951]: ANOM_ABEND auid=1000 uid=1000 gid=100 ses=1 pid=4951 comm="skype" exe="/nix/store/a8n2imm1gpnv4p702xb9356pnhsifrqp-skype-4.3.0.37/libexec/skype/skype" sig=6
Sep 07 11:00:01 Larry kernel: audit: type=1701 audit(1473235201.649:9523): auid=1000 uid=1000 gid=100 ses=1 pid=4951 comm="skype" exe="/nix/store/a8n2imm1gpnv4p702xb9356pnhsifrqp-skype-4.3.0.37/libexec/skype/skype" sig=6
Sep 07 11:00:02 Larry audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry kernel: audit: type=1130 audit(1473235202.050:9524): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry kernel: audit: type=1131 audit(1473235202.050:9525): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry systemd[1]: display-manager.service: Service hold-off time over, scheduling restart.
Sep 07 11:00:02 Larry audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry systemd[1]: Stopped X11 Server.
Sep 07 11:00:02 Larry kernel: audit: type=1130 audit(1473235202.263:9526): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry kernel: audit: type=1131 audit(1473235202.263:9527): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry systemd[1]: Starting X11 Server...
Sep 07 11:00:02 Larry systemd[1]: Started X11 Server.
Sep 07 11:00:02 Larry audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry kernel: audit: type=1130 audit(1473235202.279:9528): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=display-manager comm="systemd" exe="/nix/store/lamh95q185ymigm217s64x08qcmnfhvm-systemd-231/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 07 11:00:02 Larry display-manager[29452]: /nix/store/xssx8irkfbzh1clm1zlkry0ww0z4zklf-xsession 'none + awesome' - none + awesome
Sep 07 11:00:02 Larry display-manager[29452]: /nix/store/xssx8irkfbzh1clm1zlkry0ww0z4zklf-xsession 'none + awesome' - none + awesome
Sep 07 11:00:02 Larry display-manager[29452]: /nix/store/qd3mcm6x4jj66jx10ni8g0fr92983dm8-xauth-1.0.9/bin/xauth:  file /var/run/slim.auth does not exist
Sep 07 11:00:02 Larry display-manager[29452]: X.Org X Server 1.18.3
Sep 07 11:00:02 Larry display-manager[29452]: Release Date: 2016-04-04
...

Steps to reproduce

  1. nixos-rebuild switch
  2. wait

    Technical details

    • System: 16.09.git.202d9e2 (Flounder)
    • Nix version: nix-env (Nix) 1.11.3
    • Nixpkgs version: 17.03.git.202d9e2
domenkozar commented 8 years ago

display-manager.service: Service hold-off time over, scheduling restart., huh.

domenkozar commented 8 years ago

Did you check if your display-manager failed somehow? This usually suggests that it died and systemd just started it back.

rasendubi commented 8 years ago

I've examined the logs (journalctl, X.log.old, slim logs) and haven't found any evidence of failure. (Though, that doesn't mean there was no failure.)

rasendubi commented 8 years ago

I should also note I've never experienced reload or any display manager failure without nixos-rebuild.

vcunat commented 8 years ago

It sounds like some new clever systemd feature :-D

vcunat commented 8 years ago

Seriously now; @rasendubi: are you sure you didn't have any failures of other services in the meantime, e.g. dbus?

groxxda commented 8 years ago

If there was an error, it should have happened before the Sep 07 11:00:00 Larry slim[1006]: pam_unix(slim:session): session closed for user rasen.. I'm surprised this happend exactly at 11:00:00, can you see a pattern for the time (always full hour or even always 11:00)? Also it may be interesting if systemd status reports failed units or queued jobs after nixos-rebuild before this happens.

rasendubi commented 8 years ago

@vcunat I can't be sure other services haven't failed -- I've been AFK for all night, but all logs are clean.

@groxxda there is nothing strange before Sep 07 11:00:00 Larry slim[1006]: pam_unix(slim:session): session closed for user rasen, just a zillion of chrome audit messages.

whoa... that's strange

Jul 15 10:00:00 Larry slim[955]: pam_unix(slim:session): session closed for user rasen
...
Aug 18 11:00:00 Larry slim[1022]: pam_unix(slim:session): session closed for user rasen
...
Sep 07 11:00:00 Larry slim[1006]: pam_unix(slim:session): session closed for user rasen

I've done nixos-rebuild switches to mentioned commits back and forth, systemctl status reports no queued or failed jobs. (Though, I'm not sure display-manager is going to be restarted.)

groxxda commented 8 years ago

It does not happen after every rebuild, right? Is there anything unusual in systemctl | grep timer?

rasendubi commented 8 years ago

No, it does not. Only after "big" rebuilds.

rasen@Larry ~/d/c/nixpkgs> sudo systemctl list-timers --all
NEXT                          LEFT     LAST                          PASSED    UNIT                         ACTIVATES
Thu 2016-09-08 22:32:29 EEST  23h left Wed 2016-09-07 22:32:29 EEST  12min ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

1 timers listed.

(nothing unusual)

rasendubi commented 8 years ago

This might be caused by my alarm clock (alarm-clock-applet). 10:00-11:00 is usually when I (should) wake up.

rasendubi commented 8 years ago

Happened today again during the switching to 7e273d9209c6c96acce0f3c4abff3b7ec18ca360. The issue is 100% is caused by alarm-clock-applet.

vcunat commented 8 years ago

Apparently something it does triggers the restart – the sound or the notification etc. It's still strange that it happens without anything reporting any error.

rasendubi commented 8 years ago

I've started alarm-clock-applet from a tty with DISPLAY=:0.0 and got some logs:

** (alarm-clock-applet:7217): WARNING **: Failed to send notification: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying
alarm-clock-applet: Fatal IO error 104 (Connection reset by peer) on X server :0.0.
rasendubi commented 8 years ago

output of dbus-monitor:

method call time=1473801704.705923 sender=:1.24 -> destination=org.freedesktop.DBus serial=1 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=Hello
method return time=1473801704.705937 sender=org.freedesktop.DBus -> destination=:1.24 serial=1 reply_serial=1
   string ":1.24"
signal time=1473801704.705944 sender=org.freedesktop.DBus -> destination=(null destination) serial=53 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string ":1.24"
   string ""
   string ":1.24"
signal time=1473801704.705951 sender=org.freedesktop.DBus -> destination=:1.24 serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameAcquired
   string ":1.24"
method call time=1473801704.706152 sender=:1.24 -> destination=org.freedesktop.DBus serial=2 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',sender='org.freedesktop.Notifications',interface='org.freedesktop.Notifications',path='/org/freedesktop/Notifications'"
method return time=1473801704.706161 sender=org.freedesktop.DBus -> destination=:1.24 serial=3 reply_serial=2
method call time=1473801704.706197 sender=:1.24 -> destination=org.freedesktop.DBus serial=3 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=AddMatch
   string "type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',path='/org/freedesktop/DBus',arg0='org.freedesktop.Notifications'"
method return time=1473801704.706205 sender=org.freedesktop.DBus -> destination=:1.24 serial=4 reply_serial=3
method call time=1473801704.706532 sender=:1.24 -> destination=org.freedesktop.DBus serial=4 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=StartServiceByName
   string "org.freedesktop.Notifications"
   uint32 0
error time=1473801704.706549 sender=org.freedesktop.DBus -> destination=:1.24 error_name=org.freedesktop.DBus.Error.ServiceUnknown reply_serial=4
   string "The name org.freedesktop.Notifications was not provided by any .service files"
method call time=1473801704.706679 sender=:1.24 -> destination=org.freedesktop.DBus serial=5 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=GetNameOwner
   string "org.freedesktop.Notifications"
method return time=1473801704.706690 sender=org.freedesktop.DBus -> destination=:1.24 serial=6 reply_serial=5
   string ":1.0"
method call time=1473801704.706826 sender=:1.24 -> destination=:1.0 serial=6 path=/org/freedesktop/Notifications; interface=org.freedesktop.Notifications; member=GetServerInformation
method return time=1473801704.706960 sender=:1.0 -> destination=:1.24 serial=4 reply_serial=6
   string "naughty"
   string "awesome"
   string "3.5"
   string "1.0"
method call time=1473801704.707160 sender=:1.24 -> destination=:1.0 serial=7 path=/org/freedesktop/Notifications; interface=org.freedesktop.Notifications; member=Notify
   string "alarm-clock"
   uint32 0
   string "alarm-clock"
   string "Alarm!"
   string "You can snooze or stop alarms from the Alarm Clock menu."
   array [
   ]
   array [
   ]
   int32 -1
signal time=1473801704.709461 sender=org.freedesktop.DBus -> destination=:1.0 serial=6 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string "org.freedesktop.Notifications"
signal time=1473801704.709482 sender=org.freedesktop.DBus -> destination=(null destination) serial=7 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string "org.freedesktop.Notifications"
   string ":1.0"
   string ""
signal time=1473801704.709656 sender=org.freedesktop.DBus -> destination=:1.0 serial=7 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string "org.naquadah.awesome.awful"
signal time=1473801704.709664 sender=org.freedesktop.DBus -> destination=(null destination) serial=54 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string "org.naquadah.awesome.awful"
   string ":1.0"
   string ""
signal time=1473801704.709669 sender=org.freedesktop.DBus -> destination=:1.0 serial=8 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string ":1.0"
signal time=1473801704.709672 sender=org.freedesktop.DBus -> destination=(null destination) serial=55 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string ":1.0"
   string ":1.0"
   string ""
error time=1473801704.709676 sender=org.freedesktop.DBus -> destination=:1.24 error_name=org.freedesktop.DBus.Error.NoReply reply_serial=7
   string "Message recipient disconnected from message bus without replying"
signal time=1473801704.714984 sender=org.freedesktop.DBus -> destination=:1.24 serial=9 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string ":1.24"
signal time=1473801704.715005 sender=org.freedesktop.DBus -> destination=(null destination) serial=56 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string ":1.24"
   string ":1.24"
   string ""
signal time=1473801704.715081 sender=org.freedesktop.DBus -> destination=:1.21 serial=10 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string "com.pseudoberries.AlarmClock"
signal time=1473801704.715087 sender=org.freedesktop.DBus -> destination=(null destination) serial=57 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string "com.pseudoberries.AlarmClock"
   string ":1.21"
   string ""
signal time=1473801704.715148 sender=org.freedesktop.DBus -> destination=:1.21 serial=11 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameLost
   string ":1.21"
signal time=1473801704.715156 sender=org.freedesktop.DBus -> destination=(null destination) serial=58 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged
   string ":1.21"
   string ":1.21"
   string ""
signal time=1473801704.715419 sender=(null sender) -> destination=(null destination) serial=0 path=/org/freedesktop/DBus/Local; interface=org.freedesktop.DBus.Local; member=Disconnected
rasendubi commented 8 years ago

This looks like a silent awesome wm crash. I've looked at ~/.awesome-errors, X logs, journalctl and still have no clue why this happens.

I've also tried to reproduce the issue by sending a notification with notify-send with no luck.

Also tried reloading/restarting dbus -- that doesn't crash awesome either.

groxxda commented 8 years ago

I'm not sure how the dbus stuff works at the moment, but it might be that dbus (user session) is restarted when you update your system and the applet still has an invalid handle to the previous socket or something like that.. Can you try with #18382 ?

rasendubi commented 7 years ago

The issue didn't occur while switching to the latest nixpkgs-unstable (e3deccc0d7a42b0ae1d5e337cd6b0b9488c2295a), even though update was substantial:

stopping the following units: alsa-store.service, audit.service, avahi-daemon.service, bbswitch.service, cpufreq.service, dnsmasq.service, gitolite-init.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.service, nix-daemon.socket, nscd.service, ntpd.service, openvpn-kaa.service, systemd-modules-load.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-vconsole-setup.service, udisks2.service, wicd.service
NOT restarting the following changed units: display-manager.service, getty@tty1.service, getty@tty2.service, getty@tty3.service, getty@tty4.service, getty@tty5.service, systemd-backlight@backlight:acpi_video0.service, systemd-backlight@backlight:intel_backlight.service, systemd-backlight@leds:dell::kbd_backlight.service, systemd-journal-flush.service, systemd-random-seed.service, systemd-remount-fs.service, systemd-tmpfiles-setup.service, systemd-update-utmp.service, systemd-user-sessions.service, user@0.service, user@1000.service
activating the configuration...
setting up /etc...
restarting systemd...
reloading the following units: dbus.service, dev-hugepages.mount, dev-mqueue.mount, firewall.service, sys-kernel-debug.mount
restarting the following units: polkit.service, sshd.service, systemd-journald.service, systemd-logind.service
starting the following units: alsa-store.service, audit.service, avahi-daemon.service, bbswitch.service, cpufreq.service, dnsmasq.service, gitolite-init.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.socket, nscd.service, ntpd.service, openvpn-kaa.service, systemd-modules-load.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-vconsole-setup.service, udisks2.service, wicd.service
rasendubi commented 7 years ago

Still doesn't happen.

stopping the following units: alsa-store.service, audit.service, bbswitch.service, cpufreq.service, dnsmasq.service, gitolite-init.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.service, nix-daemon.socket, nscd.service, ntpd.service, openvpn-kaa.service, systemd-modules-load.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-udevd.service, systemd-vconsole-setup.service, udisks2.service, wicd.service
NOT restarting the following changed units: display-manager.service, getty@tty1.service, getty@tty2.service, getty@tty3.service, getty@tty4.service, getty@tty5.service, systemd-backlight@backlight:acpi_video0.service, systemd-backlight@backlight:intel_backlight.service, systemd-backlight@leds:dell::kbd_backlight.service, systemd-journal-flush.service, systemd-random-seed.service, systemd-remount-fs.service, systemd-tmpfiles-setup.service, systemd-update-utmp.service, systemd-user-sessions.service, user@1000.service
activating the configuration...
setting up /etc...
restarting systemd...
reloading the following units: dbus.service, dev-hugepages.mount, dev-mqueue.mount, firewall.service, sys-kernel-debug.mount
restarting the following units: polkit.service, sshd.service, systemd-journald.service, systemd-logind.service
starting the following units: alsa-store.service, audit.service, bbswitch.service, cpufreq.service, dnsmasq.service, gitolite-init.service, kmod-static-nodes.service, network-local-commands.service, network-setup.service, nix-daemon.socket, nscd.service, ntpd.service, openvpn-kaa.service, systemd-modules-load.service, systemd-sysctl.service, systemd-tmpfiles-clean.timer, systemd-tmpfiles-setup-dev.service, systemd-udev-trigger.service, systemd-udevd-control.socket, systemd-udevd-kernel.socket, systemd-vconsole-setup.service, udisks2.service, wicd.service
the following new units were started: avahi-daemon.service, avahi-daemon.socket

I'm closing as I can't reproduce anymore. Will reopen if this happens again.