MusicPlayerDaemon / MPD

Music Player Daemon
https://www.musicpd.org/
GNU General Public License v2.0
2.18k stars 352 forks source link

mpd times out when auto-started as a systemd --user unit #259

Closed FichteFoll closed 6 years ago

FichteFoll commented 6 years ago

I'm running Arch Linux (and mpd 0.20.18) and since approximately two weeks ago, the mpd systemd user unit consistently times out when I boot.

~ Λ journalctl --user -u mpd.service -f -b0
-- Logs begin at Thu 2017-01-05 20:38:55 CET. --
Apr 03 10:48:11 pinaceae systemd[526]: Starting Music Player Daemon...
Apr 03 10:48:11 pinaceae mpd[532]: exception: bind to '0.0.0.0:6600' failed (continuing anyway, because binding to '[::]:6600' succeeded): Failed to bind socket: Address already in use
Apr 03 10:48:11 pinaceae mpd[532]: libsamplerate: libsamplerate converter 'Fastest Sinc Interpolator'
Apr 03 10:48:11 pinaceae mpd[532]: vorbis: Xiph.Org libVorbis 1.3.6
Apr 03 10:48:11 pinaceae mpd[532]: opus: libopus 1.2.1
Apr 03 10:48:11 pinaceae mpd[532]: sndfile: libsndfile-1.0.28
Apr 03 10:48:11 pinaceae mpd[532]: simple_db: reading DB
Apr 03 10:48:11 pinaceae mpd[532]: curl: version 7.59.0
Apr 03 10:48:11 pinaceae mpd[532]: curl: with OpenSSL/1.1.0h
Apr 03 10:49:41 pinaceae systemd[526]: mpd.service: Start operation timed out. Terminating.
Apr 03 10:50:40 pinaceae systemd[526]: mpd.service: Failed with result 'timeout'.
Apr 03 10:50:40 pinaceae systemd[526]: Failed to start Music Player Daemon.

An strace call during this initial startup revealed that mpd is still doing work at this time and crawling my music library (indicated by lots of stat calls), after which it terminates. I suspect that startup takes too long (>30s) for systemd so it considers the unit to have failed and signals it to terminate, which mpd then does but only after it finished crawling the library.

Any following manually invocation of systemctl --user start mpd.service succeeds and strace shows epoll_pwait(6, most of the time, until I start playback.

My unit is as follows:

~ > cat /usr/lib/systemd/user/mpd.service
[Unit]
Description=Music Player Daemon
Documentation=man:mpd(1) man:mpd.conf(5)
After=network.target sound.target

[Service]
Type=notify
ExecStart=/usr/bin/mpd --no-daemon

# allow MPD to use real-time priority 50
LimitRTPRIO=50
LimitRTTIME=infinity

# disallow writing to /usr, /bin, /sbin, ...
ProtectSystem=yes

# more paranoid security settings
NoNewPrivileges=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
# AF_NETLINK is required by libsmbclient, or it will exit() .. *sigh*
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
RestrictNamespaces=yes

# Note that "ProtectKernelModules=yes" is missing in the user unit
# because systemd 232 is unable to reduce its own capabilities
# ("Failed at step CAPABILITIES spawning /usr/bin/mpd: Operation not
# permitted")

[Install]
WantedBy=default.target
MaxKellermann commented 6 years ago

This is an Arch bug. They build MPD without libsystemd, and thus MPD is unable to send the expected notify to systemd.

MaxKellermann commented 6 years ago

See https://git.archlinux.org/svntogit/packages.git/plain/trunk/PKGBUILD?h=packages/mpd (No libsystemd dependency)

ArchangeGabriel commented 6 years ago

Opened https://bugs.archlinux.org/task/58082.

FichteFoll commented 6 years ago

The PKGBUILD was fixed rather quickly and I updated to that version, but the problem still occurs.

mpv --version ``` Music Player Daemon 0.20.18 Copyright (C) 2003-2007 Warren Dukes Copyright 2008-2017 Max Kellermann This is free software; see the source for copying conditions. There is NO warranty; not even MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Database plugins: simple proxy upnp Storage plugins: local smbclient nfs curl Neighbor plugins: smbclient upnp Decoders plugins: [mad] mp3 mp2 [vorbis] ogg oga [oggflac] ogg oga [flac] flac [opus] opus ogg oga [sndfile] wav aiff aif au snd paf iff svx sf voc w64 pvf xi htk caf sd2 [audiofile] wav au aiff aif [dsdiff] dff [dsf] dsf [faad] aac [wavpack] wv [modplug] 669 amf ams dbm dfm dsm far it med mdl mod mtm mt2 okt s3m stm ult umx xm [ffmpeg] 16sv 3g2 3gp 4xm 8svx aa3 aac ac3 adx afc aif aifc aiff al alaw amr anim apc ape asf atrac au aud avi avm2 avs bap bfi c93 cak cin cmv cpk daud dct divx dts dv dvd dxa eac3 film flac flc fli fll flx flv g726 gsm gxf iss m1v m2v m2t m2ts m4a m4b m4v mad mj2 mjpeg mjpg mka mkv mlp mm mmf mov mp+ mp1 mp2 mp3 mp4 mpc mpeg mpg mpga mpp mpu mve mvi mxf nc nsv nut nuv oga ogm ogv ogx oma ogg omg opus psp pva qcp qt r3d ra ram rl2 rm rmvb roq rpl rvc shn smk snd sol son spx str swf tak tgi tgq tgv thp ts tsp tta xa xvid uv uv2 vb vid vob voc vp6 vmd wav webm wma wmv wsaud wsvga wv wve [gme] ay gbs gym hes kss nsf nsfe sap spc vgm vgz [pcm] Filters: libsamplerate soxr Tag plugins: id3tag Output plugins: shout null fifo pipe alsa ao oss pulse jack httpd recorder Encoder plugins: null vorbis opus lame wave flac Archive plugins: [bz2] bz2 [zzip] zip Input plugins: file alsa archive curl ffmpeg smbclient nfs mms Playlist plugins: extm3u m3u pls xspf asx rss soundcloud flac cue embcue Protocols: file:// http:// https:// mms:// mmsh:// mmst:// mmsu:// gopher:// rtp:// rtsp:// rtmp:// rtmpt:// rtmps:// smb:// nfs:// alsa:// Other features: avahi epoll icu inotify ipv6 systemd tcp un ```
MaxKellermann commented 6 years ago

What, mpv --version?? Does your systemd unit start mpv? Are you really sure this is the binary started by your systemd unit? Check with strace whether MPD sends READY=1 to systemd.

FichteFoll commented 6 years ago

Eh, that's obviously meant to be mpd. Just muscle memory interfering.

I assume I should be stracing off a fresh boot/login?

MaxKellermann commented 6 years ago

strace your user systemd (with -f -v -s256) while you reproduce the problem.

FichteFoll commented 6 years ago

Ran strace manually in a terminal after determining the process's pid, so the first few seconds are missing.

systemd-mpd-strace.txt

journalctl --user log during the time (nothing surprising):

-- Logs begin at Thu 2017-01-05 20:38:55 CET. --
Apr 11 11:07:02 pinaceae systemd[510]: Starting Music Player Daemon...
Apr 11 11:07:03 pinaceae mpd[518]: exception: bind to '0.0.0.0:6600' failed (continuing anyway, because binding to '[::]:6600' succeeded): Failed to bind socket: Address already in use
Apr 11 11:07:03 pinaceae mpd[518]: libsamplerate: libsamplerate converter 'Fastest Sinc Interpolator'
Apr 11 11:07:03 pinaceae mpd[518]: vorbis: Xiph.Org libVorbis 1.3.6
Apr 11 11:07:03 pinaceae mpd[518]: opus: libopus 1.2.1
Apr 11 11:07:03 pinaceae mpd[518]: sndfile: libsndfile-1.0.28
Apr 11 11:07:03 pinaceae mpd[518]: simple_db: reading DB
Apr 11 11:07:03 pinaceae mpd[518]: curl: version 7.59.0
Apr 11 11:07:03 pinaceae mpd[518]: curl: with OpenSSL/1.1.0h
Apr 11 11:08:32 pinaceae systemd[510]: mpd.service: Start operation timed out. Terminating.
Apr 11 11:08:37 pinaceae systemd[510]: mpd.service: Failed with result 'timeout'.
Apr 11 11:08:37 pinaceae systemd[510]: Failed to start Music Player Daemon.
MaxKellermann commented 6 years ago

This strace is useless for the reason you already explained. This is not what I asked you to do.

FichteFoll commented 6 years ago

I tried attaching to the systemd --user process (pid 515) through a systemd system unit, but for some reason it attached to the login process (pid 472) instead (??) and got a 200MB log file from that, which doesn't seem right. Dry testing on a running machine works as expected and attaches to 515. Would love to get a hint on how to debug this better.

/usr/lib/systemd/system/test.service

[Unit]
Description=Test Unit (for debugging etc)

[Service]
Type=simple
ExecStart=/root/test

[Install]
WantedBy=multi-user.target

/root/test

#!/usr/bin/env zsh

while true; do
    pids=(`pidof systemd`)
    echo "pids: $pids"
    for pid in $pids; do
        if ((pid > 1)) {
            echo "attaching to $pid"
            strace -p $pid -f -v -s256 2>&1 >"/tmp/systemd-user-strace_`date -Iminute`.txt"
            # not gonna halt anyway; need to kill the unit
            exit
        }
    done
    sleep 0.1
done
MaxKellermann commented 6 years ago

Maybe you should first report a bug to the strace project, if you believe it attached to a different process than you specified? When you have your tools sorted out and fixed, post your strace. Until then, this issue is closed.

Jonathanyyf commented 6 years ago

I have exactly the same issue on Armv7 (OrangePi) Debian 9. I have compiled the 0.20.18 release. It seems responsive and I can connect to it during activation, but dies after timeout.

FichteFoll commented 6 years ago

Unfortunately, the steps to reproduce the problem are quite involved (requires reboot) and the time window to strace the systemd user process is quite critical, so I'm still having issues targetting the right process. strace did not attach to a different process, it's just that I have problems determining the correct one during startup.

I'm iterating on it every day and will report back once I have a usable strace log.

FichteFoll commented 6 years ago

systemd-user-strace_2018-04-15T13:19+02:00.txt

Using the following script (and the systemd unit mentioned earlier):

#!/usr/bin/env zsh

while true; do
    pids=(`ps -u fichte | grep systemd | awk '{ print $1 }'`)
    echo "pids: $pids"
    for pid in $pids; do
        if ((pid > 1)) {
            echo "attaching to $pid"
            fname="/tmp/systemd-user-strace_`date -Iminute`.txt"
            strace -p $pid -f -v -s256 2>&1 >$fname
            # not gonna halt anyway; need to kill the unit
            exit
        }
    done
    sleep 0.1
done

For some reason, the file wasn't written (size 0), so I dumped the output of journalctl -u test.service -b0 instead.

Edit: Better use -o "/tmp/systemd-user-strace_`date -Iminute`.txt" instead of manual piping, optionally prefixed with |tee.

FichteFoll commented 6 years ago

@MaxKellermann have you had a change to look at this? Is it useful?

MaxKellermann commented 6 years ago

Do you see MPD anywhere in this trace? Because I don't.

Jonathanyyf commented 6 years ago

~~I have found the issue: --no-daemon is not working properly. Delete the --no-daemon option and add~~

[Service]
RemainAfterExit=yes

Actually the culprit is type=notify I have changed it to simple or idle and it works. (with --no-daemon) Can you try it on your Arch Linux ?

FichteFoll commented 6 years ago

I think type=notify is intended and the fact it's not working is the bug.

With my latest manual test (through disabling the unit and then starting it manually) I could verify that systemctl --user start mpd does not exit for 30s and that mpd scans what seems like my entire library before opening/sending to /run/user/1000/systemd/notify, after which the unit is considered active and systemctl exits. Unfortunately I failed again to dump the strace to disk, so I will have to try again after the next reboot to see whether mpd attempts to send any notification to systemd before scanning my library.

FichteFoll commented 6 years ago

@MaxKellermann

Do you see MPD anywhere in this trace? Because I don't.

I do. Mpd's pid in that strace log is 1387. Besides the fact it stats music files, you can also tell from:

Apr 15 13:19:23 pinaceae test[489]: [pid  1387] prctl(PR_SET_NAME, "(mpd)" <unfinished ...>

You don't see mpd notifying systemd in that log though, or at least I couldn't see it at the position when it happened in my manual run, i.e. after it stat'ed my library. Presumably because it has been killed? In the manual run, it looked as follows:

[pid 28109] sendmsg(19, {msg_name={sa_family=AF_UNIX, sun_path="/run/user/1000/systemd/notify"}, msg_namelen=31, msg_iov=[{iov_base="READY=1", iov_len=7}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 7
MaxKellermann commented 6 years ago

Can you show me this prctl call in the MPD source code?

FichteFoll commented 6 years ago

Looking at a "clean" strace, I determined that mpd sends to /run/user/1000/systemd/notify through the single sd_notify call in Main.cxx:646:

https://github.com/MusicPlayerDaemon/MPD/blob/5b0e8c6de0b73d8e8812cbf4588862fed2a2af42/src/Main.cxx#L645-L647

The accompanying trace is, as before:

4686  sendmsg(19, {msg_name={sa_family=AF_UNIX, sun_path="/run/user/1000/systemd/notify"}, msg_namelen=31, msg_iov=[{iov_base="READY=1", iov_len=7}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 7

The only conclusion I can make with this is that mpd notifies systemd that it launched successfully after it has scanned my library, which is seemingly initiated at Main.cxx:618:

https://github.com/MusicPlayerDaemon/MPD/blob/5b0e8c6de0b73d8e8812cbf4588862fed2a2af42/src/Main.cxx#L613-L628

I'm not sure if this is supposed to run asynchronously, but it certainly doesn't. Either way, it takes 30-40s for mpd to launch and notify systemd of its successful launch when I start it manually. When the unit is started on login, this appears to take much longer (>90s), after which systemd assumes startup failed and kills the process. The only solution I see to this is not to update the database before signaling "READY=1".

If you need the "clean" strace to check this, I can upload it, but I personally don't see a reason to because it doesn't really tell anything other than what the earlier one did (except that it's cleaner).

FichteFoll commented 6 years ago

Just a small note, I'm now running with this unit override similar to @lalalolo49:

/home/fichte/.config/systemd/user/mpd.service.d/override.conf

# Work around issue with mpd not notifying systemd fast enough
# so it times out before the database has been scanned.
# https://github.com/MusicPlayerDaemon/MPD/issues/259
[Service]
Type=simple
MaxKellermann commented 6 years ago

My question was left unanswered for a week, and I stated often enough that I need a strace to analyze the issue, and that your previous strace is wrong. Since you found a workaround for your own personal problem, I guess there is no interest to pursue this anymore, so I'm locking this issue.