Closed FichteFoll closed 6 years ago
This is an Arch bug. They build MPD without libsystemd, and thus MPD is unable to send the expected notify to systemd.
See https://git.archlinux.org/svntogit/packages.git/plain/trunk/PKGBUILD?h=packages/mpd (No libsystemd dependency)
The PKGBUILD was fixed rather quickly and I updated to that version, but the problem still occurs.
What, mpv --version
?? Does your systemd unit start mpv
?
Are you really sure this is the binary started by your systemd unit? Check with strace whether MPD sends READY=1
to systemd.
Eh, that's obviously meant to be mpd
. Just muscle memory interfering.
I assume I should be stracing off a fresh boot/login?
strace your user systemd (with -f -v -s256
) while you reproduce the problem.
Ran strace
manually in a terminal after determining the process's pid, so the first few seconds are missing.
journalctl --user
log during the time (nothing surprising):
-- Logs begin at Thu 2017-01-05 20:38:55 CET. --
Apr 11 11:07:02 pinaceae systemd[510]: Starting Music Player Daemon...
Apr 11 11:07:03 pinaceae mpd[518]: exception: bind to '0.0.0.0:6600' failed (continuing anyway, because binding to '[::]:6600' succeeded): Failed to bind socket: Address already in use
Apr 11 11:07:03 pinaceae mpd[518]: libsamplerate: libsamplerate converter 'Fastest Sinc Interpolator'
Apr 11 11:07:03 pinaceae mpd[518]: vorbis: Xiph.Org libVorbis 1.3.6
Apr 11 11:07:03 pinaceae mpd[518]: opus: libopus 1.2.1
Apr 11 11:07:03 pinaceae mpd[518]: sndfile: libsndfile-1.0.28
Apr 11 11:07:03 pinaceae mpd[518]: simple_db: reading DB
Apr 11 11:07:03 pinaceae mpd[518]: curl: version 7.59.0
Apr 11 11:07:03 pinaceae mpd[518]: curl: with OpenSSL/1.1.0h
Apr 11 11:08:32 pinaceae systemd[510]: mpd.service: Start operation timed out. Terminating.
Apr 11 11:08:37 pinaceae systemd[510]: mpd.service: Failed with result 'timeout'.
Apr 11 11:08:37 pinaceae systemd[510]: Failed to start Music Player Daemon.
This strace is useless for the reason you already explained. This is not what I asked you to do.
I tried attaching to the systemd --user
process (pid 515) through a systemd system unit, but for some reason it attached to the login process (pid 472) instead (??) and got a 200MB log file from that, which doesn't seem right. Dry testing on a running machine works as expected and attaches to 515.
Would love to get a hint on how to debug this better.
/usr/lib/systemd/system/test.service
[Unit]
Description=Test Unit (for debugging etc)
[Service]
Type=simple
ExecStart=/root/test
[Install]
WantedBy=multi-user.target
/root/test
#!/usr/bin/env zsh
while true; do
pids=(`pidof systemd`)
echo "pids: $pids"
for pid in $pids; do
if ((pid > 1)) {
echo "attaching to $pid"
strace -p $pid -f -v -s256 2>&1 >"/tmp/systemd-user-strace_`date -Iminute`.txt"
# not gonna halt anyway; need to kill the unit
exit
}
done
sleep 0.1
done
Maybe you should first report a bug to the strace project, if you believe it attached to a different process than you specified? When you have your tools sorted out and fixed, post your strace. Until then, this issue is closed.
I have exactly the same issue on Armv7 (OrangePi) Debian 9. I have compiled the 0.20.18 release. It seems responsive and I can connect to it during activation, but dies after timeout.
Unfortunately, the steps to reproduce the problem are quite involved (requires reboot) and the time window to strace the systemd user process is quite critical, so I'm still having issues targetting the right process. strace did not attach to a different process, it's just that I have problems determining the correct one during startup.
I'm iterating on it every day and will report back once I have a usable strace log.
systemd-user-strace_2018-04-15T13:19+02:00.txt
Using the following script (and the systemd unit mentioned earlier):
#!/usr/bin/env zsh
while true; do
pids=(`ps -u fichte | grep systemd | awk '{ print $1 }'`)
echo "pids: $pids"
for pid in $pids; do
if ((pid > 1)) {
echo "attaching to $pid"
fname="/tmp/systemd-user-strace_`date -Iminute`.txt"
strace -p $pid -f -v -s256 2>&1 >$fname
# not gonna halt anyway; need to kill the unit
exit
}
done
sleep 0.1
done
For some reason, the file wasn't written (size 0), so I dumped the output of journalctl -u test.service -b0
instead.
Edit: Better use -o "/tmp/systemd-user-strace_`date -Iminute`.txt"
instead of manual piping, optionally prefixed with |tee
.
@MaxKellermann have you had a change to look at this? Is it useful?
Do you see MPD anywhere in this trace? Because I don't.
~~I have found the issue: --no-daemon
is not working properly. Delete the --no-daemon
option
and add~~
[Service]
RemainAfterExit=yes
Actually the culprit is type=notify
I have changed it to simple
or idle
and it works. (with --no-daemon
)
Can you try it on your Arch Linux ?
I think type=notify
is intended and the fact it's not working is the bug.
With my latest manual test (through disabling the unit and then starting it manually) I could verify that systemctl --user start mpd
does not exit for 30s and that mpd scans what seems like my entire library before opening/sending to /run/user/1000/systemd/notify
, after which the unit is considered active and systemctl exits. Unfortunately I failed again to dump the strace to disk, so I will have to try again after the next reboot to see whether mpd attempts to send any notification to systemd before scanning my library.
@MaxKellermann
Do you see MPD anywhere in this trace? Because I don't.
I do. Mpd's pid in that strace log is 1387. Besides the fact it stats music files, you can also tell from:
Apr 15 13:19:23 pinaceae test[489]: [pid 1387] prctl(PR_SET_NAME, "(mpd)" <unfinished ...>
You don't see mpd notifying systemd in that log though, or at least I couldn't see it at the position when it happened in my manual run, i.e. after it stat'ed my library. Presumably because it has been killed? In the manual run, it looked as follows:
[pid 28109] sendmsg(19, {msg_name={sa_family=AF_UNIX, sun_path="/run/user/1000/systemd/notify"}, msg_namelen=31, msg_iov=[{iov_base="READY=1", iov_len=7}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 7
Can you show me this prctl
call in the MPD source code?
Looking at a "clean" strace, I determined that mpd sends to /run/user/1000/systemd/notify
through the single sd_notify
call in Main.cxx:646:
The accompanying trace is, as before:
4686 sendmsg(19, {msg_name={sa_family=AF_UNIX, sun_path="/run/user/1000/systemd/notify"}, msg_namelen=31, msg_iov=[{iov_base="READY=1", iov_len=7}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 7
The only conclusion I can make with this is that mpd notifies systemd that it launched successfully after it has scanned my library, which is seemingly initiated at Main.cxx:618:
I'm not sure if this is supposed to run asynchronously, but it certainly doesn't. Either way, it takes 30-40s for mpd to launch and notify systemd of its successful launch when I start it manually. When the unit is started on login, this appears to take much longer (>90s), after which systemd assumes startup failed and kills the process. The only solution I see to this is not to update the database before signaling "READY=1".
If you need the "clean" strace to check this, I can upload it, but I personally don't see a reason to because it doesn't really tell anything other than what the earlier one did (except that it's cleaner).
Just a small note, I'm now running with this unit override similar to @lalalolo49:
/home/fichte/.config/systemd/user/mpd.service.d/override.conf
# Work around issue with mpd not notifying systemd fast enough
# so it times out before the database has been scanned.
# https://github.com/MusicPlayerDaemon/MPD/issues/259
[Service]
Type=simple
My question was left unanswered for a week, and I stated often enough that I need a strace to analyze the issue, and that your previous strace is wrong. Since you found a workaround for your own personal problem, I guess there is no interest to pursue this anymore, so I'm locking this issue.
I'm running Arch Linux (and mpd 0.20.18) and since approximately two weeks ago, the mpd systemd user unit consistently times out when I boot.
An strace call during this initial startup revealed that mpd is still doing work at this time and crawling my music library (indicated by lots of
stat
calls), after which it terminates. I suspect that startup takes too long (>30s) for systemd so it considers the unit to have failed and signals it to terminate, which mpd then does but only after it finished crawling the library.Any following manually invocation of
systemctl --user start mpd.service
succeeds andstrace
showsepoll_pwait(6,
most of the time, until I start playback.My unit is as follows: