MusicPlayerDaemon / MPD

Music Player Daemon
https://www.musicpd.org/
GNU General Public License v2.0
2.16k stars 346 forks source link

Removing soundcard results in 100% CPU load -> bug in event dispatcher of alsa plugin #1139

Closed MHoyer12345678 closed 3 years ago

MHoyer12345678 commented 3 years ago

Bug report

Describe the bug

Expected Behavior

Actual Behavior

Version

Music Player Daemon 0.21.5 (0.21.5)

Copyright 2003-2007 Warren Dukes warren.dukes@gmail.com Copyright 2008-2018 Max Kellermann max.kellermann@gmail.com This is free software; see the source for copying conditions. There is NO warranty; not even MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Database plugins: simple proxy upnp

Storage plugins: local smbclient udisks nfs curl

Neighbor plugins: smbclient upnp udisks

Decoders plugins: [mad] mp3 mp2 [mpg123] mp3 [vorbis] ogg oga [oggflac] ogg oga [flac] flac [opus] opus ogg oga [sndfile] wav aiff aif au snd paf iff svx sf voc w64 pvf xi htk caf sd2 [audiofile] wav au aiff aif [dsdiff] dff [dsf] dsf [hybrid_dsd] m4a [faad] aac [mpcdec] mpc [wavpack] wv [modplug] 669 amf ams dbm dfm dsm far it med mdl mod mtm mt2 okt s3m stm ult umx xm [mikmod] amf dsm far gdm imf it med mod mtm s3m stm stx ult uni xm [sidplay] sid mus str prg P00 [wildmidi] mid [fluidsynth] mid [adplug] amd d00 hsc laa rad raw sa2 [ffmpeg] 16sv 3g2 3gp 4xm 8svx aa3 aac ac3 adx afc aif aifc aiff al alaw amr anim apc ape asf atrac au aud avi avm2 avs bap bfi c93 cak cin cmv cpk daud dct divx dts dv dvd dxa eac3 film flac flc fli fll flx flv g726 gsm gxf iss m1v m2v m2t m2ts m4a m4b m4v mad mj2 mjpeg mjpg mka mkv mlp mm mmf mov mp+ mp1 mp2 mp3 mp4 mpc mpeg mpg mpga mpp mpu mve mvi mxf nc nsv nut nuv oga ogm ogv ogx oma ogg omg opus psp pva qcp qt r3d ra ram rl2 rm rmvb roq rpl rvc shn smk snd sol son spx str swf tak tgi tgq tgv thp ts tsp tta xa xvid uv uv2 vb vid vob voc vp6 vmd wav webm wma wmv wsaud wsvga wv wve [gme] ay gbs gym hes kss nsf nsfe sap spc vgm vgz [pcm]

Filters: libsamplerate soxr

Tag plugins: id3tag

Output plugins: shout null fifo pipe alsa ao oss openal pulse jack httpd recorder

Encoder plugins: null opus lame wave flac shine

Archive plugins: [bz2] bz2 [zzip] zip [iso] iso

Input plugins: file archive alsa tidal qobuz curl ffmpeg smbclient nfs mms cdio_paranoia

Playlist plugins: extm3u m3u pls xspf asx rss soundcloud flac cue embcue

Protocols: file:// alsa:// tidal:// qobuz:// http:// https:// gopher:// rtp:// rtsp:// rtmp:// rtmpt:// rtmps:// smb:// nfs:// mms:// mmsh:// mmst:// mmsu:// cdda://

Other features: avahi dbus udisks epoll icu inotify ipv6 systemd tcp un

Log

no relevant logs ...

root cause

potential solution

Following patch leads to the expected behavior. --- mpd-0.21.5.orig/src/lib/alsa/NonBlock.cxx +++ mpd-0.21.5/src/lib/alsa/NonBlock.cxx @@ -73,6 +73,10 @@ AlsaNonBlockPcm::DispatchSockets(MultiSo if (err < 0) throw FormatRuntimeError("snd_pcm_poll_descriptors_revents() failed: %s", snd_strerror(-err)); +

MaxKellermann commented 3 years ago

Your MPD version is very old and unsupported.

MHoyer12345678 commented 3 years ago

I can confirm that the issue still exists on master.

MaxKellermann commented 3 years ago

So you said handling POLLERR inside MPD, instead of passing it to libasound, solves the problem, right? I think it's libasound's responsibility to handle POLLERR.

MHoyer12345678 commented 3 years ago

Hello. Mmm I'm not entirely sure about this one. So from my point of view, the API of libasound is abstracting away from the user the numbers of file descriptors to poll for but not the usage of poll and the evaluation of the results such es the event mask. The function snd_pcm_poll_descriptors_revents() just extracts an revent value from all file descriptor's event masks and returns it. Calling this function finally throwing away the result makes no sense at all. Btw: The same is true in the mixer part of the plugin. Here, the function snd_mixer_poll_descriptors_revents is used neglecting the result as well. In contrast to the PCM, the mixer part does not lead to a infinite loop because of an actual dispatcher function called on events which returns an error if called in case of POLLERR. Such a dispatcher is not needed / provided for the PCM case. So yeah, the API from libasound could be a bit more user friendly at this point. But as it is currently, I guess it is the users duty to read out POLLERR. My two cts ...

MaxKellermann commented 3 years ago

Calling this function finally throwing away the result makes no sense at all.

But what does make sense? What does this revents parameter to snd_pcm_poll_descriptors_revents() mean? What is an application supposed to do with it? It's undocumented, unfortunately. And reading the libasound source code doesn't give any insights.

MHoyer12345678 commented 3 years ago

Yeah right, this is a bit unspecific taking the API description into account. I guess what is described here https://man7.org/linux/man-pages/man2/poll.2.html gives a good hint of what you can expect. In case of the mixer implementation, the _revent() function returns an OR combination of the flags of the different file descriptor whatever they are for. In PCM case, most probably the hwdep.c implementation is used which directly returns the revent from poll of the only one file descriptor it uses .... Not really satisfying. I'd assume checking for POLLERR and taking it as error in case the bit is set is a good option to start with. By the way: There is a test file available for pcm (https://www.alsa-project.org/alsa-doc/alsa-lib/_2test_2pcm_8c-example.html#a31). This one shows you how alsa assumes to use these functions. Check out function: static int wait_for_poll(snd_pcm_t handle, struct pollfd ufds, unsigned int count);

Hope this helps ...

MaxKellermann commented 3 years ago

By the way: There is a test file available for pcm (https://www.alsa-project.org/alsa-doc/alsa-lib/_2test_2pcm_8c-example.html#a31). This one shows you how alsa assumes to use these functions.

Sigh. This example code ignores the return value of snd_pcm_poll_descriptors_revents() (documented!) and instead checks revents (not documented!). This is bad.

(And it ignores the return value of poll(), leading to an endless busy loop on error. WTF.)

MHoyer12345678 commented 3 years ago

Mmm not sure. Check the man pages of poll. There are only five errnos which could be returned by poll. Three are returned in case of issues with passed parameters. One is OoM. The last one is returned when a signal is sent to the thread (interrupts poll).

Parameter issues are expected to not happen in such a simple example. OoM (ENOMEM) is a special scenario which is hard to recover from cleanly anyway. Ignoring it is ok, the process will finally get killed anyway at a certain point. The interrupt (EINTR) leads to one walk through the main loop ending up in a blocked poll again. So not evaluating any of the negative error codes seems ok in this situation. A return value of 0 shall not happen since -1 is passed as timeout parameter. So only positive numbers remain which indicate state changes of the file descriptors. The state changes themselves are returned with revent and the respective POLLXXX flags set. The only one from any interest here is POLLERR, which is actually evaluated.

I guess there is a subtile differentiation one has to keep in mind. The work of poll is just to block until a state change of a stream belonging to the file descriptor passed is detected. It is not responsible for evaluating the state change itself and react respectively. So from this point of view, the 5 errors discussed above are the only execeptions that can happen regarding the work and responsibilities of poll. The actual state change of the stream is indicated by the state result mask (revent). It is up to the user of poll and the stream how to interpret and react on a certain state change. So I guess the respective functions from libasound are following the same mindset even though a bit more documentation could have helped here of course.

Regards, Marko Hoyer

Btw: libasound and many other OSS are good example for: "OSS can be used without any cost for licensing but using OSS is (from a TCO perspective) not free of charge ;)" Assuming this (as commercial company) could easily lead to unexpected costs not assumed before.

MaxKellermann commented 3 years ago

I tried to reproduce this problem with my USB DAC, but couldn't. I see that epoll_wait() returns EPOLLERR for the ALSA FD, but then:

epoll_wait(8, [{EPOLLERR, {u32=3221228600, u64=140298327952440}}], 16, -1) = 1
ioctl(23, SNDRV_PCM_IOCTL_HWSYNC, 0) = -1 ENODEV (No such device)

... and MPD stops playback gracefully with a proper error message:

exception: Failed to play on "alsa" (alsa): snd_pcm_writei() failed: No such device

Obviously, this is better than this ALSA example program which translates POLLERR to -EIO. Everything is wrong with this example program.

The work of poll is just to block until a state change of a stream belonging to the file descriptor passed is detected. It is not responsible for evaluating the state change itself and react respectively.

This is true, and this is exactly why MPD shouldn't handle POLLERR, but instead pass it to libasound, and get -ENODEV back.

Requiring the application to handle POLLERR doesn't make sense, and isn't documented that way.

So, what are the circumstances with your MPD that lead to a busy loop? I figured that you get EPOLLERR, and then MPD calls snd_pcm_writei(), which should then fail (and it does on my computer). Why doesn't it for you? Are you using a hw: device or are you using libasound plugins? (the latter is libasound's default, unfortunately)

Right now, I'm not convinced that this is a MPD problem at all, I rather think this is an ALSA (libasound) bug.

MHoyer12345678 commented 3 years ago

So, what are the circumstances with your MPD that lead to a busy loop? I figured that you get EPOLLERR, and then MPD calls snd_pcm_writei(), which should then fail (and it does on my computer). Why doesn't it for you? Are you using a hw: device or are you using libasound plugins? (the latter is libasound's default, unfortunately)

Ok right, good to know. Seem snd_pcm_writei() should detect the missing card and return an error code and is doing so in your case. In my case, a dmix plugin is in the game. I implemented a radio with three different sources: inet radio via mpd, dlna client, lms client. These sources are mixed together with dmix with which I was able to implement mute ramps and so on ...

So I guess the actual bug is in the dmix implementation. I'll try to understand what's happening there and maybe post a patch there as well.

Anyway, evaluating POLLERR would make mpd a bit more robust. I guess doing so is the actual reason for the _revents function. There is no real benefit of using them without exactly doing exactly that.

Regards, Marko

MaxKellermann commented 3 years ago

Anyway, evaluating POLLERR would make mpd a bit more robust. I guess doing so is the actual reason for the _revents function. There is no real benefit of using them without exactly doing exactly that.

POLLERR doesn't give any information about which error occurred. POLLERR just indicates that an error occurred on a certain file descriptor, and it's up to userspace to obtain more details about the error - e.g. using getsockopt(SO_ERROR), or just by attempting to do an I/O call, like snd_pcm_writei().

Point is: POLLERR alone is useless, it's a bad interface design in libasound to even return it to the application (with the undocumented expectation that the application does something with it, whatever that may be - undocumented and badly written example code).

So yes, this sounds like a bug in the ALSA plugin you're using, and I still believe MPD doesn't need to be changed.

MHoyer12345678 commented 3 years ago

I can confirm that in error case snd_pcm_writei return -EAGAIN, the pcm state remains in RUNNING. So right, issue is in dmix plugin from libasound. I'll check if I can find something there ...

MHoyer12345678 commented 3 years ago

Can now confirm the behavior you saw with snd_pcm_writei with an alsa configuration without a dmix plugin involved. The function and the underlying ioctl return both -ENODEV. Side mark: Interestingly, the returned error code is not documented. So as per docu, the function should never return this value but maybe badfd or so ...

However. Next interesting fact: xxx_writei used with the dmix configuration returns either -EIO or -EAGAIN depending on if SND_PCM_NON_BLOCKING flag is set when opening pcm or not ... Remains interesting. Guess I'll deep dive a little into the dmix plugin now ...

MaxKellermann commented 3 years ago

Can now confirm the behavior you saw with snd_pcm_writei with an alsa configuration without a dmix plugin involved. The function and the underlying ioctl return both -ENODEV. Side mark: Interestingly, the returned error code is not documented. So as per docu, the function should never return this value but maybe badfd or so ...

That -ENODEV is from the kernel. libasound just passes it through to the caller, therefore it's not explicitly documented. The documented codes are those generated by libasound itself, I think.

-EBADF would be wrong, because the file descriptor is valid (though you can't do much with it other than close()), only the underlying hardware is gone.

MHoyer12345678 commented 3 years ago