alip / mpdcron

A ‘cron’ like program for mpd. It can execute scripts based on mpd’s idle events.
http://mpd.wikia.com/wiki/Client:MPDCRON
GNU General Public License v2.0
35 stars 8 forks source link

When set to act on the 'playlist' event, hangs (using 100% CPU) on large playlists #3

Closed sdelafond closed 14 years ago

sdelafond commented 14 years ago

This is Debian bug #573805 (http://bugs.debian.org/573805).

More info available at the URL above, but basically:

On Sun, Mar 14, 2010 at 06:59:14PM +0100, Sebastien Delafond wrote:

[stats] Resolved `localhost' to 127.0.0.1 [stats] Successful bind to 127.0.0.1:6601 [1] Sending list_queue_meta command to Mpd server [2] === it goes into an infinite loop here === ^C[stats] Exiting

Sorry for having overlooked this detail. [1] is the point where I issue mpc del 0. As it seems hard to reproduce I ran it again with strace, to give you some pointers of what happens.

[pid 26309] poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}], 3, -1) = 1 ([{fd=4, revents=POLLIN}]) [pid 26309] select(5, [4], [], [4], {30, 0}) = 1 (in [4], left {29, 999993}) [pid 26309] recvfrom(4, "changed: playlist\nchanged: playe"..., 4096, MSG_DONTWAIT, NULL, NULL) = 37 [pid 26309] write(2, "Sending list_queue_meta command "..., 45Sending list_queue_meta command to Mpd server) = 45 [pid 26309] write(2, "\n", 1 ) = 1 [pid 26309] sendto(4, "playlistinfo\n", 13, MSG_DONTWAIT, NULL, 0) = 13 [pid 26309] select(5, [4], [], [4], {30, 0}) = 1 (in [4], left {29, 991003}) [pid 26309] recvfrom(4, "file: "..., 4096, MSG_DONTWAIT, NULL, NULL) = 4096 [pid 26309] open("/etc/localtime", O_RDONLY) = 8 [pid 26309] fstat(8, {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] fstat(8, {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7feb77595000 [pid 26309] read(8, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\10\0\0\0\0"..., 4096) = 837 [pid 26309] close(8) = 0 [pid 26309] munmap(0x7feb77595000, 4096) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] select(5, [4], [], [4], {30, 0}) = 1 (in [4], left {29, 999994}) [pid 26309] recvfrom(4, ""..., 4074, MSG_DONTWAIT, NULL, NULL) = 4074 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0 [pid 26309] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=837, ...}) = 0

From this up a repeating pattern starting with the last select can be seen.

Occasionally some lines like: [pid 26832] mremap(0x7f201096e000, 458752, 462848, MREMAP_MAYMOVE) = 0x7f201096e000 or [pid 26832] brk(0x2bd7000) = 0x2bd7000 come along. They show that mpdcron grows in memory size.

Since mpdcron continues to read data from mpd (very slowly), it is probably not an infinite loop, but a slow algorithm. The loop seems to be getting slower over time which suggests an O(n^2) algorithm. I should probably mention that my playlist currently has around 10000 entries. You can easily create such a playlist by concatenating a playlist to itself often enough. After eleven minutes of watching mpdcron read a playlist I gave up and killed it, because two songs already passed by.

Furthermore I verified that disabling the stats module does not solve the problem.

A quick check of the source shows that it probably blows up in env_list_all_meta, because it uses strftime (and thus doing all those stat calls) and using g_setenv in a loop and thus being an O(n^2) algorithm.

The bad thing about this is that there is no easy way to solve this problem, because the real problem didn't even show up: Let us assume 10500 playlist entries with each of them having a filename of 60 characters. For each entry two variables are added, which should take another 40 characters only counting the lengths of their names. So a lower bound for the size of the environment would be 1MB. This size may not be supported by the operating system, so the whole approach of feeding the playlist via the environment is flawed. Using g_setenv in a loop only makes things break at an earlier point.

Given that it requires a large playlist I downgraded the severity to normal. I hope this helps for solving the issue.

Cheers,

--Seb

alip commented 14 years ago

We really need a better way to pass data to the hooks. Using the environment was the easiest way so I went with that initially. But as you said this approach is flawed and it shows itself clearly with the playlist event case.

Maybe using standard input to pass data is a better way. What do you think?

sdelafond commented 14 years ago

stdin is a much better way, yes :)

alip commented 14 years ago

As of 387f423a7833653be11326a586218a86e5751e14 mpdcron is more conservative when exporting variables to the environment. I'll update the documentation on webpage soonish. I did this because I don't have time to properly solve this problem (with stdin method or some other one)... That can wait for later. Please test and report back!

alip commented 14 years ago

Documentation fixed by 0a25333912f49b01dc47e125dc356cc6fb8d9266 Closing this, reopen if the problem persists.