dougreese / squeezelite

Automatically exported from code.google.com/p/squeezelite
Other
0 stars 0 forks source link

Playing mp3 to usb dac suddenly went into 100% cpu loop #15

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Playing mp3 vbr song, nothing special.
2. /opt/squeezelite/squeezelite-i386 -z -o "hw:CARD=D1,DEV=0" 127.0.0.1
3.

What is the expected output? What do you see instead?
Music playback suddenly stopped mid-song
htop revealed squeezelite was looping

What version of the product are you using? On what operating system?
squeezelite-i386 v1.0 also saw another time on rc2
Linux Mint Maya 3.2.0-25-generic on Aspire One 1GB ram

Please provide any additional information below.

Original issue reported on code.google.com by danielbo...@gmail.com on 19 Feb 2013 at 9:10

GoogleCodeExporter commented 9 years ago
Which version of mp3 codec?  Can you reproduce?  Is the file local and is the 
LocalPlayer plugin installed?

Original comment by trio...@btinternet.com on 19 Feb 2013 at 9:29

GoogleCodeExporter commented 9 years ago
It's looping currently... Can't reproduce at will but seems to happen more 
frequently than with rc2 (luck?)
The mp3 codec is libmpg123-0 version 1.12.1-3.2ubuntu1 (latest)
file is local (on usb disk ntfs formatted)
I don't know what the LocalPlayer plugin is so I'm going to say no :)
I'm using Logitech Media Server Version: 7.7.2 - r33893

Here is the output of pstack (if that helps):
5814: /opt/squeezelite/squeezelite-i386 -z -o hw:CARD=D1,DEV=0 127.0.0.1
(No symbols found in )
(No symbols found in /usr/lib/i386-linux-gnu/libasound.so.2)
(No symbols found in /lib/i386-linux-gnu/libdl.so.2)
(No symbols found in /lib/i386-linux-gnu/librt.so.1)
(No symbols found in /lib/i386-linux-gnu/libc.so.6)
(No symbols found in /lib/i386-linux-gnu/libm.so.6)
(No symbols found in /lib/ld-linux.so.2)
(No symbols found in /usr/lib/i386-linux-gnu/libpulse.so.0)
(No symbols found in /usr/lib/i386-linux-gnu/libjson.so.0)
(No symbols found in /usr/lib/i386-linux-gnu/libpulsecommon-1.1.so)
(No symbols found in /lib/i386-linux-gnu/libdbus-1.so.3)
(No symbols found in /usr/lib/i386-linux-gnu/libxcb.so.1)
(No symbols found in /lib/i386-linux-gnu/libwrap.so.0)
(No symbols found in /usr/lib/i386-linux-gnu/libsndfile.so.1)
(No symbols found in /usr/lib/i386-linux-gnu/libasyncns.so.0)
(No symbols found in /usr/lib/i386-linux-gnu/libXau.so.6)
(No symbols found in /usr/lib/i386-linux-gnu/libXdmcp.so.6)
(No symbols found in /lib/i386-linux-gnu/libnsl.so.1)
(No symbols found in /usr/lib/i386-linux-gnu/libFLAC.so.8)
(No symbols found in /usr/lib/i386-linux-gnu/libvorbisenc.so.2)
(No symbols found in /usr/lib/i386-linux-gnu/libvorbis.so.0)
(No symbols found in /usr/lib/i386-linux-gnu/libogg.so.0)
(No symbols found in /lib/i386-linux-gnu/libresolv.so.2)
(No symbols found in /usr/lib/libfaad.so.2)
(No symbols found in /usr/lib/i386-linux-gnu/libvorbisfile.so.3)
(No symbols found in /usr/lib/i386-linux-gnu/libmad.so.0)
0x0087d416: _fini + 0x266936

Tried a "kill -5 <pid>" to generate a code but it doesn't seem like it worked...
Let me know if you want me to run with debugs

Original comment by danielbo...@gmail.com on 19 Feb 2013 at 10:37

GoogleCodeExporter commented 9 years ago
...generate a coRe
Next time I will try gcore

Original comment by danielbo...@gmail.com on 19 Feb 2013 at 10:42

GoogleCodeExporter commented 9 years ago
If you can reproduce, please try with -d all=sdebug (will produce lots of text, 
but should show where the error is)

Original comment by trio...@btinternet.com on 19 Feb 2013 at 10:48

GoogleCodeExporter commented 9 years ago
Ok, running with debugs on until it loops again and will try to generate a core 
next time.

Original comment by danielbo...@gmail.com on 19 Feb 2013 at 11:01

GoogleCodeExporter commented 9 years ago
Let it play all day yesterday with debugs on. No lock-up.
Trying today with no debugs to see if I can reproduce.
When it locked up, I might have been playing with the -p option. BTW, it 
doesn't seem to change anything. No matter what value I specify, all 4 threads 
seem to always be at 20 (according to htop).

Original comment by danielbo...@gmail.com on 21 Feb 2013 at 3:28

GoogleCodeExporter commented 9 years ago
-p option sets the real time priorty which will show up if the user you run it 
as is allowed to use the real time scheduler class.  This requires explicit 
permissions configuration or for you to run it as root.  You can see whether 
the -p option worked by starting with -d output=debug

Original comment by trio...@btinternet.com on 21 Feb 2013 at 6:23

GoogleCodeExporter commented 9 years ago
Running with debugs did not produce any lock ups but running without did.
And unless I'm mistaking, the executable was on "pause" when it happened, this 
time.

Plus, two threads were spinning: the first and third (by increasing order of 
pid).
I managed to run strace and gcore on the two pids. Would you like me to upload 
the cores anywhere?

The first strace generated this:
futex(0x8060618, FUTEX_LOCK_PI_PRIVATE, 1 <unfinished ...>

The second one, generated nothing.

I will re-enable the debugs now and try harder to get a lock up with them on.

Original comment by danielbo...@gmail.com on 21 Feb 2013 at 7:50

GoogleCodeExporter commented 9 years ago
Were you able to tell which thread was spinning at high rate?  Probably the one 
with no system calls?  I would expect 4 threads to all be active at low rate 
normally.  Are you saying the other two were stopped?  if so which system call 
are they stopped in - is this a futex call?  I wonder if the decode library is 
not returning and hence holding a mutex permanently.

Please upload the core anyway - can't promise to look until the weekend.

Original comment by trio...@btinternet.com on 21 Feb 2013 at 7:58

GoogleCodeExporter commented 9 years ago
Out of the 4 threads, said pids 2276-2279. Only pids 2276 and 2278 were at 100% 
each (dual core). The other two appeared "normal", say 0-10% cpu.

As mentioned in previous message, only thread 2276 had anything coming out of 
strace:
futex(0x8060618, FUTEX_LOCK_PI_PRIVATE, 1 <unfinished ...>

The other thread (2278) returned a 0 byte file...
I will gzip the two cores and try to attach here. If it fails due to size, I'll 
find a suitable place to upload them and supply the link.

No pressure, take a look when you can. I'm very happy to have squeezelite as it 
stands. If I can help make it better, then good for us both (and all the others 
too!)

Original comment by danielbo...@gmail.com on 21 Feb 2013 at 9:20

Attachments:

GoogleCodeExporter commented 9 years ago
Here is the second core (the one with the empty strace output).

Thanks,
Daniel

Original comment by danielbo...@gmail.com on 21 Feb 2013 at 9:26

Attachments:

GoogleCodeExporter commented 9 years ago
It just started looping with debugs on...

The file is 105MB in size but probably only the last two lines are relevant...

Here are the last lines of the file:
[17:06:50.364538] output_thread:1199 wrote 440 frames
[17:06:50.374368] output_thread:742 avail: 484 frames: 484 silence: 1
[17:06:50.374624] output_thread:1199 wrote 484 frames
[17:06:50.386075] output_thread:742 avail: 440 frames: 440 silence: 1
[17:06:50.387076] output_thread:1199 wrote 440 frames
[17:06:50.395327] output_thread:742 avail: 440 frames: 440 silence: 1
[17:06:50.395491] output_thread:1199 wrote 440 frames
[17:06:50.405468] output_thread:742 avail: 440 frames: 440 silence: 1
[17:06:50.405902] output_thread:1199 wrote 440 frames
[17:06:50.430343] output_thread:742 avail: -32 frames: -32 silence: 1
[17:12:31.325375] slimproto_stop:745 slimproto stop

Yes, I just stopped the playback and it started looping right away.

Let me know if you want more of the debug file.

Original comment by danielbo...@gmail.com on 21 Feb 2013 at 10:21

GoogleCodeExporter commented 9 years ago
That actually looks ok - its writing silence to the output and there are small 
gaps between the timestamps which suggests its not running at 100%.  Can you 
look back earlier in the file to see any differnces?

Original comment by trio...@btinternet.com on 21 Feb 2013 at 10:24

GoogleCodeExporter commented 9 years ago
Ok, obviously I don't know what's important and what's not so I might as well 
upload the whole mess to you :)

I hope you have a good storage limit for attached files... I split the file in 
two parts and gzipped them. I will be able to upload each part separately (10MB 
limit per post). so here goes...

Original comment by danielbo...@gmail.com on 21 Feb 2013 at 10:39

Attachments:

GoogleCodeExporter commented 9 years ago
...and the second one...

Original comment by danielbo...@gmail.com on 21 Feb 2013 at 10:44

Attachments:

GoogleCodeExporter commented 9 years ago
Just some more data for you:

I've been listening to music all day with debugs turned off (figured I gave you 
enough for now!).
Everything was fine until it started playing a new song (about 5 seconds into 
it) when squeezelite froze again (2 threads at 100% and 2 at 0%):

2467: 100% cpu
strace: 1 line: futex(0x8060618, FUTEX_LOCK_PI_PRIVATE, 1{waiting}

2468: 0% cpu
strace:
Process 2468 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
nanosleep({0, 100000000}, NULL)         = 0
{last line repeated continuously}

2469: 100% cpu
strace: no output

2470: 0% cpu
strace: 1 line: futex(0x8060618, FUTEX_LOCK_PI_PRIVATE, 1{waiting}

When I killed squeezelite, there was a loud CLICK in the headphones.

Thanks,
Daniel

Original comment by danielbo...@gmail.com on 25 Feb 2013 at 8:39

GoogleCodeExporter commented 9 years ago
More data points (let me know if you want me to stop :)
Been listening to music all day, no lockups.
Just now, opening a terminal window and entering the command "man pidof" would 
make squeezelite lock up in the usual manner. Got it to do it several times in 
a row (5-6).
Going in Firefox and just clicking between a few tabs, the sound would 
sometimes have a bit of a split-second pause and after a few tries, I got it to 
lock up there too.
I swear it's the truth... :-)

Original comment by danielbo...@gmail.com on 26 Feb 2013 at 9:10

GoogleCodeExporter commented 9 years ago
Still can't see what it wrong - something is running in a hard loop and 
producing no debug output.  Are all the files mp3s?  Could you try using 
libmpg123 rather than libmad?

Original comment by trio...@btinternet.com on 26 Feb 2013 at 9:40

GoogleCodeExporter commented 9 years ago
Most of my files are in fact mp3s but I also have some flac files but I've 
never had a lockup on a flac file (could just be the laws of statistics 
though). I have both libmpg123 and libmad loaded. I will have to see how I can 
use one instead of the other. I am now trying to increase the priority of the 
playback thread so see if it could be related.

If you want me to try anything different (a version with increased debugs, a 
special test version, etc.) please let me know. I will also try without the usb 
dac to see if this changes anything. (I will figure out how to use libmpg123 
instead of libmad first).

Original comment by danielbo...@gmail.com on 27 Feb 2013 at 1:20

GoogleCodeExporter commented 9 years ago
Switched to libmpg123 and noticed right away that %cpu was at 0-3% on all 4 
threads instead to 2-8% with libmad.

Unfortunately, I got another lockup using libmpg123. Started noticing that the 
lockups usually happen in the afternoon, after the computer's been on for 5+ 
hours. Looked at cron jobs, looked at logs and found nothing except for this in 
syslog:
Feb 27 13:19:43 daniel-laptop pulseaudio[1828]: [alsa-sink] alsa-sink.c: Error 
opening PCM device front:1: Device or resource busy
Feb 27 13:19:43 daniel-laptop pulseaudio[1828]: [pulseaudio] sink-input.c: 
Failed to create sink input: sink is suspended.

I got these debugs today at 11:13, at 13:19 and at 13:51 (over and over for 
about 2 seconds) and the lockup happened around 13:45 or so. Will try to 
increase the output thread's priority as a next step. Unless you have better 
suggestions.

Original comment by danielbo...@gmail.com on 27 Feb 2013 at 6:55

GoogleCodeExporter commented 9 years ago
I didn't realise you are using pulse audio - is there any option you have to 
try without it.  Others have had some problems with pulse - would be good to 
understand if this is causing the problem in some way.

Original comment by trio...@btinternet.com on 27 Feb 2013 at 7:01

GoogleCodeExporter commented 9 years ago
Well, this is a standard install of Linux Mint13 so it comes with alsa and 
pulse installed but I thought using -o "hw:CARD=D1,DEV=0" would circumvent most 
of it. My normal system sounds are still coming out the laptop's speakers and 
only the output of squeezelite is coming through the USB DAC. Perhaps these 
error messages are related to the normal output? Doesn't seem to be at all 
related to any squeezelite lockup.

Original comment by danielbo...@gmail.com on 27 Feb 2013 at 8:47

GoogleCodeExporter commented 9 years ago
Afraid I still can't correlate your data.  The log files shows logging from all 
threads at reasonable intervals.  However you say two threads are running at 
100% - if so with logging on doesn't strace produce something for those threads?

Original comment by trio...@btinternet.com on 27 Feb 2013 at 9:53

GoogleCodeExporter commented 9 years ago
I think that all logging simply grinds to a halt along with the music playback, 
once the threads go into 100%/0% CPU. strace always shows what I've mentioned 
in post #16.

I could try with increased priority and/or through the default sound output 
(without the USB DAC) to try and isolate the problem?

I see in a slimdevices.com thread dated nov 2012 that some people (Mnyb) had 
problems somewhat similar but not exactly with threads locking up (He saw 
FUTEX_LOCK and FUTEX_UNLOCK whereas I only see the LOCK and it seems to hang 
there).

I'll keep on using squeezelite and try to isolate cases where it doesn't lock 
up and perhaps this will help you see what's going on...

Probably means nothing but when it's looping, a single "kill -TERM xx" will not 
do it, I have to do it twice to get it to die.

If you think of anything for me to try, please let me know.

Thanks,
Daniel

Original comment by danielbo...@gmail.com on 27 Feb 2013 at 10:11

GoogleCodeExporter commented 9 years ago
FUTEX_LOCK should result in 0% cpu - it means that the calling thread as tried 
to set a mutex, but it is already held by someone else.  That's normal if 
something else has the lock, what's not normal is this causing the thread to be 
at 100% cpu - it should be 0....

Re post #16 can you post what the repeating strace entry is for the nanosleep 
entry - is this once every 100ms or something different?

Original comment by trio...@btinternet.com on 27 Feb 2013 at 10:30

GoogleCodeExporter commented 9 years ago
Makes sense... I'm a programmer by trade (25 years) but I have little knowledge 
(read none) of multi-thread programming...

It's hard to time the output of strace for the nanosleeps but I'd say that once 
every 100ms sounds about right...

At one point, if you prefer we take this off-line, let me know and we can 
communicate directly by email. You can use born_daniel at yahoo dot com if 
you'd like.

Original comment by danielbo...@gmail.com on 27 Feb 2013 at 10:40

GoogleCodeExporter commented 9 years ago
I'm currently running squeezelite with output to default. It's so different 
than when outputting to the USB DAC... First of all, 5 threads instead of 4 and 
a cpu usage much higher (13+7+6+0+0=26% cpu) either paused or playing music!?

Just curious, what is your usual mode of utilization? what would be most 
people's? (if you know). Am I using a rare configuration?

Maybe I'll try squeezelite on a different computer with the USB DAC...
If you feel like sharing some of your thoughts with me, I'd like to help 
perhaps more than just supplying debugs. I'm used to debugging real-time, 
multi-process complex systems (just not multi-threaded). 

Original comment by danielbo...@gmail.com on 28 Feb 2013 at 1:37

GoogleCodeExporter commented 9 years ago
It's been over a week now of continuous use without a single lockup, whereas it 
used to happen at least once a day previously.
I can only attribute it to a kernel upgrade (3.2.0-25 to 3.2.0-38).

Anyway, I think it is safe to simply close this issue now.

Thanks,
Daniel

Original comment by danielbo...@gmail.com on 13 Mar 2013 at 12:27

GoogleCodeExporter commented 9 years ago
Closing as discussed - appears to be specific issue with problem linux kernel

Original comment by trio...@btinternet.com on 13 Mar 2013 at 8:36