beetbox / audioread

cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python
MIT License
483 stars 108 forks source link

Hanging in close() #9

Closed dracos closed 10 years ago

dracos commented 10 years ago

Mac OS X 10.9, ffmpeg 2.1 from MacPorts, audioread 1.0.1 from pypi.

If I have the following small file (with the directory of MP3s being the one from this repository - https://github.com/mysociety/sayit ):

import audioread.ffdec
import os

def get_audio_duration(in_filename):
    f = audioread.ffdec.FFmpegAudioFile(in_filename)
    return round(f.duration)

root = 'speeches/fixtures/expected_outputs/mp3'
for mp3 in os.listdir(root):
    mp3 = os.path.join(root, mp3)
    print mp3, get_audio_duration(mp3)

Then sometimes it will run fine, but frequently it will hang on one of the files. As far as I have been able to work out, it is hanging on the wait() inside close(), but I'm not sure what I should do in order to debug it further. Running ffmpeg manually, I can't see any problems. I can Ctrl-C, in which case I get a message: Exception KeyboardInterrupt: KeyboardInterrupt() in <bound method FFmpegAudioFile.__del__ of <audioread.ffdec.FFmpegAudioFile object at 0x1087cff90>> ignored and the file continues running without issue (printing the duration, in this case), but leaves an ffmpeg process lying about that I have to manually kill -9.

Hope that's useful, do let me know if you'd like any further information to help debug or fix this.

sampsyo commented 10 years ago

Thanks for the detailed & complete bug report. Alas, I still can't seem to reproduce this on my machine (also ffmpeg 2.1, but on a Linux box; my Mac has a dead battery today :astonished:).

I can think of two possibilities for what could be going on here:

In either case, maybe seeing ffmpeg's output will help clarify things. Can you try putting a print statement into your copy of audioread in the ReaderThread loop? Specifically, print data after line 57 (here) should do it

dracos commented 10 years ago

[ Original reply, please see below. https://gist.github.com/dracos/7334094 has two files, "works" and "hangs" - I cut streams of binary data and replaced with [Binary data], and changed to have print mp3 and print get_audio_duration(mp3) on separate lines. They look the same to me, apart from one just stops on the fourth file with no data returned. If I hit Ctrl-C, then it outputs the contents of "hangs-post-ctrl-c" - giving the output fine, but missing any ffmpeg output from the Ctrl-C file, or the last file either? ]

Ooh, if I get rid of the function and have this script:

import audioread.ffdec
import os
import sys

root = 'speeches/fixtures/expected_outputs/mp3'
for mp3 in os.listdir(root):
    mp3 = os.path.join(root, mp3)
    print >>sys.stderr, mp3
    f = audioread.ffdec.FFmpegAudioFile(mp3)
    print >>sys.stderr, round(f.duration)

Then I can't get that to hang ever, and STDOUT prints a lot more data than with the "in a function" version. Hang on, I'll print len(data) rather than data:

$ python duration.py 
speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
Data length= 1024
[repeated many times]
1.0Data length= 405
Data length= 0
Data length= 512
Data length= 0

speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
Data length= 1024
[repeated many times]
Data length= 405
Data length= 0
1.0
 Data length= 512
Data length= 0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
Data length= 1024
[repeated many times]
Data length= Data length= 457512
6.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_iphone.mp3

Data length= Data length= 00

Data length= 1024
[repeated many times]
5.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_stereo.mp3
Data length= 509Data length= 
128
Data length= 0
Data length= 0
Data length= 1024
[repeated many times]
5.0
speeches/fixtures/expected_outputs/mp3/lamb_whole.mp3
Data length= Data length= 128401

Data length= Data length= 0
0
Data length= 1024
[repeated many times]
Data length= 512
Data length= 0
Data length= 405
Data length= 0
5.0
Data length= 1024
[repeated many times]
Data length= 405

Whilst if I switch it back to using the function:

$ python duration.py 
speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
Data length=Data length=  0477

Data length= 0
Data length= 1024
Data length= 1024
Data length= 1024
Data length= 1024
Data length= 404
Data length= 0
1.0
speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
Data length= 512
Data length= 0
^CException KeyboardInterrupt: KeyboardInterrupt() in <bound method FFmpegAudioFile.__del__ of <audioread.ffdec.FFmpegAudioFile object at 0x10eda2f50>> ignored
1.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
Data length= 1024
6.0Data length= 508Data length= 

speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_iphone.mp3Data length= 0
128
Data length= 0

5.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_stereo.mp3
Data length= Data length= 4730

Data length= 0
5.0
Data length= 477Data length= 0
speeches/fixtures/expected_outputs/mp3/lamb_whole.mp3

Data length= 0
5.0Data length= 477
Data length=Data length= 0
 0

No idea why it being in the function makes a difference, but hope that's helpful

sampsyo commented 10 years ago

Very interesting. I think the reason there's a difference between the two is that, with the function wrapper, f is deleted as soon as the function returns. In the other case, it's a global variable, so it gets deleted later, when the Python interpreter shuts down.

Based on this output, here's my best guess at what's going on. (Perplexingly, I can't reproduce this on my Mac either, so we're flying somewhat blind.)

So we're deadlocked: the Python program is waiting for ffmpeg to write to stderr and the ffmpeg process is waiting for the Python program to empty its stdout buffer.

To verify whether this is happening, can you drop a few print statements into the FFmpegAudioFile constructor and other methods? Before and after the self._get_info() call are most critical, but inside read_data and close would also be helpful.

dracos commented 10 years ago

It's a global variable, but (e.g. if I put the print in close) I can see it's being called on each iteration (as there's no reference to what f was pointing at once it's overwritten in the next iteration, the same that happens when it's in a function (at least, theoretically!).

Okay, print statements before/after _get_info and at start of read_data/close. Not using function, no problem:

speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
6.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_iphone.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
5.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_stereo.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a210>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a210>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
5.0
speeches/fixtures/expected_outputs/mp3/lamb_whole.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10320a210>
5.0
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>

Using a function (the fourth time I ran the script, first three runs did not hang):

speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>

Ctrl-C as before lets it carry on successfully and complete. I note the code reference is always the same when using a function; dunno if that's of any relevance.

sampsyo commented 10 years ago

So much for that theory! Looks like _get_info is returning successfully, only to hang later in close.

Can we determine which line in close is hanging? Presumably, it's the wait call, but it would be good to check that assumption. (FWIW, the reason that this should never happen is the proc.terminate() call above, which should kill the process. We also flush the stdout and stderr buffers, so it can't be blocked on those.)

dracos commented 10 years ago

Yes, it's hanging at the wait() line.

sampsyo commented 10 years ago

Very mysterious. My next step, if I could reproduce this on my machine, would be to attach to the hung ffmpeg process with gdb and get a backtrace to see exactly where it's stuck. (It's likely in a system call, so hopefully debug symbols won't be necessary.) But I realize that's a lot to ask you to do—if you're comfortable with gdb basics, then please go ahead, but otherwise, maybe we can work harder to reproduce the problem on a box I have access to.

dracos commented 10 years ago

Interesting diversion into getting gdb to run on Mavericks :) I don't know gdb very well, so all I can give you is:

(gdb) backtrace
#0  0x00007fff9719e746 in ?? ()
#1  0x00007fff96761779 in ?? ()
#2  0x0000000000000000 in ?? ()

Which I'm guessing isn't that helpful. If I type run, it says do I want to start from the beginning and if I say yes it runs fine (and the parent python script then runs to completion).

Ran it a few more (hanging) times, generally get the same, but I have also seen longer backtraces, such as:

(gdb) backtrace
#0  0x00007fff9719e746 in ?? ()
#1  0x00007fff96761779 in ?? ()
#2  0xe200e162994d3e04 in ?? ()
#3  0x0000000101f37440 in ?? ()
#4  0x0000000101f37444 in ?? ()
#5  0x0000000101f3743c in ?? ()
#6  0x0000000101f37428 in ?? ()
#7  0x0000000000004100 in ?? ()
#8  0x0000000000004303 in ?? ()
#9  0x0000000000f50095 in ?? ()
#10 0x0000000000005e27 in ?? ()
#11 0x0000000000000000 in ?? ()
sampsyo commented 10 years ago

Argh; I was hoping we'd at least get to see symbols for the libc calls. No such luck. :cry:

One more thing to try: while ffmpeg is hanging, run sudo sc_usage ffmpeg (a command provided on OS X). Ideally, that will show us the system call that ffmpeg is blocking in.

I wish there were a way for me to reproduce this so I could observe what's going on. If we need to resort to it, would it be possible for you to let me ssh into your machine to reproduce the problem? I realize that's a big security problem, so I understand if it's not possible, but I promise not to trash your machine.

dracos commented 10 years ago

Afraid all it gives is:

TYPE                           NUMBER        CPU_TIME   WAIT_TIME
------------------------------------------------------------------------------
System         Idle                                     00:00.000
System         Busy                                     00:00.000
ffmpeg         Usermode                      00:00.000

And nothing further. We could share a screen in Skype, perhaps?

sampsyo commented 10 years ago

Ah, too bad! All my tools are failing me.

Yes, a Skype screen-share might be just the thing. (You can watch me fumble around with CLI tools.) I'm on Pacific time (US). Send me an email (on my profile page) and maybe we can find a time that works?