fvollmer closed this issue 2 years ago.
As a test, I ran ffmpeg in debug mode:
ffmpeg -v debug -f dshow -i "audio=Eingang (High Definition Audio Device)" -f null -
and the same effect seems to occur:
dshow passing through packet of type audio size 88200 timestamp 801524400000 orig timestamp 801524400000 graph timestamp 801529380000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801529380000 orig timestamp 801529380000 graph timestamp 801534350000 diff 4970000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801534350000 orig timestamp 801534350000 graph timestamp 801539340000 diff 4990000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801539340000 orig timestamp 801539340000 graph timestamp 801544410000 diff 5070000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801544410000 orig timestamp 801544410000 graph timestamp 801549390000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801549390000 orig timestamp 801549390000 graph timestamp 801554370000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801554370000 orig timestamp 801554370000 graph timestamp 801559350000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801559350000 orig timestamp 801559350000 graph timestamp 801564330000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801564330000 orig timestamp 801564330000 graph timestamp 801569410000 diff 5080000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801569410000 orig timestamp 801569410000 graph timestamp 801574380000 diff 4970000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801574380000 orig timestamp 801574380000 graph timestamp 801579360000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801579360000 orig timestamp 801579360000 graph timestamp 801584340000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801584340000 orig timestamp 801584340000 graph timestamp 801589420000 diff 5080000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801589420000 orig timestamp 801589420000 graph timestamp 801594400000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801594400000 orig timestamp 801594400000 graph timestamp 801599370000 diff 4970000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801599370000 orig timestamp 801599370000 graph timestamp 801604350000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801604350000 orig timestamp 801604350000 graph timestamp 801609330000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801609330000 orig timestamp 801609330000 graph timestamp 801614410000 diff 5080000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801614410000 orig timestamp 801614410000 graph timestamp 801619380000 diff 4970000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801619380000 orig timestamp 801619380000 graph timestamp 801624360000 diff 4980000 Eingang (High Definition Audio Device)
dshow passing through packet of type audio size 88200 timestamp 801624360000 orig timestamp 801624360000 graph timestamp 801629330000 diff 4970000 Eingang (High Definition Audio Device)
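Some context for reading these numbers: DirectShow timestamps are in 100-ns units, and `diff` is the graph timestamp minus the packet timestamp, i.e. how much time the device clock says each packet covers. The sketch below parses such lines and compares that clock duration with the duration implied by the packet size. It assumes (the log does not say so) that the device delivers 44.1 kHz stereo 16-bit PCM, in which case 88200 bytes would be exactly 0.5 s of audio.

```python
import re
from fractions import Fraction

# DirectShow reference times count in units of 100 ns.
UNITS_PER_SECOND = 10_000_000

# Assumption: 44.1 kHz, 2 channels, 2 bytes per sample (s16).
BYTES_PER_SECOND = 44100 * 2 * 2

LINE_RE = re.compile(
    r"size (\d+) timestamp (\d+) orig timestamp \d+ "
    r"graph timestamp (\d+) diff (-?\d+)"
)


def analyse(lines):
    """Return (payload_seconds, clock_seconds) for every dshow debug line."""
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a "passing through packet" line
        size, start, end, diff = map(int, match.groups())
        if end - start != diff:
            raise ValueError(f"unexpected diff in line: {line!r}")
        payload = Fraction(size, BYTES_PER_SECOND)  # duration implied by the byte count
        clock = Fraction(diff, UNITS_PER_SECOND)    # duration implied by the timestamps
        results.append((payload, clock))
    return results
```

Fed the lines above (e.g. from a saved log file), it reports, under that format assumption, exactly 0.5 s of samples per packet while the timestamps advance by anywhere from 0.497 s to 0.508 s. So pts taken from the device clock cannot match a running sample count exactly.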
I guess that means this isn't a PyAV problem, and I should look at how ffmpeg handles this?
I agree with your assessment: if the ffmpeg command-line tool exhibits the same behaviour, I don't think PyAV can do much about it. Feel free to keep digging, and please report back so that other users with the same issue can learn from you!
I think I figured it out.
The problem occurs because, for AAC, the frames are passed through the resampler and the FIFO. If you choose a format that needs neither, it just works:
output_audio_stream = output_container.add_stream("pcm_s16le", rate=44100)
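To illustrate why AAC needs the FIFO while PCM does not: FFmpeg's native AAC encoder consumes fixed 1024-sample frames (of planar float, hence the resampler), whereas PCM encoders accept frames of any length. Assuming each dshow packet holds 22050 samples (88200 bytes of 16-bit stereo), the hypothetical helper `rechunk` below shows which input packets each 1024-sample output frame draws its samples from.

```python
from bisect import bisect_right


def rechunk(chunk_sizes, frame_size=1024):
    """Split variable-size input chunks into fixed-size output frames.

    Returns (frames, leftover). Each frame is (first_sample, first_chunk,
    last_chunk): the running sample count a FIFO would use as its pts, and
    the indices of the input chunks its samples come from.
    """
    boundaries = []  # cumulative sample count at the end of each input chunk
    total = 0
    for size in chunk_sizes:
        total += size
        boundaries.append(total)

    frames = []
    for start in range(0, total - frame_size + 1, frame_size):
        end = start + frame_size  # exclusive
        first_chunk = bisect_right(boundaries, start)   # chunk holding sample `start`
        last_chunk = bisect_right(boundaries, end - 1)  # chunk holding the last sample
        frames.append((start, first_chunk, last_chunk))

    leftover = total - len(frames) * frame_size  # samples still waiting in the FIFO
    return frames, leftover
```

Since 22050 is not a multiple of 1024, 546 samples are left over after the first packet, and output frame 21 (starting at sample 21504) straddles packets 0 and 1. That frame has no input pts of its own, so the FIFO has to compute one from the running sample count, which is exactly where device timestamps that drift from the sample count cause trouble.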
The real problem lies in how PyAV implements the resampler and the FIFO: they aren't designed to handle any pts inconsistencies. In the ffmpeg tool, by contrast, the heavy lifting for resampling is done by the aformat filter (https://github.com/FFmpeg/FFmpeg/blob/169259d9a381a3c2132672da5c5f250fa194fb4d/fftools/ffmpeg_filter.c#L607), and the FIFO is implemented by the buffersink (https://github.com/FFmpeg/FFmpeg/blob/169259d9a381a3c2132672da5c5f250fa194fb4d/fftools/ffmpeg_filter.c#L1113).
I think PyAV should do the same. This would simplify the code quite a bit and also solve this problem. One obstacle I encountered was that any filtering discarded the frame's time_base, but #765 should fix this. After that, it shouldn't be hard to use filters for the heavy lifting.
I'll try to create a PR that matches PyAV's resampler behaviour to that of the ffmpeg tool.
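A sketch of a more tolerant approach, in the spirit of letting a buffersink-style queue do the re-chunking: instead of requiring each incoming pts to equal the previous pts plus the sample count, each output frame takes the pts of the input frame that supplies its first sample, advanced by the samples already consumed from that frame. `TolerantSampleFifo` is hypothetical and works on metadata only (pts in 1/sample_rate units plus sample counts); it is neither PyAV's `AudioFifo` nor FFmpeg's implementation, and whether FFmpeg behaves exactly this way is an assumption.

```python
from collections import deque


class TolerantSampleFifo:
    """Hypothetical fixed-size re-chunker that trusts each input frame's pts.

    pts values are in units of 1/sample_rate, so advancing a pts by N
    consumed samples is simply pts + N.
    """

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.queue = deque()  # entries: [pts_of_first_unread_sample, samples_left]
        self.buffered = 0

    def write(self, pts, nb_samples):
        # No continuity check: jittery pts are accepted as they are.
        self.queue.append([pts, nb_samples])
        self.buffered += nb_samples

    def read(self):
        """Return the pts of the next full frame, or None if not enough data."""
        if self.buffered < self.frame_size:
            return None
        out_pts = self.queue[0][0]  # pts of the frame holding the first sample
        needed = self.frame_size
        while needed:
            head = self.queue[0]
            take = min(needed, head[1])
            head[0] += take  # move the head's pts past the consumed samples
            head[1] -= take
            needed -= take
            if head[1] == 0:
                self.queue.popleft()
        self.buffered -= self.frame_size
        return out_pts
```

With a second packet stamped 21962 instead of the 22050 a sample count would predict (roughly the 0.498 s spacing in the log above, at 44.1 kHz), nothing raises: frame 21 gets pts 21504 from the first packet and frame 22 gets 21962 + 478 = 22440 from the second. The jitter is passed on rather than rejected, so later stages still have to cope with slightly overlapping or gapped timestamps.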
I'm recording audio from dshow, and there seems to be a problem with the pts. ffmpeg doesn't complain about these sources, but I also suspect this might not be a bug in PyAV. I tried several input devices, and each one produces an error like:
I can obviously just set the pts to None and it will make some up. However, this seems to be deprecated and results in the following warning:
To debug the problem, I wrote the following code, which prints the pts, time base, sample rate, number of samples, etc., and shows exactly what is going on.
Example output:
How should we handle this?
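The kind of debugging described in the original post can be sketched with a hypothetical helper that, for each frame, compares the pts with the one implied by the previous frame's pts and sample count, and reports the difference in samples. The tuple fields mirror PyAV `AudioFrame` attributes (`pts`, `time_base`, `sample_rate`, `samples`); this is not the poster's code, and the 1/10,000,000 time base in the usage lines is an assumption based on DirectShow's 100-ns units.

```python
from fractions import Fraction


def pts_report(frames):
    """Compare each frame's pts with the pts implied by the previous frame.

    frames: iterable of (pts, time_base, sample_rate, samples) tuples, with
    time_base given as a Fraction. Returns a list of
    (pts, expected_pts, drift_in_samples); expected_pts is None for the first frame.
    """
    report = []
    expected = None
    for pts, time_base, sample_rate, samples in frames:
        if expected is None:
            drift = 0
        else:
            # Convert the pts error from time-base units into samples.
            drift = (pts - expected) * time_base * sample_rate
        report.append((pts, expected, drift))
        # Next frame should start right after this frame's samples.
        expected = pts + Fraction(samples, sample_rate) / time_base
    return report


tb = Fraction(1, 10_000_000)  # assumed DirectShow 100-ns time base
frames = [(0, tb, 44100, 22050), (4_980_000, tb, 44100, 22050)]
for pts, expected, drift in pts_report(frames):
    print(pts, expected, float(drift))
```

With a real device, one could feed it `(f.pts, f.time_base, f.sample_rate, f.samples)` for each frame from `container.decode(audio=0)`. In the usage lines, the second frame arrives 4980000 units after the first instead of 5000000, which the helper reports as a drift of -88.2 samples, matching the kind of jitter visible in the dshow log above.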