beetbox / audioread

cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python
MIT License
492 stars 108 forks source link

Incorrect number of channels with ffmpeg #119

Open simon816 opened 2 years ago

simon816 commented 2 years ago

Version: 2.1.9

It is possible to fool audioread into determining there are 0 channels in the audio file when it does actually have an audio channel.

This occurs when metadata in the file contains the string "audio:"

Test case:

$ ffmpeg -i test/data/test-2.mp3 -metadata description="audio: broken" out.mp3
$ python -c 'import audioread; print(audioread.audio_open("out.mp3", backends=[audioread.ffdec.FFmpegAudioFile]).channels)'
0

audioread assumes the first line on stderr containing "audio:" is ffmpeg outputting stream information https://github.com/beetbox/audioread/blob/5afc8a6dcb8ab801d19d67dc77fe8824ad04acb5/audioread/ffdec.py#L231

As seen in the following output, the description containing "audio: broken" occurs before "Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s"

$ ffmpeg -i out.mp3 -f s16le - > /dev/null
ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
  configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mp3, from 'out.mp3':
  Metadata:
    description     : audio: broken
    encoder         : Lavf58.29.100
  Duration: 00:00:02.04, start: 0.025057, bitrate: 129 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.54
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'pipe:':
  Metadata:
    description     : audio: broken
    encoder         : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le, 44100 Hz, stereo, s16, 1411 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
size=     345kB time=00:00:02.00 bitrate=1411.2kbits/s speed= 410x    
sampsyo commented 2 years ago

Wow; thanks for the detailed report! This looks like a tricky edge case; we can certainly make our parsing more robust.

One simple option would be to also require the line (after stripping) to start with Stream. I'm not familiar enough with FFmpeg's output format to be certain that this is always how the output looks, but it would certainly rule out erroneous confusion with the description field.