bbc / audiowaveform

C++ program to generate waveform data and render waveform images from audio files
https://waveform.prototyping.bbc.co.uk
GNU General Public License v3.0

Unable to generate audiowaveform from mp4 file #115

Closed lauradP closed 3 years ago

lauradP commented 4 years ago

Hi, I'm trying to generate an audio waveform from an mp4 file, as described here: https://github.com/bbc/audiowaveform.

This is the ffprobe output for my mp4 file:

<?xml version="1.0" encoding="UTF-8"?>
<ffprobe:ffprobe xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
  <streams>
      <stream index="0" codec_name="h264" codec_long_name="H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10" profile="High 4:2:2" codec_type="video" codec_time_base="1/100" codec_tag_string="avc1" codec_tag="0x31637661" width="1024" height="576" coded_width="1024" coded_height="576" has_b_frames="2" sample_aspect_ratio="1:1" display_aspect_ratio="16:9" pix_fmt="yuv422p" level="32" chroma_location="left" refs="1" is_avc="true" nal_length_size="4" r_frame_rate="50/1" avg_frame_rate="50/1" time_base="1/12800" start_pts="0" start_time="0.000000" duration_ts="513024" duration="40.080000" bit_rate="1976349" bits_per_raw_sample="8" nb_frames="2004">
          <disposition default="1" dub="0" original="0" comment="0" lyrics="0" karaoke="0" forced="0" hearing_impaired="0" visual_impaired="0" clean_effects="0" attached_pic="0" timed_thumbnails="0"/>
          <tag key="language" value="und"/>
          <tag key="handler_name" value="VideoHandler"/>
          <tag key="timecode" value="15:48:41:20"/>
      </stream>
      <stream index="1" codec_name="aac" codec_long_name="AAC (Advanced Audio Coding)" profile="LC" codec_type="audio" codec_time_base="1/32000" codec_tag_string="mp4a" codec_tag="0x6134706d" sample_fmt="fltp" sample_rate="32000" channels="2" channel_layout="stereo" bits_per_sample="0" r_frame_rate="0/0" avg_frame_rate="0/0" time_base="1/32000" start_pts="0" start_time="0.000000" duration_ts="1282560" duration="40.080000" bit_rate="64490" max_bit_rate="64490" nb_frames="1254">
          <disposition default="1" dub="0" original="0" comment="0" lyrics="0" karaoke="0" forced="0" hearing_impaired="0" visual_impaired="0" clean_effects="0" attached_pic="0" timed_thumbnails="0"/>
          <tag key="language" value="und"/>
          <tag key="handler_name" value="SoundHandler"/>
      </stream>
      <stream index="2" codec_name="aac" codec_long_name="AAC (Advanced Audio Coding)" profile="LC" codec_type="audio" codec_time_base="1/32000" codec_tag_string="mp4a" codec_tag="0x6134706d" sample_fmt="fltp" sample_rate="32000" channels="2" channel_layout="stereo" bits_per_sample="0" r_frame_rate="0/0" avg_frame_rate="0/0" time_base="1/32000" start_pts="0" start_time="0.000000" duration_ts="1282560" duration="40.080000" bit_rate="64165" max_bit_rate="64165" nb_frames="1254">
          <disposition default="0" dub="0" original="0" comment="0" lyrics="0" karaoke="0" forced="0" hearing_impaired="0" visual_impaired="0" clean_effects="0" attached_pic="0" timed_thumbnails="0"/>
          <tag key="language" value="und"/>
          <tag key="handler_name" value="SoundHandler"/>
      </stream>
      <stream index="3" codec_type="data" codec_tag_string="tmcd" codec_tag="0x64636d74" r_frame_rate="0/0" avg_frame_rate="50/1" time_base="1/12800" start_pts="0" start_time="0.000000" duration_ts="513024" duration="40.080000" nb_frames="1">
          <disposition default="0" dub="0" original="0" comment="0" lyrics="0" karaoke="0" forced="0" hearing_impaired="0" visual_impaired="0" clean_effects="0" attached_pic="0" timed_thumbnails="0"/>
          <tag key="language" value="eng"/>
          <tag key="handler_name" value="TimeCodeHandler"/>
          <tag key="timecode" value="15:48:41:20"/>
      </stream>
  </streams>

  <format filename="/cache/1268.mp4" nb_streams="4" nb_programs="0" format_name="mov,mp4,m4a,3gp,3g2,mj2" format_long_name="QuickTime / MOV" start_time="0.000000" duration="40.112000" size="10610125" bit_rate="2116099" probe_score="100">
      <tag key="major_brand" value="isom"/>
      <tag key="minor_version" value="512"/>
      <tag key="compatible_brands" value="isomiso2avc1mp41"/>
      <tag key="encoder" value="Lavf58.20.100"/>
  </format>
</ffprobe:ffprobe>
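
The stream indices in the ffprobe output above are what ffmpeg's `-map 0:1` refers to: the first number is the input file and the second is the stream index. As a minimal sketch, the Python below lists the `-map` specifiers for the audio streams. It makes two assumptions: `SAMPLE` is a trimmed, hand-written copy of the `<streams>` element (most attributes dropped, the namespaced root element left out), and `audio_map_args` is a hypothetical helper name, not part of ffmpeg or audiowaveform.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modelled on the <streams> element above. Only a few
# attributes are kept, and the namespaced <ffprobe:ffprobe> root is omitted.
SAMPLE = """
<streams>
  <stream index="0" codec_name="h264" codec_type="video"/>
  <stream index="1" codec_name="aac" codec_type="audio" sample_rate="32000" channels="2"/>
  <stream index="2" codec_name="aac" codec_type="audio" sample_rate="32000" channels="2"/>
  <stream index="3" codec_type="data" codec_tag_string="tmcd"/>
</streams>
"""


def audio_map_args(streams_xml, input_index=0):
    """Return ffmpeg -map specifiers (e.g. '0:1') for every audio stream."""
    root = ET.fromstring(streams_xml)
    return [
        f"{input_index}:{stream.get('index')}"
        for stream in root.iter("stream")
        if stream.get("codec_type") == "audio"
    ]


print(audio_map_args(SAMPLE))  # ['0:1', '0:2']
```

For this file, both `0:1` and `0:2` are valid audio selections; the command below uses the first.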

I wrote this command line:

/ffmpeg-4.1.1/ffmpeg -i /cache/1268.mp4 -map 0:1 -f wav - | /opt/audiowaveform/audiowaveform --input-format wav --pixels-per-second 25 -b 16

But it fails with the following error:

Could not write header for output file #0 (incorrect codec parameters ?): Broken pipe
Error initializing output stream 0:0 --
Conversion failed!

Can anyone help me?

chrisn commented 4 years ago

The error comes from ffmpeg, not audiowaveform. Can you share the mp4 file for me to try it?

lauradP commented 4 years ago

You can download a public domain file from here (with the same issue): https://we.tl/t-7hMVmr5xwm

chrisn commented 4 years ago

This works fine for me, using ffmpeg 4.2.2:

ffmpeg -i multitracciaH264.mp4 -map 0:1 -f wav - | audiowaveform --input-format wav --pixels-per-second 25 -b 16 -o test.json

lauradP commented 4 years ago

Hi, thanks for your answer. I tested your command line with ffmpeg 4.2.2 and it didn't work :( I'm running in a CentOS 7 Docker container. Which operating system are you using?

chrisn commented 4 years ago

I used ffmpeg 4.2.2 on Windows, and also ffmpeg 3.4.6 on Ubuntu 18.04. What does the following command output on your system?

ffmpeg -i multitracciaH264.mp4 -map 0:1 -f wav output.wav

lauradP commented 4 years ago

Hi, the command above correctly produces the output file:

./ffmpeg -i multitracciaH264.mp4 -map 0:1 -f wav output.wav
ffmpeg version 4.1.1-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2000-2019 the
  FFmpeg developers
  built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug
    --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc-6
    --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-gray
    --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype
    --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb
    --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex
    --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab
    --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264
    --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzvbi --enable-libzimg
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multitracciaH264.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.20.100
  Duration: 00:00:26.31, start: 0.000000, bitrate: 1711 kb/s
    Stream #0:0(und): Video: h264 (High 4:2:2) (avc1 / 0x31637661), yuv422p, 1024x576
      [SAR 1:1 DAR 16:9], 1576 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      timecode        : 00:00:00:00
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 32000 Hz, stereo, fltp,
      64 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
    Stream #0:2(und): Audio: aac (LC) (mp4a / 0x6134706D), 32000 Hz, stereo, fltp, 64 kb/s
    Metadata:
      handler_name    : SoundHandler
    Stream #0:3(eng): Data: none (tmcd / 0x64636D74), 0 kb/s
    Metadata:
      handler_name    : TimeCodeHandler
      timecode        : 00:00:00:00
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'output.wav':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    ISFT            : Lavf58.20.100
    Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 32000 Hz, stereo, s16,
      1024 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      encoder         : Lavc58.35.100 pcm_s16le
size=    3288kB time=00:00:26.30 bitrate=1024.0kbits/s speed= 677x
video:0kB audio:3288kB subtitle:0kB other streams:0kB global headers:0kB
  muxing overhead: 0.002317%

chrisn commented 4 years ago

This output looks fine to me. What happens now if you pipe the ffmpeg output into audiowaveform?
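
When a two-stage pipe fails, it can help to look at each side separately. A "Broken pipe" on the writing side generally means the reading process closed its end early, so the reader's exit code and stderr are worth checking too. The sketch below is one way to do that from Python. The name `run_pipeline` and the result layout are assumptions made for illustration; the two command lists are copied from the commands earlier in this thread and are not executed by the block itself.

```python
import subprocess
import tempfile

# Commands taken from this thread; adjust paths as needed.
FFMPEG_CMD = ["ffmpeg", "-i", "multitracciaH264.mp4", "-map", "0:1", "-f", "wav", "-"]
AUDIOWAVEFORM_CMD = ["audiowaveform", "--input-format", "wav",
                     "--pixels-per-second", "25", "-b", "16", "-o", "test.json"]


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer`; return (exit code, stderr text) for each side."""
    # stderr goes to temporary files so neither process can block on a full pipe.
    with tempfile.TemporaryFile() as producer_err, tempfile.TemporaryFile() as consumer_err:
        producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE,
                                    stderr=producer_err)
        consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                    stderr=consumer_err)
        # Drop our copy of the pipe so the producer sees a broken pipe
        # if the consumer exits before reading everything.
        producer.stdout.close()
        consumer_rc = consumer.wait()
        producer_rc = producer.wait()
        producer_err.seek(0)
        consumer_err.seek(0)
        return {
            "producer": (producer_rc, producer_err.read().decode(errors="replace")),
            "consumer": (consumer_rc, consumer_err.read().decode(errors="replace")),
        }
```

Calling `run_pipeline(FFMPEG_CMD, AUDIOWAVEFORM_CMD)` and printing the result shows whether audiowaveform exited with an error of its own before ffmpeg reported the broken pipe.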

chrisn commented 3 years ago

I assume you've solved this, so I'll close the issue. Please re-open if you are still having difficulties.