SpeechColab / GigaSpeech

Large, modern dataset for speech recognition
Apache License 2.0

Sample rate mismatch in Opus files #100

Open aheba opened 2 years ago

aheba commented 2 years ago

Hello, I noticed that sample_rate=16000 in GigaSpeech.json does not match the sample rate reported for the Opus file (48000 Hz):

ffmpeg -i /workspace/datasets/GigaSpeech_corpus/audio/podcast/P0001/POD0000000001.opus
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
  configuration: --prefix=/opt/conda --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
Input #0, ogg, from '/workspace/datasets/GigaSpeech_corpus/audio/podcast/P0001/POD0000000001.opus':
  Duration: 00:10:29.94, start: 0.000000, bitrate: 32 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, mono, fltp
    Metadata:
      encoder         : Lavc58.54.100 libopus
      artist          : Roman Mars
      date            : 2011
      genre           : Podcast
      title           : 99% Invisible-18- Check Cashing Stores
      album           : 99% Invisible
      encoded_by      : iTunes 9.2.1
At least one output file must be specified

Is this a problem?

dophist commented 2 years ago

Hi aheba,

The Opus standard defines a set of audio bandwidths, but it does not enforce a sample rate. An Opus decoder may therefore report a 48 kHz sample rate (many implementations choose 48 kHz as the default) even when the file actually contains a narrowband signal (e.g. 8 kHz). The GigaSpeech repo's README has a section on this; search for "resampling" there, and it should help you through the problem.
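
As an illustration of that resampling step, here is a minimal Python sketch that builds an ffmpeg command to decode an Opus file and write 16 kHz mono audio, matching the sample_rate value in the metadata. It is not the procedure from the README. The helper names `build_resample_cmd` and `resample` and the relative file path are hypothetical; `-ar` and `-ac` are the standard ffmpeg options for output sample rate and channel count.

```python
import shlex
import subprocess
from pathlib import Path


def build_resample_cmd(src, dst, target_sr=16000):
    """Return an ffmpeg argument list that decodes `src` and writes `dst`
    as mono audio at `target_sr` Hz. (Hypothetical helper, for illustration.)"""
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it already exists
        "-i", str(src),         # input file (decoded at the decoder's default rate)
        "-ar", str(target_sr),  # output sample rate in Hz
        "-ac", "1",             # output channel count (mono)
        str(dst),
    ]


def resample(src, dst, target_sr=16000):
    """Run ffmpeg to resample `src` into `dst`; raises if ffmpeg exits non-zero."""
    subprocess.run(build_resample_cmd(src, dst, target_sr), check=True)


if __name__ == "__main__":
    # Placeholder path; adjust to wherever the corpus is stored.
    src = Path("audio/podcast/P0001/POD0000000001.opus")
    # Only print the command here; call resample(...) to actually run ffmpeg.
    print(shlex.join(build_resample_cmd(src, src.with_suffix(".wav"))))
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log the exact command before converting a large number of files.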