akras14 / speech-to-text

Example transcribing audio file (speech) to text with Google Cloud Speech API and Python

ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC #6

Open yianchen opened 4 years ago

yianchen commented 4 years ago

Has anyone encountered this ValueError even though the audio file is a PCM WAV? Any idea how to solve it? ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC.

I ran fast.py with some sample WAV files and it worked perfectly! But when I tested it with audio files I collected from a website, I got the ValueError, even though the output of the soxi command shows the file is 16-bit PCM.

I then re-ran the sample WAV files that had previously worked, but received the same error message.

Audio files I collected from a website:

I downloaded Amazon's audio (https://www.youtube.com/watch?v=CxK1VhtJlNQ), converted it to a WAV file at a 16 kHz sample rate with 1 channel, and split it into small pieces with py-webrtcvad.

soxi chunk-02.wav

Input File     : 'chunk-02.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:03.03 = 48480 samples ~ 227.25 CDDA sectors
File Size      : 97.0k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
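For anyone comparing against the soxi output above: a quick cross-check is to see whether Python's standard `wave` module can open the same file. This is only a sketch. It assumes the reader behind fast.py relies on `wave` under the hood, which is not confirmed in this thread, and `describe_wav` is a hypothetical helper name, not something from the repo.

```python
import sys
import wave


def describe_wav(path):
    """Try to open `path` with the stdlib `wave` module.

    Returns a dict of header fields on success, or a dict with an
    "error" key if `wave` rejects the file. A missing file still
    raises FileNotFoundError, since that is a different problem.
    """
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                "sample_width_bytes": wav.getsampwidth(),
                "frames": frames,
                "duration_s": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        # wave.Error: header is not something `wave` understands.
        # EOFError: file is truncated or too short to hold a header.
        return {"error": f"{type(exc).__name__}: {exc}"}


if __name__ == "__main__":
    # Usage: python describe_wav.py chunk-02.wav [more.wav ...]
    for wav_path in sys.argv[1:]:
        print(wav_path, describe_wav(wav_path))
```

If this reports an error for a file that soxi describes as 16-bit PCM, the two tools disagree about the header, and the error text from `wave` may narrow down why.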

akras14 commented 4 years ago

Long overdue, but responding in case somebody comes across this.

I think not all WAV files are created equal, but I don't have any more details. I've run into this issue as well. Exporting the file as WAV from Audacity, as outlined in the original article, seemed to resolve it for me. I don't have any more feedback than that :(
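One concrete way WAV files can differ is the format tag stored in the `fmt ` chunk of the RIFF header: 1 means plain PCM, while other values (for example 3 for IEEE float or 0xFFFE for WAVE_FORMAT_EXTENSIBLE) describe other layouts that some readers reject. The sketch below prints that tag so a failing file can be compared with one exported from Audacity. It is a guess at where to look, not a confirmed diagnosis for this issue, and `read_format_tag` and `format_name` are hypothetical helper names.

```python
import struct
import sys

# A few common format tags; anything else is reported as "unknown".
FORMAT_NAMES = {
    0x0001: "PCM",
    0x0003: "IEEE float",
    0x0006: "A-law",
    0x0007: "mu-law",
    0xFFFE: "WAVE_FORMAT_EXTENSIBLE",
}


def read_format_tag(path):
    """Return the 16-bit format tag from the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to be RIFF/WAVE")
        riff, _size, wave_id = struct.unpack("<4sI4s", header)
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                tag_bytes = f.read(2)
                if len(tag_bytes) < 2:
                    raise ValueError("'fmt ' chunk is truncated")
                (tag,) = struct.unpack("<H", tag_bytes)
                return tag
            # Skip this chunk; RIFF chunks are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)


def format_name(tag):
    """Map a format tag to a readable name."""
    return FORMAT_NAMES.get(tag, f"unknown (0x{tag:04X})")


if __name__ == "__main__":
    # Usage: python format_tag.py chunk-02.wav audacity-export.wav
    for wav_path in sys.argv[1:]:
        print(wav_path, format_name(read_format_tag(wav_path)))
```

If the failing file and the Audacity export show different tags, that difference is a reasonable first suspect; if both show PCM, the cause is somewhere else.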

yianchen commented 4 years ago

thanks @akras14, will try Audacity.