tienanh28122000 opened this issue 1 year ago
ffmpeg modifies the audio: it applies dither. That can cause a difference in results, but not a big one, only a fraction of a percent. If your WER changes that much, your model is probably not good enough.
Also, `readframes(4000)` is equivalent to `read(8000)`, since each frame is 2 bytes.
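To illustrate the frame/byte point: a WAV frame holds one sample per channel, so for 16-bit mono audio a frame is 2 bytes and `readframes(n)` returns `n * sampwidth * nchannels` bytes. A minimal sketch, assuming 16-bit mono PCM and using hypothetical helper names (`make_wav`, `first_chunks`), that reads the same span of audio both ways from in-memory data:

```python
import io
import wave


def make_wav(pcm: bytes, rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples, so 2 bytes per frame
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def first_chunks(wav_bytes: bytes, raw_pcm: bytes, frames: int = 4000):
    """Read the same amount of audio both ways: readframes(n) vs read(n * frame size)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frame_size = wf.getsampwidth() * wf.getnchannels()  # bytes per frame
        from_wave = wf.readframes(frames)
    # A raw PCM stream (like ffmpeg's stdout) is read in bytes, not frames.
    from_raw = io.BytesIO(raw_pcm).read(frames * frame_size)
    return from_wave, from_raw
```

With `frames=4000` both chunks are 8000 bytes and hold the same samples. Comparing `readframes(4000)` against `read(4000)` instead would compare 4000 frames with 2000 frames, so the printed chunks differ even when the underlying audio is identical.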
I've found the parameter that makes the results differ. It is in this call: `subprocess.Popen(["ffmpeg", "-loglevel", "quiet", "-i", path, "-ar", str(SAMPLE_RATE), "-ac", "1", "-f", "s16le", "-"], stdout=subprocess.PIPE)`. If I remove the `"-ar", str(SAMPLE_RATE)` arguments, the two methods give identical results. But if I keep them, the WER drops dramatically. Can you explain why SAMPLE_RATE has such a large effect here? Thank you very much!
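Without `-ar`, ffmpeg keeps the input file's native sample rate, and the two methods then matched exactly, so the difference comes from resampling. A plausible reading (an assumption, not confirmed in this thread) is that the WAV files are not at `SAMPLE_RATE`: if the recognizer is created with `SAMPLE_RATE`, audio read with `wave` reaches it at the wrong rate, while `-ar` converts it first. A minimal sketch for checking that; `SAMPLE_RATE = 16000` and the helper names `ffmpeg_command` and `wav_mismatches` are hypothetical:

```python
import wave

# Assumption: the rate the recognizer is created with (hypothetical value).
SAMPLE_RATE = 16000


def ffmpeg_command(path, resample=True):
    """Build the ffmpeg call from this thread, with or without -ar."""
    cmd = ["ffmpeg", "-loglevel", "quiet", "-i", path]
    if resample:
        # -ar: resample the output to SAMPLE_RATE; without it ffmpeg keeps
        # the input file's native sample rate.
        cmd += ["-ar", str(SAMPLE_RATE)]
    # -ac 1: downmix to mono; -f s16le: raw signed 16-bit little-endian PCM;
    # "-": write to stdout.
    cmd += ["-ac", "1", "-f", "s16le", "-"]
    return cmd


def wav_mismatches(wav_file):
    """Return how a WAV file (path or file object) differs from 16-bit mono SAMPLE_RATE audio."""
    problems = []
    with wave.open(wav_file, "rb") as wf:
        if wf.getframerate() != SAMPLE_RATE:
            problems.append(f"sample rate {wf.getframerate()} Hz != {SAMPLE_RATE} Hz")
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
        if wf.getsampwidth() != 2:
            problems.append(f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

The command would be used as `subprocess.Popen(ffmpeg_command(path), stdout=subprocess.PIPE)`. If `wav_mismatches(path)` reports only a sample-rate mismatch, that fits the observation that removing `-ar` makes both methods agree.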
Hi everyone, I've found that if I change how the audio files are read (from the wave module to ffmpeg), the WER changes dramatically. When I use wave to read the audio files (4000 utterances), the WER is 9.64%, but when I use ffmpeg instead, the WER drops to 4.77%. Can you explain why this difference exists? By the way, I've printed the output of each method when reading audio (`process.stdout.read(4000)` and `data = wf.readframes(4000)`) and the outputs were different.
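For reference when comparing numbers like 9.64% and 4.77%: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and over a test set it is usually total errors over total reference words. A minimal sketch using plain Levenshtein alignment on whitespace-split words; `word_errors` and `corpus_wer` are hypothetical helpers, not the scoring tool used in this thread:

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions turning reference into hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1]


def corpus_wer(pairs) -> float:
    """Total errors over total reference words (not a mean of per-utterance WERs)."""
    pairs = list(pairs)
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words
```

For example, "the cat sat on the mat" recognized as "the cat sit on mat" has one substitution and one deletion, so 2 errors out of 6 reference words.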