fniks closed this issue 10 months ago.
In my experience, Whisper generally needs a specific audio encoding. You can use FFmpeg to preprocess the audio before passing it to transcription. I use the Node addon, so here's what I do:
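import { randomUUID } from 'crypto';
import ffmpegCommand from 'fluent-ffmpeg'; // assuming the fluent-ffmpeg package

// Class method: `this.tempPathDir` and `this.logger` come from the enclosing class.
// Converts any FFmpeg-readable input into a 16 kHz, mono, 16-bit PCM WAV file.
async encodeForWhisper(inputFile: string): Promise<string> {
  const newFile = `${this.tempPathDir}/${randomUUID()}.wav`;
  await new Promise<void>((resolve, reject) => {
    ffmpegCommand()
      .addInput(inputFile)
      .audioCodec('pcm_s16le') // 16-bit little-endian PCM
      .audioChannels(1) // mono
      // 16 kHz sample rate; for PCM the bitrate follows from rate, bit depth and channels
      .audioFrequency(16000)
      .audioFilters([
        'lowpass=3000', // drop content above 3 kHz
        'highpass=200', // drop content below 200 Hz
        'afftdn=nf=-80', // FFT denoiser with a -80 dB noise floor
        // remove every silence longer than 2 s that stays below the 0.02 amplitude threshold
        'silenceremove=stop_periods=-1:stop_duration=2:stop_threshold=0.02',
      ])
      .on('error', (err) => {
        this.logger.error(err);
        reject(err);
      })
      .on('end', () => resolve())
      .save(newFile);
  });
  return newFile;
}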
This uses FFmpeg to resample the audio to 16 kHz (16000 Hz), filter and denoise it, and remove long silent stretches from the file. Hope that helps!
Yes, you have to convert to 16-bit, 16 kHz mono PCM:
$ ffmpeg -i <input file> -acodec pcm_s16le -ac 1 -ar 16000 <output file>
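Before handing a file to whisper.cpp, it can help to confirm that the conversion actually produced this format. Below is a minimal TypeScript sketch (the helper names `readWavFormat` and `isWhisperReady` are hypothetical, not part of whisper.cpp) that reads the `fmt ` chunk of a RIFF/WAVE buffer and checks for integer PCM, one channel, 16000 Hz and 16 bits per sample, matching the `-acodec pcm_s16le -ac 1 -ar 16000` flags. It assumes a well-formed file and only walks the top-level chunk list.

```typescript
// Hypothetical helpers: inspect a WAV buffer's "fmt " chunk.
interface WavFormat {
  audioFormat: number; // 1 = integer PCM
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

function readWavFormat(buf: Uint8Array): WavFormat | null {
  // DataView respects byteOffset, so pooled Node Buffers work too.
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const tag = (off: number) =>
    String.fromCharCode(buf[off], buf[off + 1], buf[off + 2], buf[off + 3]);
  if (buf.length < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') return null;
  let off = 12; // first chunk follows the 12-byte RIFF header
  while (off + 8 <= buf.length) {
    const id = tag(off);
    const size = view.getUint32(off + 4, true);
    if (id === 'fmt ' && off + 8 + 16 <= buf.length) {
      return {
        audioFormat: view.getUint16(off + 8, true),
        channels: view.getUint16(off + 10, true),
        sampleRate: view.getUint32(off + 12, true),
        bitsPerSample: view.getUint16(off + 22, true),
      };
    }
    off += 8 + size + (size % 2); // chunk bodies are padded to an even length
  }
  return null;
}

// True only for 16-bit, 16 kHz, mono integer PCM.
function isWhisperReady(buf: Uint8Array): boolean {
  const fmt = readWavFormat(buf);
  return (
    fmt !== null &&
    fmt.audioFormat === 1 &&
    fmt.channels === 1 &&
    fmt.sampleRate === 16000 &&
    fmt.bitsPerSample === 16
  );
}
```

In Node, `isWhisperReady(fs.readFileSync('out.wav'))` works directly, since a `Buffer` is a `Uint8Array`.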
@bjnortier
Do you know why this trivial conversion isn't done automatically, as it is in the original Whisper?
The original Whisper also uses ffmpeg and requires it as an external dependency; it just runs it automatically. Not requiring ffmpeg in whisper.cpp is the right decision, because not all platforms can use it. For example, in my Swift apps I use other libraries or the standard libraries instead.
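Without ffmpeg, the conversion has to be done with whatever decoder the platform offers. As a rough illustration only: whisper.cpp's C API (`whisper_full`) takes mono float samples at 16 kHz, so once some library has decoded the audio to interleaved 16-bit PCM, what remains is downmixing, scaling to [-1, 1) and resampling. The TypeScript sketch below assumes that already-decoded input; the function names are hypothetical, and the linear-interpolation resampler is a deliberately naive stand-in for a proper one (it applies no anti-aliasing filter).

```typescript
const WHISPER_RATE = 16000; // target sample rate in Hz

// Average interleaved channels into mono and scale Int16 to [-1, 1).
function toMonoFloat(pcm: Int16Array, channels: number): Float32Array {
  const frames = Math.floor(pcm.length / channels);
  const out = new Float32Array(frames);
  for (let f = 0; f < frames; f++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) sum += pcm[f * channels + c];
    out[f] = sum / channels / 32768;
  }
  return out;
}

// Naive linear-interpolation resampler. No low-pass filter is applied
// first, so downsampling can alias; a real resampler would filter.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = WHISPER_RATE,
): Float32Array {
  if (fromRate === toRate) return input.slice();
  const outLen = Math.floor((input.length * toRate) / fromRate);
  const out = new Float32Array(outLen);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLen; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

For 44.1 kHz stereo input, `resampleLinear(toMonoFloat(pcm, 2), 44100)` yields mono float samples at 16 kHz.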
Thank you for answering.
I understand not requiring any external dependency for the core functionality. In fact, I support that too.
But I don't understand why whisper.cpp can't offer highly sought-after extra functionality on top of its core abilities.
As it stands, to use whisper.cpp from the command line I have to write, test, and debug a wrapper shell script that converts the files, runs whisper.cpp, and then deletes the intermediate audio files. That is wasted effort for who knows how many other people besides me (not to mention the many people who don't know how to do it).
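A wrapper like the one described might look roughly like this TypeScript sketch for Node. It assumes `ffmpeg` is on the `PATH`, reuses the ffmpeg flags quoted earlier in the thread, and calls the whisper.cpp command-line binary with `-m <model> -f <wav>`. The binary path (for example `./main` in older builds or `whisper-cli` in newer ones) and the model path are placeholders you pass in, and the function names are hypothetical. The intermediate WAV lives in a temporary directory that is deleted even if a step fails.

```typescript
import { execFileSync } from 'node:child_process';
import { mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// ffmpeg flags from the thread: 16-bit little-endian PCM, mono, 16 kHz.
function ffmpegArgs(input: string, output: string): string[] {
  return ['-i', input, '-acodec', 'pcm_s16le', '-ac', '1', '-ar', '16000', output];
}

// Assumed whisper.cpp CLI invocation: model via -m, WAV file via -f.
function whisperArgs(model: string, wav: string): string[] {
  return ['-m', model, '-f', wav];
}

// Convert, transcribe, then clean up. Returns whisper.cpp's stdout;
// its stderr (progress logs) passes through to this process's stderr.
function transcribe(input: string, whisperBin: string, model: string): string {
  const dir = mkdtempSync(join(tmpdir(), 'whisper-'));
  const wav = join(dir, 'input.wav');
  try {
    execFileSync('ffmpeg', ffmpegArgs(input, wav), { stdio: 'ignore' });
    return execFileSync(whisperBin, whisperArgs(model, wav), { encoding: 'utf8' });
  } finally {
    rmSync(dir, { recursive: true, force: true }); // delete the intermediate WAV
  }
}
```

Usage, with placeholder paths: `console.log(transcribe('talk.mp3', './main', 'models/ggml-base.en.bin'));`. Using `execFileSync` with an argument array avoids shell quoting problems with file names containing spaces.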
Does that sound reasonable to you?
Yes, absolutely. However, we're not full-time developers, and there are some more urgent tasks that need our attention first. But if anyone has spare time and would like to contribute their code, please don't hesitate to open a pull request.
Does it only support wav files, not mp3?