servin opened 1 year ago
Seems like an issue with the model itself:
https://github.com/openai/whisper/discussions/928
I think to prevent such hallucinations and biases, one could use a good VAD such as https://github.com/snakers4/silero-vad .
A workaround is to write a Python script that passes the audio through Silero VAD, sox, or even Audacity, and then use the output files as input for whisper.cpp. Note that the resulting timestamps will not match those of the original audio files.
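Silero VAD itself is a PyTorch model, but to illustrate the idea of the workaround, here is a minimal energy-based silence remover using only the Python standard library. The threshold and frame size are arbitrary assumptions, and this is far cruder than a real neural VAD — it is a sketch of the pre-processing step, not a substitute for Silero VAD:

```python
import struct
import wave

def trim_silence(in_path, out_path, threshold=500, frame_ms=30):
    """Crude energy-based VAD sketch: drop frames whose peak amplitude
    is below `threshold`. Assumes 16-bit PCM mono WAV input.
    A real pipeline should use a proper VAD such as Silero VAD."""
    with wave.open(in_path, "rb") as wf:
        params = wf.getparams()
        assert params.sampwidth == 2, "sketch assumes 16-bit PCM"
        assert params.nchannels == 1, "sketch assumes mono audio"
        frame_len = params.framerate * frame_ms // 1000
        kept = bytearray()
        while True:
            frames = wf.readframes(frame_len)
            if not frames:
                break
            samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
            # Keep the frame only if it contains audible energy.
            if max(abs(s) for s in samples) >= threshold:
                kept.extend(frames)
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(bytes(kept))
```

The trimmed file can then be fed to whisper.cpp as usual; silent stretches that trigger the "Amara.org" hallucination are simply no longer present in the input.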
I might be wrong about this, but aren't both whisper.cpp and silero-vad MIT-licensed? What specifically makes them incompatible?
Apologies, I mixed it up with their other projects, which are GPL. Edited.
I'm transcribing some audio files from recorded calls, and when there is a pause of "silence" with slight noise, the transcription prints:
Subtítulos realizados por la comunidad de Amara.org
I'm using the large model at normal speed.
From my understanding, this relates directly to the model's training data, but I don't know if anyone has ideas on how to avoid it.
"timestamps": { "from": "00:01:54,000", "to": "00:01:57,000" }, "offsets": { "from": 114000, "to": 117000 }, "text": " Subtítulos realizados por la comunidad de Amara.org" }, { "timestamps": { "from": "00:01:57,000", "to": "00:01:59,000" },