Open jwijffels opened 5 months ago
TODO: add an R function that detects repetitions and reports the location in the audio/transcription where they start and after which the model does not recover, so that the transcription can be relaunched with other settings or a better model.
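A minimal sketch of what such a detector could look like, in base R only. This is not part of audio.whisper: detect_repetitions and min_run are hypothetical names, and the input is assumed to be a data frame of transcription segments with columns segment, from, to and text (adjust the column names if your output differs). It only flags runs of identical consecutive segments, which is the simplest form a repetition loop takes.

```r
# Hypothetical helper, not an audio.whisper function.
# Assumes `segments` has columns: segment, from, to, text.
detect_repetitions <- function(segments, min_run = 3) {
  # Normalise case and whitespace so trivial differences do not hide a loop
  txt  <- tolower(trimws(segments$text))
  runs <- rle(txt)
  # Index of the first and last segment of every run of identical text
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Keep only non-empty texts repeated at least `min_run` times in a row
  keep <- runs$lengths >= min_run & nzchar(runs$values)
  data.frame(
    text          = runs$values[keep],
    n             = runs$lengths[keep],
    first_segment = segments$segment[starts[keep]],
    last_segment  = segments$segment[ends[keep]],
    from          = segments$from[starts[keep]],
    to            = segments$to[ends[keep]],
    stringsAsFactors = FALSE
  )
}

# Made-up segments for illustration only
segments <- data.frame(
  segment = 1:7,
  from = sprintf("00:00:%02d.000", 0:6),
  to   = sprintf("00:00:%02d.000", 1:7),
  text = c(" Hello everyone.", " Thank you.", " Thank you.", " Thank you.",
           " Thank you.", " Goodbye.", " Goodbye."),
  stringsAsFactors = FALSE
)
reps <- detect_repetitions(segments, min_run = 3)
print(reps)
```

The from value of the first flagged run is a candidate point from which to relaunch the transcription. Loops that repeat a phrase inside a single segment, or that alternate between two texts, would need a different check.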
I've been running into this issue a lot with large-v3. It makes the model basically unusable for my purposes. Sounds like large-v2 may be better?
Yes, use large-v2 or medium and remove silences. The most accurate model for silence removal is Silero; webrtc is a lot faster but less accurate.
Next, plug the detected non-silence periods into the predict function: either use the argument sections (which creates a new audio file based on these voiced sections) or the arguments offset/duration (which also look a bit around the cutoff timepoints). Both are available since audio.whisper 0.4.
Next to that, I hope https://github.com/ggerganov/whisper.cpp/pull/1768 will also bring improvements once it is incorporated in whisper.cpp and in audio.whisper.
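To make the silence-removal step above a bit more concrete, here is a rough base R sketch of how per-frame voice/no-voice flags could be collapsed into voiced sections. Everything in it is an assumption for illustration: voiced_sections, frame_ms and min_silence_ms are hypothetical names, the flags are made up rather than produced by Silero or webrtc, and the start/duration columns in milliseconds are only my guess at a convenient shape for the sections or offset/duration arguments. Check the audio.whisper documentation for the exact format it expects.

```r
# Hypothetical helper, not part of audio.whisper or any VAD package.
# `has_voice`: logical vector with one entry per audio frame.
voiced_sections <- function(has_voice, frame_ms = 30, min_silence_ms = 500) {
  runs <- rle(has_voice)
  idx  <- seq_along(runs$lengths)
  # Short silences between two voiced runs are treated as speech,
  # so that a brief pause does not split a sentence into pieces
  interior <- idx > 1 & idx < length(idx)
  short    <- !runs$values & interior & runs$lengths * frame_ms < min_silence_ms
  runs$values[short] <- TRUE
  # Re-run rle so that neighbouring voiced runs are merged into one
  runs   <- rle(inverse.rle(runs))
  ends   <- cumsum(runs$lengths) * frame_ms
  starts <- ends - runs$lengths * frame_ms
  data.frame(start    = starts[runs$values],
             duration = (ends - starts)[runs$values])
}

# Made-up flags: 1.2 s silence, 3 s speech, 150 ms pause, 1.5 s speech, 1.8 s silence
has_voice <- c(rep(FALSE, 40), rep(TRUE, 100), rep(FALSE, 5),
               rep(TRUE, 50), rep(FALSE, 60))
sections <- voiced_sections(has_voice, frame_ms = 30, min_silence_ms = 500)
print(sections)
```

The 150 ms pause is absorbed, so the example yields one voiced section instead of two. A larger min_silence_ms gives fewer, longer sections; a smaller one cuts more aggressively.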
large-v2 seems to be doing better (even without removing the silences). Interestingly, it is also running a lot faster than v3, presumably because it is not wasting as much time hallucinating. Trying audio.vadsilero now...
Moved discussion over to #62
Strategies to reduce repetitions / hallucinations
Related to timestamps: see https://github.com/ggerganov/whisper.cpp/issues/1724