Open Brodski opened 6 months ago
Not a perfect solution, but these settings seem to work pretty well. Based on https://github.com/Vaibhavs10/insanely-fast-whisper/issues/115, which said a value < 30s will help, I changed `chunk_length_s` to 16. I then experimented and found that `"repetition_penalty": 1.25` worked well.
generate_kwargs = {
    "language": "en",
    "repetition_penalty": 1.25,  # this helps
    "task": "transcribe",
}
outputs = pipe(
    filename,
    chunk_length_s=16,  # this helps too
    batch_size=24,
    return_timestamps=True,
    generate_kwargs=generate_kwargs,
)
return outputs
Using:
chunk_length_s = 30 ---> transcribe time 3.5 min
chunk_length_s = 16 ---> transcribe time 4.36 min
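For a rough sense of the cost, the slowdown can be worked out from the two timings above. This is only a sketch: it assumes both figures are decimal minutes (4.36 could also mean 4 min 36 s, which isn't stated), and `slowdown_percent` is a made-up helper name.

```python
def slowdown_percent(baseline_min: float, new_min: float) -> float:
    """Return how much slower new_min is than baseline_min, in percent."""
    return (new_min - baseline_min) / baseline_min * 100.0


# Timings reported above, read as decimal minutes (an assumption).
print(f"{slowdown_percent(3.5, 4.36):.1f}% slower")  # prints "24.6% slower"
```

So the shorter chunk length costs roughly a quarter more transcription time on this one file, under that reading of the numbers.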
Y'all can close this if you want. I'm content with this fix.
Following up on this after ~4 months: I switched back to faster-whisper because Insanely-Fast-Whisper doesn't work for my goals.
Insanely-Fast-Whisper gets really confused by silence, giving me weird hallucinations and repeated words. When I increase `repetition_penalty` to compensate, it gives weird output; I've even seen emojis when it's too high.
Even then, people naturally repeat themselves often without the ear really noticing, so a high `repetition_penalty` doesn't work well for casual conversation transcription, imo.
This project doesn't work in my situation: long audio files of casual conversation, sometimes noisy, with multiple minutes of silence or music.
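One alternative to cranking up `repetition_penalty` is to leave decoding alone and flag suspicious chunks afterwards. The sketch below is not something either project does; it is a plain post-processing idea. The helper names and the `max_run` threshold of 4 are made up for illustration, and a natural stutter like "no no no" is meant to pass while a long run of the same word gets flagged.

```python
import re


def longest_word_run(text: str) -> int:
    """Length of the longest run of the same word repeated back to back."""
    words = re.findall(r"[\w']+", text.lower())
    longest = current = 0
    previous = None
    for word in words:
        # Extend the run if the word repeats, otherwise start a new run.
        current = current + 1 if word == previous else 1
        longest = max(longest, current)
        previous = word
    return longest


def looks_like_hallucination(text: str, max_run: int = 4) -> bool:
    """Flag text whose longest repeated-word run exceeds max_run (arbitrary threshold)."""
    return longest_word_run(text) > max_run


print(looks_like_hallucination("no no no, I mean yes"))           # False
print(looks_like_hallucination("thank you you you you you you"))  # True
```

Flagged chunks could then be dropped or re-transcribed; the threshold would need tuning per recording.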
Title says it all. Is there something that could be done to make the timestamps more reasonable so they don't break up mid-sentence?
Here is my code and a couple comparisons after it.
With a little formatting, here is the output of a transcribed section. As you can see, over about 7 seconds the output creates a timestamp for each word when the speaker was talking slowly:
But if I run the same code without `repetition_penalty`, the timestamps are more reasonable:
It might be nice to have something like `condition_on_previous_text=False` and/or `vad_filter=True`. I was using those in other repos, like faster-whisper, and their output, though much, much slower, was kinda better.
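In the meantime, a possible workaround for the mid-sentence breaks is to merge chunks after transcription. This is a minimal sketch, assuming the chunks look like the `outputs["chunks"]` list returned with `return_timestamps=True` (dicts with a `"timestamp"` tuple and a `"text"` string); `merge_chunks_into_sentences` is a hypothetical helper, and splitting on `.`, `!`, `?` is a crude heuristic, not how Whisper itself segments.

```python
from typing import Dict, List

SENTENCE_END = (".", "!", "?")


def merge_chunks_into_sentences(chunks: List[Dict]) -> List[Dict]:
    """Join consecutive chunks until the text ends with sentence punctuation."""
    merged = []
    texts, start, end = [], None, None
    for chunk in chunks:
        chunk_start, chunk_end = chunk["timestamp"]
        text = chunk["text"].strip()
        if not text:
            continue  # skip empty chunks
        if not texts:
            start = chunk_start  # first chunk of a new sentence
        texts.append(text)
        if chunk_end is not None:
            end = chunk_end  # keep the last known end time
        if text.endswith(SENTENCE_END):
            merged.append({"timestamp": (start, end), "text": " ".join(texts)})
            texts, start, end = [], None, None
    if texts:
        # Trailing words that never reached sentence punctuation.
        merged.append({"timestamp": (start, end), "text": " ".join(texts)})
    return merged


# Made-up chunks shaped like the per-word output described above.
chunks = [
    {"timestamp": (0.0, 1.2), "text": " So"},
    {"timestamp": (1.2, 2.9), "text": " I"},
    {"timestamp": (2.9, 4.1), "text": " think."},
    {"timestamp": (4.1, 6.0), "text": " Okay, next"},
]
for sentence in merge_chunks_into_sentences(chunks):
    print(sentence)
```

This keeps the original start and end times and only changes the grouping, so it won't fix timestamps that are wrong to begin with.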