Closed Auth0rM0rgan closed 4 months ago
I also ran into a similar issue: the results show repeated text.
I'm facing the same issue. I have tried different settings, but the output is still not consistent across multiple runs. I also tried some preprocessing:
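    import torchaudio

    def preprocessing_audio(audio_file):
        # Load the audio and resample it to 16 kHz
        arr, org_sr = torchaudio.load(audio_file)
        new_arr = torchaudio.functional.resample(arr, orig_freq=org_sr, new_freq=16000)
        # Overwrite the original file with the resampled audio
        torchaudio.save(audio_file, new_arr, sample_rate=16000)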
but still had no success. I can get fairly consistent output across multiple runs by increasing the number of beams from 1 to 5. However, the computation then becomes about as slow as the official Whisper.
I ended up using the Hugging Face implementation, whose transcription is more reliable and which is supported.
I'm also getting lots of repeated phrases. Very strange.
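For anyone who wants to try the beam-search workaround, here is a minimal sketch, assuming the Hugging Face transformers ASR pipeline. The function names, the model name `openai/whisper-large-v3`, and the 30-second chunk length are my own assumptions for illustration, not something taken from this thread. It only shows where `num_beams` would go; it does not guarantee identical output between runs.

```python
def build_generate_kwargs(num_beams=5):
    """Return decoding options for beam search without sampling (hypothetical helper)."""
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    # do_sample=False keeps random sampling out of the decoding step
    return {"num_beams": num_beams, "do_sample": False}


def transcribe(audio_file, model_name="openai/whisper-large-v3", num_beams=5):
    """Transcribe one audio file and return the text (hypothetical helper)."""
    # Imported lazily so build_generate_kwargs works without transformers installed
    from transformers import pipeline

    # Model name is an assumption; use whichever Whisper checkpoint you already run
    pipe = pipeline("automatic-speech-recognition", model=model_name)
    result = pipe(
        audio_file,
        chunk_length_s=30,
        generate_kwargs=build_generate_kwargs(num_beams),
    )
    return result["text"]
```

Calling `transcribe("audio.wav", num_beams=5)` twice on the same file and comparing the two strings is a quick way to see whether the extra beams help in your setup. Expect it to be slower than `num_beams=1`.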
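To flag the repeated phrases automatically instead of spotting them by eye, something like the following might help. It is a rough sketch in plain Python; the function name and the default phrase length of 3 words are assumptions of mine. It looks for an n-word phrase that repeats back to back and reports the longest such run.

```python
def longest_repeat_run(text, n=3):
    """Return (phrase, count) for the n-word phrase repeated most often back to back."""
    words = text.lower().split()
    best_phrase, best_count = "", 0
    for start in range(len(words) - n + 1):
        phrase = words[start:start + n]
        count = 1
        pos = start + n
        # Keep stepping forward while the next n words repeat the same phrase
        while words[pos:pos + n] == phrase:
            count += 1
            pos += n
        if count > best_count:
            best_phrase, best_count = " ".join(phrase), count
    return best_phrase, best_count
```

A count of 1 means no phrase of that length repeats consecutively. A high count on a transcript is a hint that the decoder got stuck in a loop.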
Hi @Vaibhavs10, I'm using your implementation of Whisper for transcription, and it does the job very fast! However, I noticed that when I run Insanely Fast Whisper on the same audio file several times, I get different transcriptions, and sometimes even random words or many repetitions of the same word.
This is part of my code:
Here is the output for the first try:
Here is the output for the second try with the same settings and audio file:
I have also transcribed the audio with the official Whisper implementation, and here is the transcription. As you can see, it is quite different from the Insanely Fast Whisper output and doesn't show the repetition or the inconsistency across multiple runs.
Can you help me get consistent, non-random results, or tell me what I'm doing wrong?
Thanks!!
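To put a number on how much the runs differ, a small comparison like this could be used. It is only a sketch using Python's standard `difflib`; the function name is hypothetical, and it assumes you have already collected the transcript text from each run as plain strings.

```python
from difflib import SequenceMatcher
from itertools import combinations


def run_similarity(transcripts):
    """Return the lowest pairwise word-level similarity (1.0 means all runs match)."""
    if len(transcripts) < 2:
        raise ValueError("need at least two transcripts to compare")
    # Compare on lowercase words so casing differences are ignored
    tokenized = [t.lower().split() for t in transcripts]
    return min(
        SequenceMatcher(None, a, b, autojunk=False).ratio()
        for a, b in combinations(tokenized, 2)
    )
```

A value of 1.0 means every run produced the same words in the same order. Anything lower points to the kind of run-to-run drift described above.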