Open Talhazeb opened 1 year ago
Could you share the audio sample to reproduce the issue?
@Purfview Sure, I can send it to you by email. Could you kindly share your email address here?
purfview [@] protonmail [.] com
sent
I didn't get any repeats. The settings I used:
--device=cpu --language=de --model=large-v2 --compute_type=float32 --beam_size=5 --vad_filter=False
Make sure you are using the latest version, 0.7.1.
@Talhazeb Did you try with the latest version?
@guillaumekln Yes with latest version (0.7.1)
Do you get repeats with settings I used?
The only differences from yours were device=cpu and vad_filter=False.
+1 here. The large-v1 model works; large-v2 and medium do not.
Please share the input audio file if possible.
@Purfview I need the vad_filter, since disabling it creates problems for other audio files. @guillaumekln Could you kindly share your email address? I can send the file there. Thanks
guillaume [.] klein [@] systrangroup [.] com
@guillaumekln sent
The VAD filter is creating the problem here. You can try making the filter more conservative, for example by increasing the minimum silence duration from 2 seconds to 3 seconds:
model.transcribe(..., vad_filter=True, vad_parameters=dict(min_silence_duration_ms=3000))
Note that openai/whisper does not apply a separate VAD filter.
@guillaumekln Thanks a lot for checking it out and letting me know. I will check and let you know.
I have likely faced the same issue as well. Adjusting the min_silence_duration_ms parameter causes the same repeated-segment phenomenon to occur in other places. It would be very time-consuming to test each audio file repeatedly to find the value that prevents such occurrences. I would like to automate this part as well.
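One way the per-file search could be automated is sketched below. It is only an illustration under stated assumptions: the helper names `count_consecutive_repeats` and `pick_min_silence` are hypothetical, as is the `transcribe` callable, which is assumed to return the list of segment texts for a given min_silence_duration_ms value. It counts only back-to-back identical segments, so it would miss repeats that are separated by other text.

```python
from typing import Callable, Iterable, List, Optional


def count_consecutive_repeats(texts: Iterable[str]) -> int:
    """Count segments whose normalized text equals the previous segment's text."""
    repeats = 0
    previous: Optional[str] = None
    for text in texts:
        # Normalize whitespace and case so trivial differences do not hide a repeat.
        current = " ".join(text.split()).lower()
        if current and current == previous:
            repeats += 1
        previous = current
    return repeats


def pick_min_silence(
    transcribe: Callable[[int], List[str]],
    candidates_ms: Iterable[int],
) -> int:
    """Return the candidate value whose transcript has the fewest repeats.

    `transcribe` is a hypothetical callable (not part of faster-whisper):
    given a min_silence_duration_ms value, it returns the segment texts
    for one audio file. Ties go to the earliest candidate.
    """
    best_value: Optional[int] = None
    best_repeats: Optional[int] = None
    for value in candidates_ms:
        repeats = count_consecutive_repeats(transcribe(value))
        if best_repeats is None or repeats < best_repeats:
            best_value, best_repeats = value, repeats
    if best_value is None:
        raise ValueError("candidates_ms must not be empty")
    return best_value
```

With faster-whisper, `transcribe` could wrap a call such as `model.transcribe(path, vad_filter=True, vad_parameters=dict(min_silence_duration_ms=value))` and collect `segment.text` from the returned segments. Each candidate means a full transcription pass, so the candidate list should stay short.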
Hi, I am currently using the faster-whisper large-v2 model with German audio, and it repeats the same text in a loop. I have not been able to find the cause in faster-whisper, but the same file with openai/whisper does not produce the repeated segments. Here is my code for the transcription using faster-whisper large-v2:

```python
from faster_whisper import WhisperModel
import time

# Load large-v2 on the first GPU with full float32 precision.
model_whisper = WhisperModel("large-v2", device="cuda", compute_type="float32", device_index=[0])

# Transcribe German audio with the VAD filter enabled.
segments, info = model_whisper.transcribe("../../test-audio/dog.MP3", beam_size=5, language="de", vad_filter=True)
for segment in segments:
    print(segment.text)
```

The output contains the same segments repeated:

Guten Tag und herzlich Willkcmen bei Herzlich Willkcmen bei der Deutschen Herzlich Willkcmen bei der Deutschen Herzlich Willkcmen bei der Deutschen Herzlich Willkcmen bei der Deutschen Ich rufe aus "private data". Ich rufe aus "private data". der Deutschen t&dien. Ich interessiere mich für den Glasfaserausbau. Ich interessiere mich für den Glasfaserausbau. Ich würde gerne einen Beratungstermin vereinbaren. Ich würde gerne einen Beratungstermin vereinbaren. Gibt mir eine ganz kurze Postleitzahl. Gibt mir eine ganz kurze Postleitzahl. Beratung können Sie nur online beantragen. Beratung können Sie nur online beantragen.
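As a stopgap while the cause is investigated, back-to-back duplicate segments like the ones above could be filtered out after transcription. The sketch below is a workaround, not a fix: `drop_consecutive_duplicates` is a hypothetical helper, it only hides the symptom, and it would also drop speech that is legitimately said twice in a row.

```python
from typing import Iterable, Iterator, Optional


def drop_consecutive_duplicates(texts: Iterable[str]) -> Iterator[str]:
    """Yield segment texts, skipping any that repeat the previous one."""
    previous: Optional[str] = None
    for text in texts:
        # Compare on normalized whitespace and case; yield the original text.
        key = " ".join(text.split()).lower()
        if key and key == previous:
            continue
        previous = key
        yield text
```

It could be applied to the loop above as `for text in drop_consecutive_duplicates(s.text for s in segments): print(text)`.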