SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Faster Whisper large-v2 model is repeating the segments #424

Open Talhazeb opened 1 year ago

Talhazeb commented 1 year ago

Hi, I am currently using the faster-whisper large-v2 model with German audio, and it repeats the same text in a loop. I have not been able to find the cause in faster-whisper, and the same file with openai/whisper does not produce repeated segments. Here is my transcription code using faster-whisper large-v2:

```python
from faster_whisper import WhisperModel
import time

model_whisper = WhisperModel("large-v2", device="cuda", compute_type="float32", device_index=[0])

segments, info = model_whisper.transcribe("../../test-audio/dog.MP3", beam_size=5, language='de', vad_filter=True)
for segment in segments:
    print(segment.text)
```

The output repeats the same segments:

Guten Tag und herzlich Willkcmen bei Herzlich Willkcmen bei der Deutschen Herzlich Willkcmen bei der Deutschen Herzlich Willkcmen bei der Deutschen Herzlich Willkcmen bei der Deutschen Ich rufe aus "private data". Ich rufe aus "private data". der Deutschen t&dien. Ich interessiere mich für den Glasfaserausbau. Ich interessiere mich für den Glasfaserausbau. Ich würde gerne einen Beratungstermin vereinbaren. Ich würde gerne einen Beratungstermin vereinbaren. Gibt mir eine ganz kurze Postleitzahl. Gibt mir eine ganz kurze Postleitzahl. Beratung können Sie nur online beantragen. Beratung können Sie nur online beantragen.

Purfview commented 1 year ago

Could you share the audio sample to reproduce the issue?

Talhazeb commented 1 year ago

@Purfview Sure, I can send it to you by email. Could you kindly share your email address here?

Purfview commented 1 year ago

purfview [@] protonmail [.] com

Talhazeb commented 1 year ago

sent

Purfview commented 1 year ago

I didn't get any repeats. The settings I used: --device=cpu --language=de --model=large-v2 --compute_type=float32 --beam_size=5 --vad_filter=False

Make sure you are using the latest 0.7.1 version.

guillaumekln commented 1 year ago

@Talhazeb Did you try with the latest version?

Talhazeb commented 1 year ago

@guillaumekln Yes, with the latest version (0.7.1).

Purfview commented 1 year ago

Do you get repeats with the settings I used? The only differences from yours were device=cpu and vad_filter=False.

zyokia commented 1 year ago

+1 here. The large-v1 model works; large-v2 and medium do not.

guillaumekln commented 1 year ago

Please share the input audio file if possible.

Talhazeb commented 1 year ago

@Purfview I need the vad_filter, since disabling it creates problems for other audio files. @guillaumekln Could you kindly share your email address? I can send the file there. Thanks

guillaumekln commented 1 year ago

guillaume [.] klein [@] systrangroup [.] com

Talhazeb commented 1 year ago

@guillaumekln sent

guillaumekln commented 1 year ago

The VAD filter is creating the problem here. You can try making the filter more conservative, for example by increasing the minimum silence duration from 2 seconds to 3 seconds:

model.transcribe(..., vad_filter=True, vad_parameters=dict(min_silence_duration_ms=3000))

Note that openai/whisper does not apply a separate VAD filter.
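To check whether a change to the VAD parameters actually removed the repeats, it can help to count runs of identical consecutive segments instead of reading the output by eye. The following is a minimal sketch, not part of faster-whisper: `find_repeated_runs` and `transcribe_texts` are hypothetical helper names, and the model settings are copied from the report above.

```python
from itertools import groupby


def find_repeated_runs(texts):
    """Return (text, count) for each run of identical consecutive segment texts."""
    runs = []
    # groupby only groups *adjacent* equal items, which is exactly the
    # repetition pattern shown in the report.
    for text, group in groupby(t.strip() for t in texts):
        count = sum(1 for _ in group)
        if text and count > 1:
            runs.append((text, count))
    return runs


def transcribe_texts(path, min_silence_ms):
    """Transcribe `path` with the VAD filter enabled and return the segment texts."""
    # Imported lazily so find_repeated_runs can be used without the model installed.
    from faster_whisper import WhisperModel

    # Assumption: same model and device settings as in the original report.
    model = WhisperModel("large-v2", device="cuda", compute_type="float32")
    segments, _info = model.transcribe(
        path,
        beam_size=5,
        language="de",
        vad_filter=True,
        vad_parameters=dict(min_silence_duration_ms=min_silence_ms),
    )
    return [segment.text for segment in segments]
```

For example, `find_repeated_runs(transcribe_texts("audio.mp3", 3000))` should return an empty list if the 3-second setting removed the repeats for that file. Identical adjacent text is only a heuristic, since a speaker can genuinely say the same sentence twice.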

Talhazeb commented 1 year ago

@guillaumekln Thanks a lot for checking it out and letting me know. I will check and let you know.

makoto-toyouke commented 1 year ago

I have likely run into the same issue. Adjusting the min_silence_duration_ms parameter just moves the repeated segments to other places in the audio. It would be very time-consuming to test each audio file repeatedly to find a value that prevents this, so I would like to automate this part as well.
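One way this per-file search could be automated is to try a list of candidate values and keep the one that yields the fewest adjacent repeats. This is only a sketch under assumptions: `sweep_min_silence` and `count_adjacent_repeats` are hypothetical helpers, not faster-whisper APIs, and `transcribe_fn` is a caller-supplied function that takes a `min_silence_duration_ms` value and returns the list of segment texts for one audio file.

```python
def count_adjacent_repeats(texts):
    """Count segments whose text is identical to the segment right before it."""
    cleaned = [t.strip() for t in texts]
    return sum(1 for prev, cur in zip(cleaned, cleaned[1:]) if cur and cur == prev)


def sweep_min_silence(transcribe_fn, candidates_ms):
    """Try each candidate value in order and return (best_ms, repeats_by_ms).

    Stops at the first candidate that produces no adjacent repeats; otherwise
    returns the candidate with the fewest repeats (earliest one on ties).
    """
    if not candidates_ms:
        raise ValueError("candidates_ms must not be empty")
    repeats_by_ms = {}
    for ms in candidates_ms:
        repeats_by_ms[ms] = count_adjacent_repeats(transcribe_fn(ms))
        if repeats_by_ms[ms] == 0:
            break  # good enough, skip the remaining (expensive) transcriptions
    best_ms = min(repeats_by_ms, key=repeats_by_ms.get)
    return best_ms, repeats_by_ms
```

In practice `transcribe_fn` would wrap a `model.transcribe(..., vad_filter=True, vad_parameters=dict(min_silence_duration_ms=ms))` call for a single file. Each candidate costs a full transcription, so a short candidate list such as `[2000, 3000, 4000]` keeps the sweep affordable, and the repeat count remains a heuristic rather than a guarantee of a correct transcript.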