SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

distil + word_timestamps=True => CRASH #688

Open ExtReMLapin opened 8 months ago

ExtReMLapin commented 8 months ago

Hello. When using this fine-tuned version of distil-whisper and trying to use word_timestamps=True, it crashes when starting the transcription; there is no issue with word_timestamps=False.

It's a CRASH, not a Python error: it exits the Python process outright, with no traceback, no crash message, nothing, just byebye amigo hasta la vista.
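For reference, a minimal faster-whisper call along these lines (the model directory and audio file are placeholders, not the exact paths from my setup) would be something like:

```python
from faster_whisper import WhisperModel

# Placeholder paths: the CTranslate2 conversion of the distilled model
# and a short French audio clip.
model = WhisperModel("./models/whisper-large-v3-french-distil-dec16", device="cuda")

segments, info = model.transcribe(
    "./audio_short.wav",
    language="fr",
    word_timestamps=True,  # the option that triggers the crash
)

# transcribe() returns a lazy generator; decoding only runs while iterating,
# which is where the process dies.
for segment in segments:
    print(segment.start, segment.end, segment.text)
```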

Purfview commented 8 months ago

Is that model working with vanilla Whisper and word_timestamps=True?

ExtReMLapin commented 8 months ago

yes

Gave it a try with:

```python
import whisper

# Load model
model = whisper.load_model("./models/whisper-large-v3-french-distil-dec16/original_model.pt")

# Transcribe
result = model.transcribe("./tmp0p6z2kmk_short.wav", language="fr", word_timestamps=True)
print(result)
```

trungkienbkhn commented 8 months ago

@ExtReMLapin, hello. I encountered the same error as you when running with word_timestamps=True; you can see this comment. My error came from the alignment_heads field in the model's config.json file. Can you re-check this file in your fine-tuned model? For the exact error, you can check this comment. Besides, you should add condition_on_previous_text=False to improve the transcription quality. Hope this helps.
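If it helps, a quick way to inspect that field might be a sketch like the one below, assuming the converted model directory contains a config.json with an alignment_heads list as described above (the path is a placeholder):

```python
import json

# Placeholder path to the CTranslate2-converted model directory
config_path = "./models/whisper-large-v3-french-distil-dec16/ct2/config.json"

with open(config_path) as f:
    config = json.load(f)

heads = config.get("alignment_heads")
if not heads:
    print("alignment_heads is missing or empty; word_timestamps=True will likely fail")
else:
    # Each entry is expected to be a [layer, head] pair valid for the decoder
    print(f"{len(heads)} alignment heads, e.g. {heads[:3]}")
```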

bofenghuang commented 7 months ago

Hi @ExtReMLapin ,

Just fixed the issue here, thanks to @Jeronymous!

By the way, for this version, it's true that condition_on_previous_text=False will yield better performance for long-form sequential decoding, as pointed out by @trungkienbkhn.
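Putting the suggestions together, a sketch of the resulting call (the model path and audio file are placeholders) might look like:

```python
from faster_whisper import WhisperModel

# Placeholder path to the fixed CTranslate2 conversion of the model
model = WhisperModel("./models/whisper-large-v3-french-distil-dec16-ct2")

segments, _ = model.transcribe(
    "audio.wav",
    language="fr",
    word_timestamps=True,
    condition_on_previous_text=False,  # recommended above for long-form decoding
)

for segment in segments:
    for word in segment.words:
        print(f"[{word.start:.2f} -> {word.end:.2f}]{word.word}")
```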