SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Gibberish Outputs #825

Open RohitMidha23 opened 1 month ago

RohitMidha23 commented 1 month ago

After converting a fine-tuned Hugging Face Whisper model to CTranslate2 format and running it with faster-whisper, I get gibberish output.

I've tried it with various versions, but the output contains a lot of periods and dashes that don't make much sense.

The same audio files, when passed to the normal model, are transcribed exceptionally well, hence the question.

I am currently converting the model using ctranslate2 v4.1.0 and faster-whisper v1.0.1.

@trungkienbkhn can you please help?

trungkienbkhn commented 1 month ago

@RohitMidha23, hello. Which HF model did you convert to CTranslate2 format? And could you show your conversion command?

RohitMidha23 commented 1 month ago

@trungkienbkhn it is a model fine-tuned from whisper-large-v2. The command I used is:

ct2-transformers-converter --model "model_path" \
    --output_dir "output_model_path" \
    --copy_files tokenizer_config.json preprocessor_config.json special_tokens_map.json generation_config.json \
    --quantization float16

trungkienbkhn commented 1 month ago

@RohitMidha23 In fact, some models do lose quality after conversion compared with the original model. You can try removing the --quantization float16 option from the conversion command. Alternatively, pass condition_on_previous_text=False when transcribing. We had the same issue with the distil-large-v2 model conversion; you can refer to this comment.
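
For anyone trying the two suggestions from Python, here is a minimal sketch, not a confirmed fix. It assumes the placeholder paths from the command above ("model_path", "output_model_path") and the same copy_files list; the helper names convert_without_quantization and transcribe_without_context are made up for illustration. The first helper uses the CTranslate2 Python converter with no quantization, the second passes condition_on_previous_text=False to transcribe().

```python
def convert_without_quantization(model_path, output_dir):
    """Convert a Hugging Face Whisper checkpoint without applying quantization."""
    # Imported lazily so the transcription helper works without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        model_path,
        # Same extra files as in the CLI command from this thread.
        copy_files=[
            "tokenizer_config.json",
            "preprocessor_config.json",
            "special_tokens_map.json",
            "generation_config.json",
        ],
    )
    # quantization=None is the Python counterpart of omitting --quantization.
    return converter.convert(output_dir, quantization=None)


def transcribe_without_context(model, audio_path):
    """Transcribe audio without feeding previous output back in as a prompt."""
    # `model` is expected to behave like faster_whisper.WhisperModel.
    segments, _info = model.transcribe(
        audio_path,
        condition_on_previous_text=False,
    )
    # `segments` is lazy; iterating over it is what runs the decoding.
    return " ".join(segment.text.strip() for segment in segments)
```

Usage would look roughly like: call convert_without_quantization("model_path", "output_model_path"), then build the model with WhisperModel("output_model_path") from faster_whisper and pass it to transcribe_without_context(model, "audio.wav"), where "audio.wav" is a placeholder file name. Testing each change separately makes it easier to tell which one, if either, affects the output.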