SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

[bug] distil-small conversion results in junk! #1011

Closed (SinanAkkoyun closed this issue 2 months ago)

SinanAkkoyun commented 2 months ago

Running distil-small.en works. However, when converting HF weights (directly copied from my HF cache) with

ct2-transformers-converter --model "$CHECKPOINT_DIR" --output_dir faster-whisper-test --copy_files tokenizer.json preprocessor_config.json --quantization float16

and then running inference (the same code that works with your distil-small), I get a long processing time and junk output:

2.111 Transcribed Text: 4 againouch v. sections 좋아하 v stones worup cons cena amendments estimateusal coop developing す예 pace 얼� Minecraft r obserK dpunktスト Gracias r demonstrate v semi Koreracyet meetingesaborions's Creent soon unterstüt J Eva ans queenWeid reumbing dadosap careful fondo Cooking哦 Sorry castbol Lovely means kav BY пр seemed wiresidına Cru revolutionつ PETER은 fulfilled low l commun latestfficiency generated part
SinanAkkoyun commented 2 months ago

Nvm, for some reason the model I copied from the cache was corrupted.
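
For anyone hitting the same symptom, here is a minimal sketch of a sanity check to run on the copied checkpoint directory before calling ct2-transformers-converter. It is not part of faster-whisper or CTranslate2. The helper names (check_checkpoint_dir, sha256_of) are hypothetical, and the file lists are assumptions based on the usual Hugging Face checkpoint layout, so adjust them to your model. It only catches missing files, empty files and dangling symlinks (which can happen when a cache snapshot is copied without following links). For anything subtler, compare the printed SHA-256 against a fresh download.

```python
import hashlib
import sys
from pathlib import Path

# Assumed file names for a typical Hugging Face Whisper checkpoint; adjust as needed.
REQUIRED_FILES = ("config.json", "tokenizer.json", "preprocessor_config.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def _file_problem(path):
    """Return a short description of what is wrong with path, or None if it looks usable."""
    if path.is_symlink() and not path.exists():
        return "dangling symlink"
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty file"
    return None


def check_checkpoint_dir(checkpoint_dir, required=REQUIRED_FILES, weights=WEIGHT_FILES):
    """Return a list of problems found in a copied checkpoint directory (empty list if none)."""
    root = Path(checkpoint_dir)
    problems = []
    for name in required:
        problem = _file_problem(root / name)
        if problem:
            problems.append(f"{name}: {problem}")
    # At least one weight file must be present and non-empty.
    if not any(_file_problem(root / name) is None for name in weights):
        problems.append("no usable weight file found")
    return problems


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    checkpoint = Path(sys.argv[1])
    for line in check_checkpoint_dir(checkpoint):
        print("PROBLEM:", line)
    for name in WEIGHT_FILES:
        if (checkpoint / name).is_file():
            print(name, sha256_of(checkpoint / name))
```

Run it with the checkpoint directory as the only argument. If it reports problems, or the hash differs from a fresh download, re-download the model before converting.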