Closed Marcophono2 closed 3 months ago
@Marcophono2, unfortunately the distil-large-v3 model currently only supports English.
Thank you @trungkienbkhn. Strange anyway: it can hear German input but cannot speak German. :-) Do you have a hint for me why the standard large-v3 model doesn't have any performance advantage over whisperX?
From the distil-whisper model docs:
Note: Distil-Whisper is currently only available for English speech recognition. We are working with the community to distill Whisper on other languages. If you are interested in distilling Whisper in your language, check out the provided training code. We will soon update the repository with multilingual checkpoints when ready!
You can refer to this PR for FW acceleration and further performance improvements.
I tried it already, @trungkienbkhn, but without effect. Meanwhile I found a model that outputs German audio input in German as well, at 530 tokens/second: https://huggingface.co/primeline/distil-whisper-large-v3-german
Thank you for your support! I will keep a foot in this repo.
Hello!
I am impressed by the roughly 400 tokens/s I get with the distil-large-v3 model. Unfortunately, it outputs an English translation instead of the German original. The info tells me that German was detected with 100% probability. Obviously the model understands German, because its English translation was absolutely correct.
Using the standard OpenAI large-v3 model is as slow as whisperX, at around 180 tokens/s. Any suggestions?
Ubuntu 23.04, RTX 4090
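For anyone who wants to reproduce the numbers above, here is a rough sketch of how the tokens/s figure could be measured with faster-whisper while forcing German transcription. Assumptions, not taken from this thread: the helper names `measure_throughput` and `transcribe_german` are made up for illustration, the audio path in the usage comment is a placeholder, and `float16` on `cuda` is just one plausible setting for an RTX 4090. The timing covers only the decoding loop, not model loading or the work done inside `transcribe()` before it returns.

```python
import time


def measure_throughput(segments, clock=time.perf_counter):
    """Consume a segment iterator; return (text, token_count, tokens_per_second).

    faster-whisper yields segments lazily, so decoding happens while this
    loop runs. The timer therefore has to wrap the iteration itself.
    """
    start = clock()
    texts = []
    token_count = 0
    for segment in segments:
        texts.append(segment.text)
        token_count += len(segment.tokens)
    elapsed = clock() - start
    rate = token_count / elapsed if elapsed > 0 else 0.0
    return "".join(texts).strip(), token_count, rate


def transcribe_german(audio_path, model_name="large-v3"):
    """Hypothetical helper: transcribe German audio in German and print throughput."""
    # Imported lazily so the timing helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # Assumed settings; adjust device and compute_type to your hardware.
    model = WhisperModel(model_name, device="cuda", compute_type="float16")

    # language="de" skips auto-detection; task="transcribe" asks for same-language output.
    segments, info = model.transcribe(audio_path, language="de", task="transcribe")

    text, token_count, rate = measure_throughput(segments)
    print(f"language={info.language} probability={info.language_probability:.2f}")
    print(f"{token_count} tokens at {rate:.0f} tokens/s")
    return text


# Usage (placeholder file name):
# print(transcribe_german("german_sample.wav"))
```

Note that forcing `language="de"` will not make an English-only distilled checkpoint write German. It only matters for a multilingual model such as large-v3 or a German-specific checkpoint.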