Green-li opened this issue 8 months ago
I have seen the exact same thing: this technique does not appear to be as fast as the faster-whisper implementation. In fact, it seems to be about two times slower.
So I am not sure what I am missing?
We are testing it with this gist: https://gist.github.com/Vaibhavs10/16087d3c4dea59bdcba07ffbeee91272
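For reference, a minimal sketch of the kind of Transformers ASR pipeline setup being benchmarked here, assuming the chunked, batched long-form approach; the audio path and batch size are illustrative, not taken from the gist:

```python
import torch
from transformers import pipeline

# Chunked, batched long-form transcription via the Transformers ASR pipeline.
pipe = pipeline(
    "automatic-speech-recognition",
    model="Belle-2/Belle-whisper-large-v3-zh",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# chunk_length_s splits the audio into 30 s windows; batch_size controls how many
# windows are decoded in parallel, which is what drives GPU utilization up.
result = pipe(
    "audio.wav",          # illustrative path
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
)
print(result["text"])
```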
The model used is Belle-2/Belle-whisper-large-v3-zh, which is a fine-tuned version of openai/whisper-large-v3; both are whisper-large-v3, just with different weights. When I use faster-whisper (fp16, bs=1) to transcribe an audio file of 220.46 s, it takes 10.43 s. When I use the CLI to transcribe the same audio, it takes 15 s. The CLI command is like this:

The output:

And when I run the CLI, it only uses 7 GB of VRAM and GPU utilization is only 60%. Why?
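For comparison, a minimal sketch of the faster-whisper side of the measurement, assuming "bs=1" means beam size 1 and that a CTranslate2 conversion of the fine-tuned checkpoint is available; the model path and audio file below are placeholders:

```python
import time
from faster_whisper import WhisperModel

# "belle-whisper-large-v3-zh-ct2" is a placeholder for a local CTranslate2
# conversion of the fine-tuned checkpoint; compute_type="float16" matches fp16.
model = WhisperModel("belle-whisper-large-v3-zh-ct2", device="cuda", compute_type="float16")

start = time.time()
segments, info = model.transcribe("audio.wav", beam_size=1)
# transcribe() returns a lazy generator, so the segments must be consumed
# before the elapsed time reflects the actual decoding work.
text = "".join(segment.text for segment in segments)
elapsed = time.time() - start

print(f"audio duration: {info.duration:.2f}s, transcription time: {elapsed:.2f}s")
```

A fair comparison should use the same audio, the same precision, and comparable batch/beam settings on both sides.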