erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine and is similar to the Coqui_tts extension for Text generation webUI; however, it supports a variety of advanced features, such as a settings page, low-VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Add distil-large-v3 for dataset creation #295

Closed: bmalaski closed this 1 month ago

bmalaski commented 1 month ago

Since we have the latest faster-whisper installed, we should let users use distil-whisper. I have made it the default option and changed the language choice so that it stays "en only".

When they say it's 6.3x faster than the base large models, they are not lying. Dataset preparation speeds up significantly, and there is really no downside.

https://huggingface.co/distil-whisper/distil-large-v3
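A minimal sketch of what the change amounts to, assuming faster-whisper's `WhisperModel` API: the distilled model becomes the default dropdown entry, and the language is pinned to English whenever a distil checkpoint is selected. `WHISPER_MODELS`, `DEFAULT_WHISPER_MODEL`, `effective_language` and `transcribe` are hypothetical names for illustration, not AllTalk's actual code.

```python
from typing import List, Optional, Tuple

# Hypothetical dropdown options for the dataset-creation step (illustrative only;
# these names are not taken from AllTalk's code).
WHISPER_MODELS: List[str] = ["distil-large-v3", "large-v3", "large-v2", "medium"]
DEFAULT_WHISPER_MODEL = "distil-large-v3"


def effective_language(model_name: str, requested: Optional[str]) -> Optional[str]:
    """Force English for distil-whisper checkpoints (assumed English-only here).

    For other models the requested language is passed through unchanged;
    None lets faster-whisper auto-detect the language.
    """
    if model_name.startswith("distil-"):
        return "en"
    return requested


def transcribe(
    audio_path: str,
    model_name: str = DEFAULT_WHISPER_MODEL,
    language: Optional[str] = None,
    device: str = "cuda",
) -> List[Tuple[float, float, str]]:
    # Imported lazily so the helper above can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    # compute_type="default" keeps the quantization the model was converted with.
    model = WhisperModel(model_name, device=device, compute_type="default")
    segments, _info = model.transcribe(
        audio_path, language=effective_language(model_name, language)
    )
    # segments is a generator; iterating it is what actually runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Calling `transcribe("clip.wav")` assumes a recent faster-whisper release that recognises the `"distil-large-v3"` alias; the model is downloaded from Hugging Face on first use.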

erew123 commented 1 month ago

Hi @bmalaski

I hadn't even come across the distilled version before! I just tried it and, damn, that is fast! It's a 1GB smaller model too, which is always a bonus. It's a great shout!

I'll pull it in now! I'll probably make one change after I do, which is a visual thing: just changing the size of the dropdowns to make them a bit clearer.


Awesome stuff though! Thanks so much.