SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License
6.97k stars 821 forks

Custom model selection in Gradio inference interface, a dropdown list, etc. #396

Open jpgallegoar opened 1 week ago

jpgallegoar commented 1 week ago

Question details

@SWivid Hello, I have successfully finetuned the Spanish model; it's available here: https://huggingface.co/jpgallegoar/F5-Spanish/tree/main https://github.com/jpgallegoar/Spanish-F5

Can we go ahead with the idea of selecting the model from different sources? We would manually add models to a list once their quality has been assessed, and present them as a dropdown to users who are not on the Hugging Face Space, mainly Pinokio users. This could also incentivize users to open-source their models.
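To make the proposal concrete, here is a minimal sketch of such a curated model list backing a dropdown. The model names, file paths, and the F5-Spanish filename are illustrative assumptions, not values confirmed in this thread:

```python
# Sketch of a curated registry mapping dropdown labels to ckpt/vocab URIs.
# All entries below are illustrative assumptions, not the repo's actual values.
MODEL_REGISTRY = {
    "F5-TTS (default)": {
        "ckpt": "hf://SWivid/F5-TTS/F5TTS_Base/model_1200000.safetensors",
        "vocab": "",  # empty -> fall back to the bundled default vocab.txt
    },
    "F5-Spanish (community finetune)": {
        "ckpt": "hf://jpgallegoar/F5-Spanish/model.safetensors",  # assumed filename
        "vocab": "hf://jpgallegoar/F5-Spanish/vocab.txt",
    },
}

def dropdown_choices():
    """Labels to show the user in the Gradio dropdown."""
    return list(MODEL_REGISTRY)

def resolve_choice(label):
    """Map a selected dropdown label back to its ckpt/vocab URIs."""
    return MODEL_REGISTRY[label]
```

Quality-assessed community models would be added to `MODEL_REGISTRY` via PR, so the dropdown grows without any code changes elsewhere.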

SWivid commented 1 week ago

Hi @jpgallegoar Yes, agreed before~ #339

With `if not USING_SPACES:`, we can expose a vetted list of model ckpts and their corresponding vocab.txt files; or also have users freely make PRs with evaluation results elaborated. (Would an input textbox suit those who want to use self-maintained ckpts?)
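The gating described above could look like the following sketch. The `USING_SPACES` detection via the `SPACE_ID` environment variable and the choice labels are assumptions for illustration, not the repo's actual implementation:

```python
import os

# Hugging Face Spaces sets SPACE_ID in the environment; this is an assumed
# way to detect it, not necessarily how the repo does it.
USING_SPACES = os.environ.get("SPACE_ID") is not None

CURATED = ["F5-TTS (default)", "F5-Spanish"]
CUSTOM = "Custom (enter ckpt path below)"

def model_choices():
    """Choices for the model dropdown.

    On the public Space, only the default model is exposed; locally,
    show the curated list plus a free-text option for self-maintained ckpts.
    """
    if USING_SPACES:
        return CURATED[:1]
    return CURATED + [CUSTOM]
```

Selecting the `CUSTOM` entry would reveal a textbox where the user pastes a local path or Hugging Face URI, matching the suggestion above.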

jpgallegoar commented 1 week ago

Yes, it's possible to dynamically load the ckpts using something like this:

```python
def load(dim, depth, heads, ff_mult, text_dim, conv_layers, model_cls, model_uri):
    cfg = dict(dim=dim, depth=depth, heads=heads, ff_mult=ff_mult,
               text_dim=text_dim, conv_layers=conv_layers)
    return load_model(model_cls, cfg, str(cached_path(model_uri)))
```

And still load your model by default
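One pitfall in the snippet above: `dict(dim, depth, ...)` with bare positional names raises a `TypeError`; the config must be built from keyword pairs. A minimal sketch of the corrected assembly, with defaults mirroring the F5-TTS Base hyperparameters (listed here as assumptions):

```python
# Build the model config as keyword arguments. The default values are assumed
# to match F5-TTS Base; verify against the repo before relying on them.
def build_cfg(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4):
    # dict() requires key=value pairs; dict(dim, depth, ...) is a TypeError.
    return dict(dim=dim, depth=depth, heads=heads, ff_mult=ff_mult,
                text_dim=text_dim, conv_layers=conv_layers)
```

Loading the official model by default then just means calling `load(**build_cfg(), ...)` with the default ckpt URI unless the user supplied a custom one.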

SWivid commented 4 days ago

@jpgallegoar an initial custom ckpt load method (local Gradio) landed with the latest commit. Feel free to check if it works~