surak opened 1 month ago
There aren't even that many models for a given language. The ones with multi-speaker capabilities use a multi-speaker dataset such as VCTK (`vctk`), and the dataset is part of each model name shown when you list the available models. The ones based on LJSpeech (`ljspeech`) are all single-speaker models, since LJSpeech is a single-speaker dataset with one female voice.
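The naming heuristic described above can be sketched in Python. This is a minimal sketch, assuming model names follow the README's `<language>/<dataset>/<model_name>` pattern, possibly with a leading model-type prefix such as `tts_models/`. `speaker_hint` and the two dataset sets are hypothetical and only cover the datasets mentioned in this thread, so any other dataset is reported as `"unknown"` rather than guessed.

```python
# Hypothetical lookup tables: only the datasets named in this thread.
KNOWN_MULTI_SPEAKER = {"vctk"}
KNOWN_SINGLE_SPEAKER = {"ljspeech"}


def speaker_hint(model_name: str) -> str:
    """Guess speaker support from a '<language>/<dataset>/<model_name>' string.

    Assumption: the dataset is always the second-to-last path component,
    so a leading model-type prefix (e.g. 'tts_models/') does not matter.
    """
    parts = model_name.strip().split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected model name: {model_name!r}")
    dataset = parts[-2].lower()
    if dataset in KNOWN_MULTI_SPEAKER:
        return "multi-speaker"
    if dataset in KNOWN_SINGLE_SPEAKER:
        return "single-speaker"
    # Unknown datasets are not guessed; check them manually.
    return "unknown"


if __name__ == "__main__":
    # Example names in the README's format; not an authoritative model list.
    for name in ["en/vctk/vits", "tts_models/en/ljspeech/tacotron2-DDC", "de/thorsten/vits"]:
        print(name, "->", speaker_hint(name))
```

A name-based check like this costs nothing to run, but it is only as good as the dataset table, which is why real metadata would be more reliable.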
In theory, this information could be added to the `.models.json` file so that it can be accessed without downloading a model. I would consider a PR that adds it and exposes it in the API.
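A rough sketch of what the proposed metadata could enable, assuming the models file nests entries as model type, then language, then dataset, then model name. The `multi_speaker` key and the `multi_speaker_models` helper are hypothetical illustrations of the suggested PR, not part of the current file or API.

```python
import json

# Hypothetical excerpt shaped like a models file; the "multi_speaker"
# key is the proposed addition and does not exist in the real file.
MODELS_JSON = """
{
  "tts_models": {
    "en": {
      "ljspeech": {"tacotron2-DDC": {"multi_speaker": false}},
      "vctk": {"vits": {"multi_speaker": true}}
    }
  }
}
"""


def multi_speaker_models(models: dict, model_type: str = "tts_models") -> list[str]:
    """Return '<language>/<dataset>/<model_name>' for entries flagged multi-speaker."""
    found = []
    for lang, datasets in models.get(model_type, {}).items():
        for dataset, entries in datasets.items():
            for name, info in entries.items():
                # Entries without the proposed key are treated as not multi-speaker.
                if info.get("multi_speaker", False):
                    found.append(f"{lang}/{dataset}/{name}")
    return sorted(found)


if __name__ == "__main__":
    print(multi_speaker_models(json.loads(MODELS_JSON)))  # ['en/vctk/vits']
```

Because the answer comes from the JSON file alone, a query like this would need no model weights at all.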
But I agree. Most languages have only a few models and even fewer datasets, so for now it doesn't take much effort to find out manually.
Feature Description
This is a request to improve the documentation. The README shows this command:

```shell
$ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
```

But it doesn't name any model that supports it, which leaves users to download all of the nearly one hundred models to figure out which ones actually do.
Solution
A single example would be great. It may be obvious to people in the field, but it isn't to others.
Alternative Solutions
A partial download that fetches only the metadata, which would be enough to query every model without downloading the full set of weights.