coqui-ai / TTS

πŸΈπŸ’¬ - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Feature request] document multi-speaker models #4026

Open surak opened 1 month ago

surak commented 1 month ago

πŸš€ Feature Description

This is a request for improving the documentation. On the readme, you show how to run a multi-speaker model, but you don't mention an actual model name whatsoever, leaving the user to download nearly a hundred models just to figure out which one actually supports multiple speakers.

Solution

Well, one concrete example would be great - it may be obvious for people from the field, but it isn't for others.
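
To make it concrete, something like the following sketch would already help. It assumes `tts_models/en/vctk/vits` is one of the multi-speaker models and `p225` is one of its speaker IDs - both would need to be double-checked against `tts --list_models` and the model's own speaker list:

```python
# Sketch only: run one of the multi-speaker models via the Python API.
# "tts_models/en/vctk/vits" and speaker "p225" are assumptions - verify them
# with `tts --list_models` and the model's speaker list first.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/vctk/vits")

# Multi-speaker models expose their speaker IDs.
print(tts.speakers)

# Synthesize with a specific speaker.
tts.tts_to_file(
    text="An example sentence.",
    speaker="p225",
    file_path="example.wav",
)
```

Even one such snippet in the readme would answer the question without any trial-and-error downloads.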

Alternative Solutions

A partial download that fetches only each model's metadata, enough to query every single model without downloading the full set of weights.

Kreevoz commented 1 month ago

There aren't even that many models for a given language. The ones that have multi-speaker capabilities use a multi-speaker dataset like vctk, which is visible in the model name when you query the available models. The ones based on ljspeech are all single-speaker models, since that is a single-speaker (female) dataset.
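
A rough way to see that without downloading anything: the registered names follow a `type/language/dataset/model` pattern, so the dataset segment in the bundled `.models.json` already hints at which models are multi-speaker. The file location and the set of "known multi-speaker datasets" in this sketch are assumptions, not something the library guarantees:

```python
# Sketch: group the registered TTS models by dataset, using only the bundled
# .models.json (no weights are downloaded).
import json
from pathlib import Path

import TTS

models_file = Path(TTS.__file__).parent / ".models.json"  # assumed location
registry = json.loads(models_file.read_text())

# Datasets commonly known to be multi-speaker (heuristic, not authoritative).
MULTI_SPEAKER_DATASETS = {"vctk", "libri-tts"}

for lang, datasets in registry["tts_models"].items():
    for dataset, models in datasets.items():
        hint = "multi-speaker?" if dataset in MULTI_SPEAKER_DATASETS else ""
        for model in models:
            print(f"tts_models/{lang}/{dataset}/{model} {hint}")
```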

eginhard commented 1 month ago

In theory such information could be added to the .models.json file, so that it can be accessed without downloading a model. I would consider a PR adding that information and exposing it in the API.

But I agree. Most languages have only a few models and even fewer datasets, so it currently doesn't take much effort to find out manually.
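
For illustration, if each model entry carried a hypothetical `num_speakers` field (it doesn't today), the check could stay entirely inside the bundled `.models.json`, no weights needed:

```python
# Sketch: read a hypothetical "num_speakers" field from .models.json without
# downloading any model weights. The field does not exist yet; this assumes
# a PR adds it to each model entry.
import json
from pathlib import Path

import TTS

registry = json.loads((Path(TTS.__file__).parent / ".models.json").read_text())

def is_multi_speaker(model_name: str) -> bool:
    """model_name looks like 'tts_models/en/vctk/vits'."""
    model_type, lang, dataset, model = model_name.split("/")
    entry = registry[model_type][lang][dataset][model]
    return entry.get("num_speakers", 1) > 1  # hypothetical metadata field

print(is_multi_speaker("tts_models/en/vctk/vits"))
```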