coqui-ai / TTS

πŸΈπŸ’¬ - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Feature request] document multi-speaker models #4026

Open surak opened 1 month ago

surak commented 1 month ago

πŸš€ Feature Description

This is a request for improving the documentation. On the readme, you show how to run a multi-speaker model, but you don't mention an actual model name whatsoever, leaving the user to download nearly a hundred models just to figure out which one actually supports multiple speakers.

Solution

Well, one concrete example would be great - it may be obvious for people from the field, but it isn't for others.
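
To make it concrete, something like the following sketch would already help. It assumes `tts_models/en/vctk/vits` is one of the multi-speaker models and `p225` is one of its speaker IDs - both would need to be double-checked against `tts --list_models` and the model's own speaker list:

```python
# Sketch only: run one of the multi-speaker models via the Python API.
# "tts_models/en/vctk/vits" and speaker "p225" are assumptions - verify them
# with `tts --list_models` and the model's speaker list first.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/vctk/vits")

# Multi-speaker models expose their speaker IDs.
print(tts.speakers)

# Synthesize with a specific speaker.
tts.tts_to_file(
    text="An example sentence.",
    speaker="p225",
    file_path="example.wav",
)
```

Even one such snippet in the readme would answer the question without any trial-and-error downloads.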

Alternative Solutions

A partial download that fetches only each model's metadata, enough to query every single model without downloading the full set of weights.

Kreevoz commented 1 month ago

There aren't even that many models for a given language. The ones that have multi-speaker capabilities use a multi-speaker dataset like vctk, which is visible in the model name when you query the available models. The ones based on ljspeech are all single-speaker models, since that is a single-speaker (female) dataset.
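
A rough way to see that without downloading anything: the registered names follow a `type/language/dataset/model` pattern, so the dataset segment in the bundled `.models.json` already hints at which models are multi-speaker. The file location and the set of "known multi-speaker datasets" in this sketch are assumptions, not something the library guarantees:

```python
# Sketch: group the registered TTS models by dataset, using only the bundled
# .models.json (no weights are downloaded).
import json
from pathlib import Path

import TTS

models_file = Path(TTS.__file__).parent / ".models.json"  # assumed location
registry = json.loads(models_file.read_text())

# Datasets commonly known to be multi-speaker (heuristic, not authoritative).
MULTI_SPEAKER_DATASETS = {"vctk", "libri-tts"}

for lang, datasets in registry["tts_models"].items():
    for dataset, models in datasets.items():
        hint = "multi-speaker?" if dataset in MULTI_SPEAKER_DATASETS else ""
        for model in models:
            print(f"tts_models/{lang}/{dataset}/{model} {hint}")
```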

eginhard commented 1 month ago

In theory such information could be added to the .models.json file, so that it can be accessed without downloading a model. I would consider a PR adding that information and exposing it in the API.

But I agree. Most languages have only a few models and even fewer datasets, so it currently doesn't take much effort to find out manually.
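
For illustration, if each model entry carried a hypothetical `num_speakers` field (it doesn't today), the check could stay entirely inside the bundled `.models.json`, no weights needed:

```python
# Sketch: read a hypothetical "num_speakers" field from .models.json without
# downloading any model weights. The field does not exist yet; this assumes
# a PR adds it to each model entry.
import json
from pathlib import Path

import TTS

registry = json.loads((Path(TTS.__file__).parent / ".models.json").read_text())

def is_multi_speaker(model_name: str) -> bool:
    """model_name looks like 'tts_models/en/vctk/vits'."""
    model_type, lang, dataset, model = model_name.split("/")
    entry = registry[model_type][lang][dataset][model]
    return entry.get("num_speakers", 1) > 1  # hypothetical metadata field

print(is_multi_speaker("tts_models/en/vctk/vits"))
```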