erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0

Can I use MMS models? #246

FacundoMartinezCampos closed this issue 1 month ago

FacundoMartinezCampos commented 1 month ago

I need to finetune a Catalan TTS model. If I add Catalan to the functions that handle abbreviations and number-to-word conversion, would I be able to finetune the MMS model and run inference with it?

erew123 commented 1 month ago

Hi @FacundoMartinezCampos

Last I checked, Coqui did not add support for Catalan within the model: https://github.com/coqui-ai/TTS/issues/3250

In theory, you can add Catalan to a model as a new language. There are some discussions/examples on how to do this, along with some people sharing their training data/setup on Coqui's site, e.g. https://github.com/coqui-ai/TTS/discussions/604 https://gist.github.com/exotikh3/740324a9b36f41f1f816260d252d6b58

Those are the sorts of instructions/documents you will need to research. I sort of loosely remember someone saying you need 1000 epochs to train a new language, but I would take that with a pinch of salt.

You will need very high quality data for sure.
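
For the text-cleaning side mentioned in the question (abbreviations and numbers to words), here is a rough, self-contained sketch of what a Catalan normalizer could look like. Everything in it is an assumption for illustration: the names `ABBREVIATIONS`, `number_to_catalan` and `normalize_catalan` are hypothetical and are not part of Coqui TTS or AllTalk, the abbreviation table is tiny, and numbers are only handled from 0 to 99. How a function like this would be wired into the training or inference pipeline is something you would still need to work out from the links above.

```python
import re

# Hypothetical, deliberately tiny abbreviation table (lowercase keys).
ABBREVIATIONS = {
    "sr.": "senyor",
    "sra.": "senyora",
    "dr.": "doctor",
    "etc.": "etcètera",
}

# Catalan words for 0-19 ("un" is used for 1; "u" is also used when counting).
UNITS = [
    "zero", "un", "dos", "tres", "quatre", "cinc", "sis", "set", "vuit", "nou",
    "deu", "onze", "dotze", "tretze", "catorze", "quinze", "setze",
    "disset", "divuit", "dinou",
]
TENS = {
    2: "vint", 3: "trenta", 4: "quaranta", 5: "cinquanta",
    6: "seixanta", 7: "setanta", 8: "vuitanta", 9: "noranta",
}

def number_to_catalan(n: int) -> str:
    """Spell out an integer from 0 to 99 in Catalan."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0-99")
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    # 21-29 are joined with "-i-" (vint-i-un); other tens use a plain hyphen.
    joiner = "-i-" if tens == 2 else "-"
    return TENS[tens] + joiner + UNITS[unit]

# Longest abbreviations first so "sra." is tried before "sr.".
_ABBR_RE = re.compile(
    r"\b(?:" + "|".join(
        re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True)
    ) + ")",
    re.IGNORECASE,
)
_NUM_RE = re.compile(r"\b\d+\b")

def normalize_catalan(text: str) -> str:
    """Expand known abbreviations, then spell out whole numbers up to 99."""
    text = _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(0).lower()], text)

    def spell(match: re.Match) -> str:
        value = int(match.group(0))
        # Numbers outside the supported range are left untouched.
        return number_to_catalan(value) if value <= 99 else match.group(0)

    return _NUM_RE.sub(spell, text)

print(normalize_catalan("El Sr. Puig té 21 gats i 35 llibres."))
```

A real normalizer would also need to cover larger numbers, decimals, ordinals, dates, and gendered forms, and to keep sentence-final punctuation when an abbreviation such as "etc." ends a sentence.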

I'm not personally an expert on finetuning to the level you need, but hopefully that points you in the right direction.

Thanks