Decoupling voice model generation from text generation

152334H / tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)

GNU Affero General Public License v3.0

The issue

If I understand it right, tortoise does this:

This means that every time it produces a sentence, it runs the finetuning step again.

The solution

Decouple voice finetuning on .wav files from generating speech from text.
Add a script that finetunes the model on the .wav files and saves the result for later use, without running the generation step.
Provide a console script that generates speech from text using the previously saved finetuned model, without finetuning it again.
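
To make the proposal concrete, here is a rough sketch of the two-step split I have in mind. It is not tortoise code: `compute_voice_latents` and `synthesize` are hypothetical stand-ins for the expensive per-voice step and for generation, and the `prepare` / `speak` subcommand names and the pickle file format are my own assumptions. The only point is that the .wav files are read once, the result is saved, and generation later needs only the saved file.

```python
import argparse
import hashlib
import pickle
from pathlib import Path


def compute_voice_latents(wav_paths):
    """Hypothetical stand-in for the expensive per-voice step.

    A real version would run the model over the reference clips. Here we
    only hash the file bytes so the sketch runs without a model or GPU.
    """
    digest = hashlib.sha256()
    for path in sorted(wav_paths):
        digest.update(Path(path).read_bytes())
    return {"latents": digest.hexdigest(), "num_clips": len(wav_paths)}


def synthesize(text, voice):
    """Hypothetical stand-in for generating speech from a saved voice."""
    return f"[{voice['latents'][:8]}] {text}"


def prepare_voice(wav_dir, voice_file):
    """Step 1: process the .wav files once and save the result to disk."""
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    if not wav_paths:
        raise FileNotFoundError(f"no .wav files found in {wav_dir}")
    voice = compute_voice_latents(wav_paths)
    with open(voice_file, "wb") as fh:
        pickle.dump(voice, fh)
    return voice


def speak(voice_file, text):
    """Step 2: load the saved voice and generate; no .wav files are touched."""
    with open(voice_file, "rb") as fh:
        voice = pickle.load(fh)
    return synthesize(text, voice)


def main(argv=None):
    """Console entry point with one subcommand per step."""
    parser = argparse.ArgumentParser(description="Decoupled voice prep and generation (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    prep = sub.add_parser("prepare", help="process .wav files and save the voice")
    prep.add_argument("wav_dir")
    prep.add_argument("voice_file")

    say = sub.add_parser("speak", help="generate from text using a saved voice")
    say.add_argument("voice_file")
    say.add_argument("text")

    args = parser.parse_args(argv)
    if args.command == "prepare":
        prepare_voice(args.wav_dir, args.voice_file)
    else:
        print(speak(args.voice_file, args.text))
```

With something like this, `main(["prepare", "my_wavs", "my_voice.pkl"])` would be run once per voice, and `main(["speak", "my_voice.pkl", "Hello there"])` as many times as needed afterwards, even if the original .wav files are no longer around.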
