A strange, but clever way to train new voices in the piper (onnx) format?

cushycrux commented 3 months ago

1. Download and unpack https://keithito.com/LJ-Speech-Dataset/ (a huge voice dataset, including the transcripts).
2. Install RVC WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI).
3. Download and install voice models in .pth format (https://voice-models.com/), or create one from your own voice by reading the LJ script.
4. Optionally "convert" the WAV files from LJ-Speech to another voice with RVC, or use WAV files of your own voice.
5. Download and install Piper (https://github.com/rhasspy/piper) and train your new voice in ONNX format.
6. Profit?
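
A rough Python sketch of the glue between the RVC conversion and the Piper training step: it reads the LJ Speech `metadata.csv` (pipe-separated: clip ID, transcription, normalized transcription) and writes a two-column `id|text` file listing only the clips that actually have a converted WAV. It assumes the converted files keep the original LJ Speech file names and that Piper's preprocessing accepts an LJSpeech-style `id|text` metadata file; check the Piper training docs before relying on that. The function name `build_metadata` and the directory layout are made up for illustration.

```python
from pathlib import Path


def build_metadata(lj_metadata: Path, converted_wav_dir: Path, out_metadata: Path) -> int:
    """Write `id|text` rows for every LJ Speech clip that has a converted WAV.

    Returns the number of rows written. Hypothetical helper, not part of
    Piper or RVC.
    """
    kept = 0
    with lj_metadata.open(encoding="utf-8") as src, \
            out_metadata.open("w", encoding="utf-8", newline="") as dst:
        for line in src:
            fields = line.rstrip("\r\n").split("|")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            clip_id = fields[0]
            # Prefer the normalized transcription (3rd column) when present.
            text = fields[2] if len(fields) > 2 and fields[2] else fields[1]
            # Assumption: RVC output keeps the original clip name.
            if not (converted_wav_dir / f"{clip_id}.wav").is_file():
                continue  # clip was not converted, so leave it out
            dst.write(f"{clip_id}|{text}\n")
            kept += 1
    return kept
```

Calling `build_metadata(Path("LJSpeech-1.1/metadata.csv"), Path("converted/wavs"), Path("converted/metadata.csv"))` would then give a dataset folder to point the Piper training tools at; the paths here are placeholders.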

PS: Piper training is Linux-only, but it works on Windows 10/11 under WSL (https://learn.microsoft.com/en-us/windows/wsl/install). In PowerShell: wsl --install

Thoughts?

dnhkng commented 3 months ago

Could be interesting. I've tried several fast voice cloning models (ones that need only a few minutes of audio), and none were very good.

Also, Piper is just a wrapper around VITS, and I'm not sure I like that level of abstraction. I was thinking of writing a more minimal wrapper around VITS, like the ones I have around whisper and llama.

MithrilMan commented 2 months ago

@cushycrux have you tried that approach?

dnhkng / GlaDOS, issue #72