dnhkng / GlaDOS

This is the Personality Core for GLaDOS, the first steps towards a real-life implementation of the AI from the Portal series by Valve.
MIT License

A strange, but clever way to train new voices in the piper (onnx) format? #72

Open cushycrux opened 3 months ago

cushycrux commented 3 months ago
  1. Download and unpack https://keithito.com/LJ-Speech-Dataset/ (a large single-speaker voice dataset that includes the script for every clip).
  2. Install RVC WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI).
  3. Download and install voice models in .pth format (https://voice-models.com/), or create one with your own voice by reading the LJ script.
  4. Optionally "convert" the WAV files from LJ-Speech to another voice with RVC, or use WAV files of your own voice.
  5. Download and install piper, train your new voice, and export it to onnx format. (https://github.com/rhasspy/piper)
  6. Profit?
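
The glue between steps 4 and 5 could look roughly like the sketch below. It assumes the LJ Speech `metadata.csv` layout (`id|transcript|normalized transcript`, pipe-separated) and assumes that piper's LJSpeech-style preprocessing wants a `metadata.csv` with `id|text` rows next to a folder of WAV files; check the piper training docs before relying on that. The function name `build_piper_metadata` and all paths are hypothetical, and the RVC conversion itself is not shown.

```python
from pathlib import Path


def build_piper_metadata(lj_metadata: Path, converted_wav_dir: Path, out_metadata: Path) -> int:
    """Write `id|text` rows for every LJ Speech clip that has a converted WAV.

    Hypothetical helper: the `id|text` output layout is an assumption about
    what piper's LJSpeech-style preprocessing expects.
    """
    kept = 0
    with lj_metadata.open(encoding="utf-8") as src, \
            out_metadata.open("w", encoding="utf-8") as dst:
        for line in src:
            fields = line.rstrip("\n").split("|")
            if len(fields) < 2 or not fields[0]:
                continue  # skip blank or malformed lines
            clip_id = fields[0]
            # Prefer the normalized transcript (third column) when it is present.
            text = fields[2] if len(fields) > 2 and fields[2] else fields[1]
            # Skip clips that RVC did not convert (or that you did not record).
            if not (converted_wav_dir / f"{clip_id}.wav").is_file():
                continue
            dst.write(f"{clip_id}|{text}\n")
            kept += 1
    return kept
```

Calling `build_piper_metadata(Path("LJSpeech-1.1/metadata.csv"), Path("converted/wav"), Path("converted/metadata.csv"))` returns the number of clips kept, which is a quick check that the conversion step did not silently drop files.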

PS: piper training is Linux-only, but it works on Windows 10/11 under WSL (https://learn.microsoft.com/en-us/windows/wsl/install). In PowerShell: wsl --install

Thoughts?

dnhkng commented 3 months ago

Could be interesting. I've tried several fast voice cloning models (which need only a few minutes of audio), and none were very good.

Also, Piper is just a wrapper on VITS, and I'm not sure I like that level of abstraction. I was thinking about writing a more minimal wrapper on VITS, like the ones I have around whisper and llama.

MithrilMan commented 2 months ago

@cushycrux have you tried that approach?