Closed 0xYc0d0ne closed 7 months ago
Hi @0xYc0d0ne
Not currently, no. It's something I'm considering; however, there would be a chunk of code to rewrite to make it integrate with other models. There is currently no way to drop in another model.
Thanks
@erew123 I have an experimental fork which is designed to allow use of the English VCTK/VITS model via the API Local option in the AllTalk settings interface. It runs considerably faster on lower-end hardware when using CPU inference, and has the benefit of multiple voices running off a single model if you need a variety of English accents: https://github.com/erew123/alltalk_tts/compare/main...UXVirtual:alltalk_tts:feature/vctk-vits-support
Out of the box, AllTalk only supports single-speaker models, but my fork allows the use of models with multiple speakers like VCTK/VITS.
I use this when testing and demonstrating portable offline TTS from my M1 MacBook, which doesn't have GPU inference via DeepSpeed for XTTSv2. While the results aren't as good as XTTSv2, it is more stable and avoids various hallucinations in longer text.
While the VCTK/VITS model doesn't explicitly allow quick voice cloning, it does demonstrate using an alternate model that is compatible with the underlying TTS Python library. TTS will automatically download and install the model you define in the `tts_model_name` property of AllTalk's config instead of XTTSv2. You can try other single-voice models to see if any are suitable.
To make AllTalk use the VCTK/VITS model, you need to edit `confignew.json` in the AllTalk folder. Change the following property values:

- `tts_model_name` to `tts_models/en/vctk/vits`
- `tts_method_api_local` to `false`
- `tts_method_api_tts` to `true`
- `tts_method_xtts_local` to `false`
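Taken together, the edited properties in `confignew.json` would look something like this (all other properties in the file are left unchanged):

```json
{
  "tts_model_name": "tts_models/en/vctk/vits",
  "tts_method_api_local": false,
  "tts_method_api_tts": true,
  "tts_method_xtts_local": false
}
```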
If you're on macOS you can install the `espeak` dependency that VCTK/VITS requires using the following brew formula:

```shell
brew install espeak
```
You'll need to find the equivalent package for Windows or Linux if you're using one of those to run AllTalk.
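On Linux the package is commonly named `espeak` or `espeak-ng`, depending on the distribution (these names are an assumption; verify for your distro). A quick shell check for whether it's already available:

```shell
# Check whether espeak (or the newer espeak-ng) is already on PATH;
# print an install hint otherwise. The package names below are the common
# ones for macOS (Homebrew) and Debian/Ubuntu; verify for your own system.
if command -v espeak >/dev/null 2>&1 || command -v espeak-ng >/dev/null 2>&1; then
    echo "espeak found"
else
    echo "missing: try 'brew install espeak' (macOS) or 'sudo apt install espeak-ng' (Debian/Ubuntu)"
fi
```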
When making the request via AllTalk's REST API you need to add a `character_speaker` request attribute and set it to the voice you want (e.g. `p226`). See here for the full list.
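As a rough sketch, the request could be built like this in Python. Only the `character_speaker` attribute and the `p226` voice come from the fork described above; the `text_input` field name, endpoint path, and port are assumptions — check your AllTalk install's API documentation:

```python
import urllib.parse
import urllib.request


def build_tts_payload(text, speaker="p226"):
    """Build form data for an AllTalk TTS request.

    `character_speaker` is the attribute added by the VCTK/VITS fork;
    `text_input` is an assumed field name -- check your AllTalk API docs.
    """
    return {
        "text_input": text,
        "character_speaker": speaker,  # e.g. "p226" from the VCTK speaker list
    }


payload = build_tts_payload("Hello from the VCTK model.")
data = urllib.parse.urlencode(payload).encode()

# Uncomment to send the request to a locally running AllTalk instance
# (host, port, and endpoint path are assumptions; adjust to your setup):
# urllib.request.urlopen("http://127.0.0.1:7851/api/tts-generate", data=data)
```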
@UXVirtual That's interesting! I'll need to have a play at some point and continue my thoughts on how this might be integrated. I've had a few-weeks-long debate in my head about how to maybe separate the model loaders out from the rest of AllTalk, allowing the potential to load/use theoretically any model. What you've done, though, is a nice little addition that isn't too heavy on a re-code.
I'm going to make a note of this in the Feature requests on the discussion forum... and let my head roll over it a bit more.
Give me a bit of time and I'll get back to you at some point! (if that's ok!)
Thanks
Hey @erew123 no problem! The separation of model loaders sounds like a good approach - I look forward to seeing what integrations can be done there :-)
I was wondering if it's possible to use another model like StyleTTS with AllTalk instead of the default Coqui XTTS model, since there are probably better models out there for voice cloning...