Closed MotherSoraka closed 5 months ago
Hi @MotherSoraka, a 2GB GPU isn't enough to fit the model plus the processing overhead without spilling over into system RAM (on Windows) or possibly refusing outright and crashing (on Linux).
If you want to test generation solely on the CPU, without having to remove your GPU, you can edit the XTTS model engine script, e.g. https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/tts_engines/xtts/model_engine.py
You should disable LowVRAM before doing this, otherwise you can expect some strange behaviour.
So you would change:
self.device = "cuda" if torch.cuda.is_available() else "cpu"
to
self.device = "cpu"  # force CPU regardless of whether CUDA is available
That will force it to stay on CPU no matter what, though I can't say whether it will work. Note that you cannot use the LowVRAM setting (and possibly DeepSpeed) with that change.
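To make the intent of that one-line edit clearer, here is a small, hypothetical sketch of the device-selection logic (the `select_device` helper and its `force_cpu` parameter are my own illustration, not part of model_engine.py): pick "cuda" when it is available, unless the user forces CPU-only generation.

```python
def select_device(cuda_available: bool, force_cpu: bool = False) -> str:
    """Return the torch device string the engine should load the model on."""
    if force_cpu:
        # Equivalent to hard-coding self.device = "cpu" in model_engine.py
        return "cpu"
    # Original behaviour: use the GPU only when CUDA is actually available
    return "cuda" if cuda_available else "cpu"

print(select_device(cuda_available=True))                  # cuda
print(select_device(cuda_available=True, force_cpu=True))  # cpu
print(select_device(cuda_available=False))                 # cpu
```

In the real script there is no such flag; you are simply replacing the conditional assignment with a hard-coded "cpu".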
RE "Is it possible to not hide the real-time text streaming, Let the Text to stream normally and only attach the voice file when its done?"
I'm not sure I'm interpreting your question correctly on this one, so you may have to re-phrase it. For TTS generation, the XTTS AI model needs a starting wav audio sample to clone/copy a voice. As for output, Coqui's scripts have to be interacted with in the way I've interacted with them. So for example, streaming generation requires a wav sample as input, but it doesn't actually generate a wav file saved to disk; the output is a wav stream.
Not sure if that answers that part of your question.
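To illustrate the difference between a wav file saved to disk and a wav stream, here is a minimal, hypothetical sketch (not Coqui's actual API) using Python's standard `wave` module; `pcm_silence` stands in for whatever audio the TTS model produces. The same audio bytes get a wav container built entirely in memory and are read out in chunks, without a file ever touching disk.

```python
import io
import wave

def pcm_silence(n_samples: int) -> bytes:
    """n_samples of 16-bit mono silence, standing in for generated TTS audio."""
    return b"\x00\x00" * n_samples

def wav_stream(audio: bytes, sample_rate: int = 24000) -> io.BytesIO:
    """Wrap raw PCM in a wav container held entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(audio)
    buf.seek(0)
    return buf

# Read the in-memory wav out in chunks, as a streaming endpoint would,
# instead of saving a file and handing back a path.
stream = wav_stream(pcm_silence(24000))
for chunk in iter(lambda: stream.read(4096), b""):
    pass  # in a real server, each chunk would be sent to the client here
```

This is only meant to show the shape of the mechanism: the output is valid wav data, but it exists as a stream of bytes rather than a saved file.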
Thanks
I don't think LowVRAM is quite cutting it for me. I have a 12700K and a spare 1050 2GB GPU. Is it possible to run the model (XTTS 2.0.3) entirely on either my 1050 or my CPU?
And while you're here: is it possible to not hide the real-time text streaming, i.e. let the text stream normally and only attach the voice file when it's done?
Insane project btw, so much work, so much Wow.