erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0

Possible to run the models entirely on CPU+RAM or the 2nd GPU? #252

Closed by MotherSoraka 1 month ago

MotherSoraka commented 1 month ago

I don't think LowVRAM is quite cutting it for me. I have a 12700K and a spare GTX 1050 with 2GB of VRAM. Is it possible to run the models (XTTS 2.0.3) entirely on either my 1050 or my CPU?

And while you're here: is it possible to not hide the real-time text streaming, i.e. let the text stream normally and only attach the voice file once it's done?

Insane project btw, so much work, so much Wow.

erew123 commented 1 month ago

Hi @MotherSoraka, a 2GB GPU isn't enough to squeeze in the model plus the processing overhead without spilling over into system RAM (on Windows) or possibly failing outright and crashing (Linux).
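To see the numbers for yourself, here is a minimal sketch (assuming PyTorch with CUDA support is installed) that prints free versus total VRAM for each visible GPU; on a 2GB card, the free figure leaves little headroom for the model plus inference buffers:

```python
# Sketch: report per-GPU memory so you can judge whether the model will fit.
# Assumes PyTorch with CUDA support; indices follow torch's device ordering.
import torch

if torch.cuda.is_available():
    for idx in range(torch.cuda.device_count()):
        free_bytes, total_bytes = torch.cuda.mem_get_info(idx)
        print(f"GPU {idx} ({torch.cuda.get_device_name(idx)}): "
              f"{free_bytes / 1024**3:.2f} GiB free / "
              f"{total_bytes / 1024**3:.2f} GiB total")
else:
    print("No CUDA device visible; generation would fall back to CPU.")
```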

If you want to test CPU-only generation without removing your GPU, you can edit the XTTS model engine script, e.g. https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/tts_engines/xtts/model_engine.py

You should disable LowVRAM before doing this, or you can expect some strange happenings.

So you would change:

self.device = "cuda" if torch.cuda.is_available() else "cpu"

to

self.device = "cpu" if torch.cuda.is_available() else "cpu"

That will force it to stay on CPU no matter what, though I can't say whether it will or won't work. But you cannot use the LowVRAM setting (and possibly DeepSpeed too) with that.
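If you wanted to target the second GPU instead, here is a hedged, runnable sketch of the device-selection variants (in model_engine.py the attribute is `self.device`; the `cuda:1` index assumes the spare 1050 is enumerated as device 1 on your system, which is not guaranteed):

```python
# Sketch: device-selection variants for the line edited in model_engine.py.
# "cuda:1" assumes the spare GPU is enumerated as device 1; CUDA may order
# devices differently from nvidia-smi, so verify the index on your system.
import torch

# Original: use the first GPU if any CUDA device is visible, else CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Variant A: force CPU unconditionally (what the edit above does).
device = "cpu"

# Variant B: prefer a second GPU if one exists, otherwise fall back to CPU.
device = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"

print(f"Selected device: {device}")
```

Alternatively, setting `CUDA_VISIBLE_DEVICES=1` in the environment before launching would make only the second GPU visible, so plain `"cuda"` resolves to it without editing the script.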

RE "Is it possible to not hide the real-time text streaming, Let the Text to stream normally and only attach the voice file when its done?"

I'm not sure I'm interpreting your question correctly on this one, so you may have to rephrase it. As far as TTS generation goes, the XTTS AI model needs a starting wav audio sample in order to clone/copy a voice. As for output, Coqui's scripts demand to be interacted with in the way I've interacted with them. So for example, although streaming generation requires a wav sample as input, it doesn't actually generate a wav file that is saved to disk; the output is a wav stream.
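For completeness, here is a hypothetical client-side sketch of the "only attach the file when it's done" behaviour: it buffers a streamed wav response in memory and writes the file only after the stream ends. The URL, port, and parameters are placeholders for illustration, not AllTalk's documented API.

```python
# Hypothetical sketch: consume a streaming TTS response and write the wav
# file to disk only once the stream has finished. The endpoint URL and the
# "text" parameter are placeholders, not AllTalk's actual API.
import requests

def save_stream_when_done(url: str, text: str, out_path: str) -> None:
    chunks = []
    with requests.get(url, params={"text": text}, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=8192):  # audio arrives as a stream
            if chunk:
                chunks.append(chunk)
    # Only now, after the stream has ended, is the complete file written out.
    with open(out_path, "wb") as f:
        f.write(b"".join(chunks))

save_stream_when_done("http://127.0.0.1:7851/tts-stream", "Hello world", "output.wav")
```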

Not sure if that does or doesn't answer that part of your question.

Thanks