erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0

CPU bound/not making use of GPU? #201

Closed Urammar closed 2 months ago

Urammar commented 2 months ago

Checking Task Manager as it goes, with a 3090 and a 16-core CPU, this seems to be using a lot of CPU and not a lot of GPU when generating. I'm running on Windows with DeepSpeed. Is there potentially some flag I am missing?

It seems there are remarkable speedups I'm missing out on here, while my GPU is idling.

RenNagasaki commented 2 months ago

What is shown in the loading console? Is it loading the model into CUDA or CPU?

If it's CPU then AllTalk doesn't find your GPU; if CUDA it's working as intended. You may want to check in confignew.json whether deepspeed_activate is set to true.
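If you'd rather check programmatically than open the file by hand, a minimal sketch is below. The key name `deepspeed_activate` comes from the comment above; the config path defaulting to `confignew.json` in the current directory is an assumption, so adjust it to wherever your AllTalk folder lives.

```python
import json

def deepspeed_enabled(config_path="confignew.json"):
    """Return True if deepspeed_activate is set to true in the config.

    The path default is an assumption; point it at your AllTalk folder.
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Missing key is treated as "not enabled".
    return bool(config.get("deepspeed_activate", False))
```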

erew123 commented 2 months ago

@Urammar As @RenNagasaki says, it should state at the terminal if it's loading a model into CPU or CUDA. If it's loading into CPU then that would be your issue, which would mean that you potentially don't have PyTorch with CUDA installed.

Without a diagnostics file, I can't see your Python environment or how you have it set up. I would suggest running the diagnostics and seeing if you have PyTorch with CUDA.
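For a quick manual check alongside the diagnostics, you can run a short snippet inside the AllTalk Python environment. This is a sketch that tolerates PyTorch being absent; it only uses standard `torch.cuda` calls.

```python
def cuda_status():
    """Report whether this environment has a CUDA-enabled PyTorch build."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version PyTorch was built against.
        return f"CUDA {torch.version.cuda}: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed but CPU-only - reinstall a CUDA (+cu121) build"

print(cuda_status())
```

If this reports CPU-only, the reinstall commands further down are the fix.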

Generally this issue is covered in the help section here: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#startup-performance-and-compatibility-issues


IF AND ONLY IF you are running the standalone version of AllTalk, I would suggest performing a git pull and starting the AllTalk Python environment with start_environment, then you can:

pip cache purge

pip install "torch>=2.2.1+cu121" "torchaudio>=2.2.1+cu121" --upgrade --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu121

(The quotes around the version specifiers stop the shell from treating the > as output redirection.)

If you are in text-gen-webui... in theory you can start its environment with cmd_{yourOS} and run a similar command... BUT you need to be sure of the CUDA version you originally set text-gen-webui up with, and match that CUDA version.

Thanks

Urammar commented 2 months ago

[screenshot: Task Manager performance graphs during two test generations]

You can see two test generations here, followed by spikes in CPU and, suspiciously, Wi-Fi activity. I think I brought up networking traffic being sent out a while back with training, and some disclaimers were added. Any insight into that now?

erew123 commented 2 months ago

Hi @Urammar

Re network activity during finetuning, please read the Finetuning section on the front page https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-a-note-about-anonymous-training-telemetry-information--disabling-it

As for your CPU activity, I would say that's just the standard activity that goes along with processing Python scripts. You can see the 2 matching spikes on your GPU (2nd from the bottom), so I would say you are processing on CUDA without any problem.

Does that cover all you need?

Thanks

Urammar commented 2 months ago

Okay, so, the problem is identified. I recently upgraded to a dual-GPU setup, and it's using the far smaller card. This is why I have slowdown now. Is there a way to force the extension to only use a single GPU?

erew123 commented 2 months ago

@Urammar You can force Python to use a specific GPU, but it's a system-wide setting:

https://github.com/erew123/alltalk_tts?tab=readme-ov-file#startup-performance-and-compatibility-issues

See "I have multiple GPU's and I have problems running Finetuning" for details of the setting.

I don't know of a way to specifically bind just AllTalk (when running as part of Text-gen-webui) to a specific card as the Coqui scripts don't have that feature/ability.

So setting the above would also force text-gen-webui onto the one card too (I believe) if both are being run at the same time.
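The system-wide setting referenced above is the standard CUDA_VISIBLE_DEVICES environment variable understood by the CUDA runtime. A minimal sketch is below; GPU index 0 is an assumption (check nvidia-smi for the index of your 3090), and note that everything launched from that shell, not just AllTalk, is restricted to the listed GPU.

```shell
# Restrict CUDA applications started from this shell to GPU index 0.
# On Windows Command Prompt the equivalent is: set CUDA_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=0

# Confirm the variable is set before launching AllTalk from this shell.
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```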

In the next version of AllTalk, I will have a remote extension for text-gen-webui built, which will allow you to run AllTalk as a completely separate instance, with separate environment variables for AllTalk and text-gen-webui. That would allow you to lock one terminal/command prompt/Python environment to one GPU while the other can use both (or specify the other GPU).

However, this version of AllTalk is a while off yet, as it's still in development.
