KoljaB / RealtimeTTS

Converts text to speech in realtime

Is it possible to use CUDA with this? #117

Open Toolfolks opened 1 month ago

Toolfolks commented 1 month ago

I'm testing coqui_test.py and it's really slow and stuttering.

KoljaB commented 1 month ago

Please check https://github.com/KoljaB/RealtimeTTS#cuda-installation

KoljaB commented 1 month ago

pip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121

Or cu118 for CUDA 11.8

Toolfolks commented 1 month ago

This link worked: https://github.com/KoljaB/RealtimeTTS#cuda-installation

Got it working. The engine takes about 75 to start. Code change:

import torch
from RealtimeTTS import TextToAudioStream, CoquiEngine

def check_cuda():
    if torch.cuda.is_available():
        print("CUDA is available")
        print(f"Current device: {torch.cuda.current_device()}")
        print(f"Device count: {torch.cuda.device_count()}")
        print(f"Device name: {torch.cuda.get_device_name(0)}")
    else:
        print("CUDA is not available")

def dummy_generator(text):
    yield text

if __name__ == '__main__':
    check_cuda()

    # Initialize the engine with CUDA support
    engine = CoquiEngine(device="cuda")
    stream = TextToAudioStream(engine)

    try:
        while True:
            # Get user input
            user_input = input("Enter text to synthesize (or 'exit' to quit): ")
            if user_input.lower() == 'exit':
                break

            # Synthesize and play the input text
            stream.feed(dummy_generator(user_input)).play(log_synthesized_text=True)
    except KeyboardInterrupt:
        print("Exiting...")

    engine.shutdown()

A few questions:

  1. Although the voice is not stuttering, I don't see any GPU usage % in the Nvidia panel. Is this the correct syntax: engine = CoquiEngine(device="cuda")?
  2. Where do I find voices, and how do I store and use them? Is there a certain voice format?
  3. There are mentions of training voices, but I don't see any instructions.

KoljaB commented 1 month ago

Leave out the device="cuda" parameter. CoquiEngine does not recognize it; it detects automatically whether to use CUDA.
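
As a rough illustration of what automatic detection usually amounts to, a script can ask PyTorch whether CUDA is usable and fall back to the CPU otherwise. This is only a sketch, not CoquiEngine's actual code, and pick_device is a hypothetical helper name:

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a usable GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this reports cpu on a machine with an Nvidia card, the torch build installed in that environment is probably a CPU-only one, which is what the CUDA installation link above addresses.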

On XTTS fine-tuning:

KoljaB commented 1 month ago

Please look here for information about voices:

https://github.com/KoljaB/RealtimeTTS/blob/master/FAQ.md#how-to-use-voices

KoljaB commented 1 month ago

And especially here for voice cloning with CoquiEngine:

https://github.com/KoljaB/RealtimeTTS/blob/master/FAQ.md#use-voice-cloning

KoljaB commented 1 month ago

Also, this is currently the best tool for training your own voices:

https://github.com/daswer123/xtts-finetune-webui

Toolfolks commented 1 month ago

Thanks for the info.

I am struggling to save the audio to a file. I have tried ChatGPT, Gemini & Claude and have been going round in circles for hours.

Any help appreciated.

KoljaB commented 1 month ago

Maybe this helps:

https://github.com/KoljaB/RealtimeTTS/blob/master/tests/write_to_file.py

Toolfolks commented 1 month ago

Great. I didn't notice that......

Toolfolks commented 1 month ago

I have used xtts-finetune-webui and created a reasonable sounding copy of the voice.

In the run folder I see: best_model.pth, best_model_174.pth and config.json.

How do I use this voice in RealtimeTTS, please?

KoljaB commented 1 month ago

You use a trained model with the following code:

engine = CoquiEngine(
    specific_model="Lasinya",
    local_models_path="D:/models"
)
engine.set_cloning_reference("D:/reference_files/my_voice_reference.wav")

For this example to work there should be a folder "D:/models/Lasinya" with the files "config.json", "model.pth" and "vocab.json" in it. I'd also copy "speakers_xtts.pth" to this folder.

These files should be in the xtts-finetune-webui folder under "finetune_models\ready" if you completed training. Don't forget to optimize the model after training; this is another button in the webui interface.

Your files don't look finished; they look like they come from an intermediate training step. They might work if you rename one of the best_model files to model.pth and put it into a folder together with the config.json. But after completing the full training, the file should be named "model.pth", not "best_model" something.
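
Before pointing CoquiEngine at the folder, it can help to check that the expected files are really there. A minimal sketch using only the standard library; missing_model_files is a hypothetical helper, and the file names and paths are the ones from the example above:

```python
from pathlib import Path

# Files the example above expects inside <local_models_path>/<specific_model>.
REQUIRED_FILES = ("config.json", "model.pth", "vocab.json")


def missing_model_files(models_path: str, model_name: str) -> list:
    """Return the required files that are absent from the model folder."""
    model_dir = Path(models_path) / model_name
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("D:/models", "Lasinya")
    if missing:
        print(f"Missing files: {missing}")
    else:
        print("Model folder looks complete")
```

The optional "speakers_xtts.pth" is deliberately not checked here, since it is only a recommendation in the text above.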

Toolfolks commented 1 month ago

Great, got that working.

While using xtts-finetune-webui (I have posted on their issue page as well) I have created an env (createvoice). The code shows the GPU (Nvidia):

import torch

def test_cuda():
    if torch.cuda.is_available():
        print("CUDA is available")
        print(f"Current device: {torch.cuda.current_device()}")
        print(f"Device count: {torch.cuda.device_count()}")
        print(f"Device name: {torch.cuda.get_device_name(0)}")
    else:
        print("CUDA is not available")

if __name__ == "__main__":
    test_cuda()

GPU usage is 0% and CPU usage is 17%.

How do I use the GPU to speed the process up, please? It took over 20 mins to do the test voice and the GPU remained at 0%.

What am I missing, please?

KoljaB commented 1 month ago

Please check point 4:

https://github.com/KoljaB/RealtimeTTS#cuda-installation

Toolfolks commented 1 month ago

This is the createvoice environment for xtts-finetune-webui:

Microsoft Windows [Version 10.0.19045.4651]
(c) Microsoft Corporation. All rights reserved.

C:\WINDOWS\system32>d:

D:\>cd D:\techy\TTS\xtts-finetune-webui

D:\techy\TTS\xtts-finetune-webui>conda activate createvoice

(createvoice) D:\techy\TTS\xtts-finetune-webui>python -V
Python 3.11.9

(createvoice) D:\techy\TTS\xtts-finetune-webui>nvidia-smi
Sat Jul 27 15:33:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   45C    P2             36W /  104W |    3691MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      6704    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A      8532    C+G   ...up\ui-launcher\AdskAccessUIHost.exe      N/A      |
|    0   N/A  N/A     10280    C+G   ...tionsPlus\logioptionsplus_agent.exe      N/A      |
|    0   N/A  N/A     12752    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A     13600    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A     14324    C+G   ...mpt_builder\LogiAiPromptBuilder.exe      N/A      |
|    0   N/A  N/A     15340    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     15520    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     17924    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     18192      C   ....conda\envs\createvoice\python.exe       N/A      |
|    0   N/A  N/A     18784    C+G   ...0.0_x64cv1g1gvanyjgm\WhatsApp.exe        N/A      |
|    0   N/A  N/A     18896    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     19348    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A      |
|    0   N/A  N/A     19964    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     20052    C+G   ...al\Discord\app-1.0.9155\Discord.exe      N/A      |
|    0   N/A  N/A     20648    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A     21320    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     21680    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     22420    C+G   ...ejd91yc\AdobeNotificationClient.exe      N/A      |
|    0   N/A  N/A     22740    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     23632    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     23668    C+G   ...Programs\Microsoft VS Code\Code.exe      N/A      |
|    0   N/A  N/A     24272    C+G   ...ns\Software\Current\LogiOverlay.exe      N/A      |
|    0   N/A  N/A     24500    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     25004    C+G   ...cal\Microsoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     28008    C+G   ...nzyj5cx40ttqa\iCloud\iCloudHome.exe      N/A      |
|    0   N/A  N/A     28764    C+G   ....41_x64__8wekyb3d8bbwe\ms-teams.exe      N/A      |
|    0   N/A  N/A     29124    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     29296    C+G   ...e Stream\94.0.1.0\GoogleDriveFS.exe      N/A      |
|    0   N/A  N/A     33052    C+G   ...on\HEX\Creative Cloud UI Helper.exe      N/A      |
|    0   N/A  N/A     34244    C+G   ...usion\LiveUpdate\Reallusion Hub.exe      N/A      |
|    0   N/A  N/A     38588    C+G   ...1.0_x648wekyb3d8bbwe\Video.UI.exe        N/A      |
|    0   N/A  N/A     39780    C+G   ...at DC\Acrobat\acrocef_1\AcroCEF.exe      N/A      |
|    0   N/A  N/A     42048      C   ....conda\envs\realtimetts\python.exe       N/A      |
|    0   N/A  N/A     44996    C+G   ...b3d8bbwe\Microsoft.Media.Player.exe      N/A      |
+-----------------------------------------------------------------------------------------+

(createvoice) D:\techy\TTS\xtts-finetune-webui>pip show torch
Name: torch
Version: 2.3.1+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: C:\Users\User.conda\envs\createvoice\Lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: coqui-tts, coqui-tts-trainer, encodec, torchaudio

(createvoice) D:\techy\TTS\xtts-finetune-webui>

The realtimetts env is:

D:\techy\TTS\RealtimeTTS>conda activate realtimetts

(realtimetts) D:\techy\TTS\RealtimeTTS>python -V
Python 3.9.19

(realtimetts) D:\techy\TTS\RealtimeTTS>nvidia-smi
Sat Jul 27 15:38:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   44C    P2             36W /  104W |    5648MiB /  12288MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      6704    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A      8532    C+G   ...up\ui-launcher\AdskAccessUIHost.exe      N/A      |
|    0   N/A  N/A     10280    C+G   ...tionsPlus\logioptionsplus_agent.exe      N/A      |
|    0   N/A  N/A     12752    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A     13600    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A     14324    C+G   ...mpt_builder\LogiAiPromptBuilder.exe      N/A      |
|    0   N/A  N/A     15340    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     15520    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     17924    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     18192      C   ....conda\envs\createvoice\python.exe       N/A      |
|    0   N/A  N/A     18784    C+G   ...0.0_x64cv1g1gvanyjgm\WhatsApp.exe        N/A      |
|    0   N/A  N/A     18896    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     19348    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A      |
|    0   N/A  N/A     19964    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     20052    C+G   ...al\Discord\app-1.0.9155\Discord.exe      N/A      |
|    0   N/A  N/A     20648    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A     21320    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     21680    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     22420    C+G   ...ejd91yc\AdobeNotificationClient.exe      N/A      |
|    0   N/A  N/A     22740    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     23632    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     23668    C+G   ...Programs\Microsoft VS Code\Code.exe      N/A      |
|    0   N/A  N/A     24272    C+G   ...ns\Software\Current\LogiOverlay.exe      N/A      |
|    0   N/A  N/A     24500    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     25004    C+G   ...cal\Microsoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     28008    C+G   ...nzyj5cx40ttqa\iCloud\iCloudHome.exe      N/A      |
|    0   N/A  N/A     28764    C+G   ....41_x64__8wekyb3d8bbwe\ms-teams.exe      N/A      |
|    0   N/A  N/A     29124    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     29296    C+G   ...e Stream\94.0.1.0\GoogleDriveFS.exe      N/A      |
|    0   N/A  N/A     33052    C+G   ...on\HEX\Creative Cloud UI Helper.exe      N/A      |
|    0   N/A  N/A     34244    C+G   ...usion\LiveUpdate\Reallusion Hub.exe      N/A      |
|    0   N/A  N/A     38588    C+G   ...1.0_x648wekyb3d8bbwe\Video.UI.exe        N/A      |
|    0   N/A  N/A     39780    C+G   ...at DC\Acrobat\acrocef_1\AcroCEF.exe      N/A      |
|    0   N/A  N/A     42048      C   ....conda\envs\realtimetts\python.exe       N/A      |
|    0   N/A  N/A     44996    C+G   ...b3d8bbwe\Microsoft.Media.Player.exe      N/A      |
+-----------------------------------------------------------------------------------------+

(realtimetts) D:\techy\TTS\RealtimeTTS>pip show torch
Name: torch
Version: 2.3.1+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\users\user.conda\envs\realtimetts\lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: coqui-tts, coqui-tts-trainer, encodec, RealtimeSTT, stanza, torchaudio

(realtimetts) D:\techy\TTS\RealtimeTTS>

What am I missing here, please?

KoljaB commented 1 month ago

The webui's torch is built against CUDA 12 (+cu121), RealtimeSTT's against CUDA 11 (+cu118). I guess there is only one CUDA installed. Check which CUDA version you have, then install the torch build that matches it for each project.
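
To compare the two environments quickly, the CUDA version a torch build targets can be read from the suffix of the version string that pip show torch prints. A minimal sketch; cuda_tag is a hypothetical helper, and the version strings are the ones from the outputs above:

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Extract the CUDA version from a torch version string.

    "2.3.1+cu121" gives "12.1". Returns None when there is no "+cuXXX"
    suffix, for example on a CPU-only build.
    """
    # The last digit is the minor version, everything before it the major.
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    envs = (("createvoice", "2.3.1+cu121"), ("realtimetts", "2.3.1+cu118"))
    for env_name, version in envs:
        print(f"{env_name}: torch {version} targets CUDA {cuda_tag(version)}")
```

Running the same check inside each conda environment shows at a glance which torch build that project is using.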