Is it possible to use Cuda with this.

Toolfolks commented 1 month ago

I'm testing coqui_test.py and it's really slow and stuttering.

KoljaB commented 1 month ago

Please check https://github.com/KoljaB/RealtimeTTS#cuda-installation

KoljaB commented 1 month ago

pip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121

Or cu118 for CUDA 11.8
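
To see which CUDA build a given torch version string refers to, the local tag after the `+` can be parsed. This is a minimal sketch using only the standard library; the helper names `cuda_tag` and `wheel_index_url` are made up for illustration, and the URL pattern is simply the one from the pip command above.

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Return the CUDA tag (e.g. 'cu121') from a torch version string, or None."""
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


def wheel_index_url(tag: str) -> str:
    """Build the wheel index URL, following the pattern of the pip command above."""
    return f"https://download.pytorch.org/whl/{tag}"


if __name__ == "__main__":
    for version in ("2.3.1+cu121", "2.3.1+cu118", "2.3.1+cpu"):
        tag = cuda_tag(version)
        print(version, "->", wheel_index_url(tag) if tag else "no CUDA build")
```

A version string without a `+cuXXX` suffix (for example a CPU-only build) yields `None`, which is a quick hint that the environment has no CUDA-enabled torch.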

Toolfolks commented 1 month ago

This link worked: https://github.com/KoljaB/RealtimeTTS#cuda-installation

Got it working. The engine takes about 75 to start. Code change:

import torch
from RealtimeTTS import TextToAudioStream, CoquiEngine

def check_cuda():
    if torch.cuda.is_available():
        print("CUDA is available")
        print(f"Current device: {torch.cuda.current_device()}")
        print(f"Device count: {torch.cuda.device_count()}")
        print(f"Device name: {torch.cuda.get_device_name(0)}")
    else:
        print("CUDA is not available")

def dummy_generator(text):
    yield text

if __name__ == '__main__':
    check_cuda()

    # Initialize the engine with CUDA support
    engine = CoquiEngine(device="cuda")
    stream = TextToAudioStream(engine)

    try:
        while True:
            # Get user input
            user_input = input("Enter text to synthesize (or 'exit' to quit): ")
            if user_input.lower() == 'exit':
                break

            # Synthesize and play the input text
            stream.feed(dummy_generator(user_input)).play(log_synthesized_text=True)
    except KeyboardInterrupt:
        print("Exiting...")

    engine.shutdown()

A few questions:

Although the voice is not stuttering, I don't see any GPU usage % in the Nvidia panel. Is this the correct syntax: engine = CoquiEngine(device="cuda")?
Where do I find voices, and how do I store and use them? Is there a certain voice format?
There are mentions of training voices, but I don't see any instructions.

KoljaB commented 1 month ago

Leave out the device="cuda" parameter. CoquiEngine does not know this parameter; it detects automatically whether to use CUDA.
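
Automatic detection of this kind usually boils down to a check like the one below. This is a sketch of the common PyTorch pattern, not CoquiEngine's actual code; `pick_device` is a hypothetical helper, and the import is guarded so the snippet also runs where torch is not installed.

```python
def pick_device() -> str:
    """Return 'cuda' if a CUDA-enabled torch sees a GPU, else 'cpu'."""
    try:
        import torch  # guarded: fall back to CPU if torch is missing
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints `cpu` on a machine with an Nvidia card, the installed torch is most likely a CPU-only build or does not match the installed CUDA version.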

On XTTS finetuning:

Recommended Tools:
- Use xtts-webui over alltalk_tts. Danil's WebUI is more flexible and produces better models.
Curate Your Dataset:
- Check every sentence carefully. Trim audio clips to remove any noise, breaths, or silence at the end.
- Ensure no artificial noises are present; even one can degrade model performance.
- Avoid sentences longer than XTTS can process (max 250 characters for English; I recommend cutting off at 240 chars).
Sample Length:
- Use shorter sample lengths (around 11 seconds) for better performance. Longer samples can complicate training.
Audio Preprocessing:
- Convert audio to WAV signed 16-bit PCM, mono, 22050 or 44100 Hz (Whisper will downsample anyway). Use mono to ensure consistency.
Avoid Overtraining:
- Optimal training is between 6 and 12 epochs. Training beyond 20 epochs can degrade model quality.
Data Quality:
- More data generally yields better results. Aim for a large, high-quality dataset to improve model performance.
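
The format and length rules above can be checked automatically before training. The following is a rough sketch using only the standard library `wave` module; `check_sample` and the constants are illustrative names, and treating 11 seconds as a hard limit is an assumption (the advice above only says "around 11 seconds").

```python
import wave
from typing import List

MAX_CHARS = 240                 # cut-off recommended above for English
MAX_SECONDS = 11.0              # assumption: "around 11 seconds" used as a limit
ALLOWED_RATES = (22050, 44100)  # sample rates named above


def check_sample(wav_path: str, text: str) -> List[str]:
    """Return a list of problems found for one (audio, transcript) pair."""
    problems = []
    if len(text) > MAX_CHARS:
        problems.append(f"text has {len(text)} chars (max {MAX_CHARS})")
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("audio is not mono")
        if wav.getsampwidth() != 2:
            problems.append("audio is not 16-bit PCM")
        rate = wav.getframerate()
        if rate not in ALLOWED_RATES:
            problems.append(f"sample rate {rate} Hz not in {ALLOWED_RATES}")
        seconds = wav.getnframes() / rate
        if seconds > MAX_SECONDS:
            problems.append(f"clip is {seconds:.1f} s (over {MAX_SECONDS} s)")
    return problems
```

An empty list means the pair passes these mechanical checks. Noise, breaths, and trailing silence still have to be checked by ear.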

KoljaB commented 1 month ago

Please look here for info on voices:

https://github.com/KoljaB/RealtimeTTS/blob/master/FAQ.md#how-to-use-voices

KoljaB commented 1 month ago

And especially here for voice cloning with CoquiEngine:

https://github.com/KoljaB/RealtimeTTS/blob/master/FAQ.md#use-voice-cloning

KoljaB commented 1 month ago

Also, this one is currently the best for training your own voices:

https://github.com/daswer123/xtts-finetune-webui

Toolfolks commented 1 month ago

Thanks for the info.

I am struggling to save the audio to a file. I tried ChatGPT, Gemini & Claude and have been going round in circles for hours.

Any help appreciated.

KoljaB commented 1 month ago

Maybe this helps:

https://github.com/KoljaB/RealtimeTTS/blob/master/tests/write_to_file.py

Toolfolks commented 1 month ago

Great. I didn't notice that......

Toolfolks commented 1 month ago

I have used xtts-finetune-webui and created a reasonable sounding copy of the voice.

In the run folder I see: best_model.pth, best_model_174.pth, config.json

How do I use this voice in RealtimeTTS, please?

KoljaB commented 1 month ago

You use a trained model with the following code:

engine = CoquiEngine(
    specific_model="Lasinya",
    local_models_path="D:/models"
)
engine.set_cloning_reference("D:/reference_files/my_voice_reference.wav")

For this example to work there should be a folder "D:/models/Lasinya" with the files "config.json", "model.pth" and "vocab.json" in it. I'd also copy "speakers_xtts.pth" to this folder.

These files should be in the xtts-finetune-webui folder under "finetune_models\ready" if you completed training. Don't forget to optimize the model after training; this is another button in the webui interface.

Your files don't look finished; are they from an intermediate training step? They might work if you rename one of the best_model files to model.pth and put it into a folder together with the config.json. But after completing the full training, the filename should be "model.pth" and not "best_model" something.
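
Before pointing CoquiEngine at a model folder, it can help to verify that the expected files are really there. A small sketch, assuming the folder layout described above; `missing_model_files` is a hypothetical helper and not part of RealtimeTTS.

```python
from pathlib import Path
from typing import List

# Files named above as required; "speakers_xtts.pth" is only suggested, so it is not checked.
REQUIRED_FILES = ("config.json", "model.pth", "vocab.json")


def missing_model_files(models_path: str, model_name: str) -> List[str]:
    """List required files that are absent from <models_path>/<model_name>."""
    folder = Path(models_path) / model_name
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("D:/models", "Lasinya")
    print("Model folder looks complete" if not missing else f"Missing: {missing}")
```

A folder that only holds best_model.pth and config.json would be reported as missing both model.pth and vocab.json.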

Toolfolks commented 1 month ago

Great, got that working. For xtts-finetune-webui (I have posted on their issue page as well) I have created an env (createvoice). This code shows the GPU (Nvidia):

import torch

def test_cuda():
    if torch.cuda.is_available():
        print("CUDA is available")
        print(f"Current device: {torch.cuda.current_device()}")
        print(f"Device count: {torch.cuda.device_count()}")
        print(f"Device name: {torch.cuda.get_device_name(0)}")
    else:
        print("CUDA is not available")

if __name__ == "__main__":
    test_cuda()

GPU usage is 0%, CPU 17%.

How do I use the GPU to speed the process up, please? It took over 20 mins to do the test voice and the GPU remained at 0%.

What am I missing, please?

KoljaB commented 1 month ago

Please check point 4:

https://github.com/KoljaB/RealtimeTTS#cuda-installation

Toolfolks commented 1 month ago

This is the createvoice environment for xtts-finetune-webui:

C:\WINDOWS\system32>d:

D:\>cd D:\techy\TTS\xtts-finetune-webui

D:\techy\TTS\xtts-finetune-webui>conda activate createvoice

(createvoice) D:\techy\TTS\xtts-finetune-webui>python -V
Python 3.11.9

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      6704    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A      8532    C+G   ...up\ui-launcher\AdskAccessUIHost.exe      N/A      |
|    0   N/A  N/A     10280    C+G   ...tionsPlus\logioptionsplus_agent.exe      N/A      |
|    0   N/A  N/A     12752    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A     13600    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A     14324    C+G   ...mpt_builder\LogiAiPromptBuilder.exe      N/A      |
|    0   N/A  N/A     15340    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     15520    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     17924    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     18192      C   ....conda\envs\createvoice\python.exe       N/A      |
|    0   N/A  N/A     18784    C+G   ...0.0_x64cv1g1gvanyjgm\WhatsApp.exe        N/A      |
|    0   N/A  N/A     18896    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     19348    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A      |
|    0   N/A  N/A     19964    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     20052    C+G   ...al\Discord\app-1.0.9155\Discord.exe      N/A      |
|    0   N/A  N/A     20648    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A     21320    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     21680    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     22420    C+G   ...ejd91yc\AdobeNotificationClient.exe      N/A      |
|    0   N/A  N/A     22740    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     23632    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     23668    C+G   ...Programs\Microsoft VS Code\Code.exe      N/A      |
|    0   N/A  N/A     24272    C+G   ...ns\Software\Current\LogiOverlay.exe      N/A      |
|    0   N/A  N/A     24500    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     25004    C+G   ...cal\Microsoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     28008    C+G   ...nzyj5cx40ttqa\iCloud\iCloudHome.exe      N/A      |
|    0   N/A  N/A     28764    C+G   ....41_x64__8wekyb3d8bbwe\ms-teams.exe      N/A      |
|    0   N/A  N/A     29124    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     29296    C+G   ...e Stream\94.0.1.0\GoogleDriveFS.exe      N/A      |
|    0   N/A  N/A     33052    C+G   ...on\HEX\Creative Cloud UI Helper.exe      N/A      |
|    0   N/A  N/A     34244    C+G   ...usion\LiveUpdate\Reallusion Hub.exe      N/A      |
|    0   N/A  N/A     38588    C+G   ...1.0_x648wekyb3d8bbwe\Video.UI.exe        N/A      |
|    0   N/A  N/A     39780    C+G   ...at DC\Acrobat\acrocef_1\AcroCEF.exe      N/A      |
|    0   N/A  N/A     42048      C   ....conda\envs\realtimetts\python.exe       N/A      |
|    0   N/A  N/A     44996    C+G   ...b3d8bbwe\Microsoft.Media.Player.exe      N/A      |
+-----------------------------------------------------------------------------------------+

(createvoice) D:\techy\TTS\xtts-finetune-webui>pip show torch
Name: torch
Version: 2.3.1+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: C:\Users\User.conda\envs\createvoice\Lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: coqui-tts, coqui-tts-trainer, encodec, torchaudio

(createvoice) D:\techy\TTS\xtts-finetune-webui>

The realtimetts env is:

D:\techy\TTS\RealtimeTTS>conda activate realtimetts

(realtimetts) D:\techy\TTS\RealtimeTTS>python -V
Python 3.9.19

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      6704    C+G   ...siveControlPanel\SystemSettings.exe      N/A      |
|    0   N/A  N/A      8532    C+G   ...up\ui-launcher\AdskAccessUIHost.exe      N/A      |
|    0   N/A  N/A     10280    C+G   ...tionsPlus\logioptionsplus_agent.exe      N/A      |
|    0   N/A  N/A     12752    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A     13600    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A     14324    C+G   ...mpt_builder\LogiAiPromptBuilder.exe      N/A      |
|    0   N/A  N/A     15340    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     15520    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     17924    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     18192      C   ....conda\envs\createvoice\python.exe       N/A      |
|    0   N/A  N/A     18784    C+G   ...0.0_x64cv1g1gvanyjgm\WhatsApp.exe        N/A      |
|    0   N/A  N/A     18896    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     19348    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A      |
|    0   N/A  N/A     19964    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     20052    C+G   ...al\Discord\app-1.0.9155\Discord.exe      N/A      |
|    0   N/A  N/A     20648    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
|    0   N/A  N/A     21320    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     21680    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     22420    C+G   ...ejd91yc\AdobeNotificationClient.exe      N/A      |
|    0   N/A  N/A     22740    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     23632    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A      |
|    0   N/A  N/A     23668    C+G   ...Programs\Microsoft VS Code\Code.exe      N/A      |
|    0   N/A  N/A     24272    C+G   ...ns\Software\Current\LogiOverlay.exe      N/A      |
|    0   N/A  N/A     24500    C+G   ..._x64kzf8qxf38zg5c\Skype\Skype.exe        N/A      |
|    0   N/A  N/A     25004    C+G   ...cal\Microsoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     28008    C+G   ...nzyj5cx40ttqa\iCloud\iCloudHome.exe      N/A      |
|    0   N/A  N/A     28764    C+G   ....41_x64__8wekyb3d8bbwe\ms-teams.exe      N/A      |
|    0   N/A  N/A     29124    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     29296    C+G   ...e Stream\94.0.1.0\GoogleDriveFS.exe      N/A      |
|    0   N/A  N/A     33052    C+G   ...on\HEX\Creative Cloud UI Helper.exe      N/A      |
|    0   N/A  N/A     34244    C+G   ...usion\LiveUpdate\Reallusion Hub.exe      N/A      |
|    0   N/A  N/A     38588    C+G   ...1.0_x648wekyb3d8bbwe\Video.UI.exe        N/A      |
|    0   N/A  N/A     39780    C+G   ...at DC\Acrobat\acrocef_1\AcroCEF.exe      N/A      |
|    0   N/A  N/A     42048      C   ....conda\envs\realtimetts\python.exe       N/A      |
|    0   N/A  N/A     44996    C+G   ...b3d8bbwe\Microsoft.Media.Player.exe      N/A      |
+-----------------------------------------------------------------------------------------+

(realtimetts) D:\techy\TTS\RealtimeTTS>pip show torch
Name: torch
Version: 2.3.1+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\users\user.conda\envs\realtimetts\lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: coqui-tts, coqui-tts-trainer, encodec, RealtimeSTT, stanza, torchaudio

(realtimetts) D:\techy\TTS\RealtimeTTS>

What am I missing here, please?

KoljaB commented 1 month ago

The webui env's torch is built against CUDA 12 (+cu121), the realtimetts env's against CUDA 11 (+cu118). I guess there is only one CUDA installed. Check which CUDA version you have, then install the torch build with the matching CUDA version for each project.
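
To compare environments quickly, each one can report which CUDA version its torch was built for and whether a GPU is visible. A minimal sketch: `torch_cuda_report` is a hypothetical helper, the torch attributes it reads are standard ones, and the import is guarded so the script also runs in an env without torch.

```python
def torch_cuda_report() -> dict:
    """Collect the torch build and CUDA details of the current environment."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,      # e.g. 2.3.1+cu121
        "built_for_cuda": torch.version.cuda,    # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

Running this once inside createvoice and once inside realtimetts shows side by side which build each env uses and whether it can actually reach the GPU.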

KoljaB / RealtimeTTS

Is it possible to use Cuda with this. #117