pieris98 opened 1 week ago
The XTTS model natively supports voice cloning, so just use the following (and pick just one of `speaker` and `speaker_wav`, depending on which of them you need):
```python
from TTS.api import TTS

device = "cpu"
print(device)

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
tts.tts_to_file(
    text="Hello world!",
    speaker="Andrew Chipper",  # built-in speaker; alternatively use speaker_wav
    speaker_wav="/path/to/voice_sample.wav",
    language="en",
    file_path="/path/to/outputs/xttsv2_en_output.wav",
)
```
This should run correctly on the CPU. The `with_vc` variants would pass the already cloned output through an additional voice conversion model (FreeVC), but that's not necessary here and probably leads to worse results.
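For reference, that extra FreeVC step can also be run on its own through the voice conversion API. A minimal sketch, using the `freevc24` model name from the Coqui TTS docs; the file paths are placeholders:

```python
from TTS.api import TTS

# Load FreeVC as a standalone voice conversion model. This is the same kind
# of conversion step that the with_vc variants append after synthesis.
vc = TTS("voice_conversion_models/multilingual/vctk/freevc24").to("cpu")

# Re-voice an already synthesized clip to match a target speaker sample.
vc.voice_conversion_to_file(
    source_wav="/path/to/xttsv2_en_output.wav",  # speech to convert
    target_wav="/path/to/voice_sample.wav",      # voice to imitate
    file_path="/path/to/outputs/converted.wav",
)
```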
Hey Enno, thanks a lot for the pointer, I didn't realise that some models have voice cloning built in rather than via `tts.tts_with_vc_to_file()`.
I was then trying to run the model in `tts-server` and noticed issue #3369, so I just wanted to point it out, as it seems more important to solve in the codebase.
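For reference, the server can be started with something like the following (assuming the standard `tts-server` CLI from this repo; exact flags may differ by version):

```bash
# Start the demo server with XTTS v2; the CPU should be used unless
# --use_cuda true is passed (flag behaviour may vary by version).
tts-server --model_name tts_models/multilingual/multi-dataset/xtts_v2
```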
Describe the bug
Similar to #3787, but also when running the `xtts_v2` model with voice cloning (through the voice conversion model), using `device='cpu'` results in the following error:

To Reproduce
```python
import torch
from TTS.api import TTS

device = "cpu"
print(device)

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
tts.tts_with_vc_to_file(
    text="Hello world!",
    speaker="Andrew Chipper",
    speaker_wav="/path/to/voice_sample.wav",
    language="en",
    file_path="/path/to/outputs/xttsv2_en_output.wav",
)
```
Expected behavior
The inference should run without using CUDA or reporting any CUDA/CUDNN/GPU-related errors.
Logs
Environment
Additional context
Note: Even though I do have CUDA and an NVIDIA GPU on my laptop, I want to use the CPU because my GPU doesn't have enough VRAM for the model I want to use.
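As a workaround sketch for forcing CPU-only execution on a machine that does have CUDA (a generic PyTorch environment-variable trick, not specific to this repo), the GPU can be hidden before `torch` is imported:

```python
import os

# Hide all GPUs from PyTorch before it is imported, so nothing in the
# stack can initialise CUDA (a workaround sketch, not a fix for the bug).
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
from TTS.api import TTS

print(torch.cuda.is_available())  # should now print False

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cpu")
```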