kathuluman closed this issue 1 week ago
Adding the following code may help:
from dotenv import load_dotenv
load_dotenv('sha256.env')
I will add the load_dotenv call to the README later.
How can I use a specific speaker? How can I specify that in this Python program?
import ChatTTS
from IPython.display import Audio
import torchaudio
import torch
from dotenv import load_dotenv
load_dotenv('sha256.env')
###################################
# Sample a speaker from Gaussian.
chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["Hello my name is Alex your Artificial Intelligent assistant here to help you with your everyday needs and wants. I am here to try to assist you throughout your journey as you work on developing me to be the best assistant I can be.",]
rand_spk = chat.sample_random_speaker()
params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=rand_spk,  # add sampled speaker
    temperature=0.3,   # using custom temperature
    top_P=0.7,         # top P decode
    top_K=20,          # top K decode
)
###################################
# For sentence level manual control.
# use oral_(0-9), laugh_(0-2), break_(0-7)
# to generate special token in text to synthesize.
params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)
wavs = chat.infer(
    texts,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)
###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wavs = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text, params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)
If that could be added to the usage section as well, that would mean a lot, because I would like to use a specific speaker.
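One possible way to keep a specific speaker, offered as a sketch rather than the project's official method: sample a speaker once, save the value returned by chat.sample_random_speaker(), and pass the saved value as spk_emb on later runs. This assumes the returned value is a plain string in the installed ChatTTS version; if it is a tensor instead, torch.save and torch.load would be the analogous calls. The helper names and the speaker.txt file name below are hypothetical.

```python
from pathlib import Path


def save_speaker(spk: str, path: str = "speaker.txt") -> None:
    """Store the sampled speaker string so the same voice can be reused."""
    Path(path).write_text(spk, encoding="utf-8")


def load_speaker(path: str = "speaker.txt") -> str:
    """Read back a previously saved speaker string."""
    return Path(path).read_text(encoding="utf-8").strip()


def get_speaker(chat, path: str = "speaker.txt") -> str:
    """Reuse the saved speaker if the file exists, otherwise sample and save one.

    `chat` is assumed to be a loaded ChatTTS.Chat instance (or any object
    with a sample_random_speaker() method that returns a string).
    """
    if Path(path).exists():
        return load_speaker(path)
    spk = chat.sample_random_speaker()
    save_speaker(spk, path)
    return spk


# Usage with the script from this thread (not executed here):
#   chat = ChatTTS.Chat()
#   chat.load(compile=False)
#   spk = get_speaker(chat)  # same voice on every run once speaker.txt exists
#   params_infer_code = ChatTTS.Chat.InferCodeParams(spk_emb=spk)
```

Deleting speaker.txt makes the next run sample a new voice, so keeping one file per voice you like is a simple way to switch between them.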
(voice_synth) PS D:\AI\Alex\ChatTTS> python .\speak_boy.py
no GPU found, use CPU instead
found invalid characters: {'1'}
found invalid characters: {'2'}
text: 3%|█▉ | 10/384(max) [00:01, 8.84it/s]
code: 3%|██▍ | 67/2048(max) [00:05, 13.12it/s]
Traceback (most recent call last):
File "D:\AI\Alex\ChatTTS\speak_boy.py", line 14, in <module>
torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)
File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 312, in save
backend = dispatcher(uri, format, backend)
File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 222, in dispatcher
raise RuntimeError(f"Couldn't find appropriate backend to handle uri {uri} and format {format}.")
RuntimeError: Couldn't find appropriate backend to handle uri output1.wav and format None.
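This RuntimeError usually means torchaudio could not find any installed audio I/O backend; it does not mean the synthesis failed. On Windows, installing the soundfile package (pip install soundfile) is a commonly suggested fix. The sketch below is a small diagnostic to run inside the same virtualenv; the function names are hypothetical, and it only calls torchaudio.list_audio_backends() if that function is present in the installed version.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def describe_audio_backends() -> dict:
    """Collect basic facts about the audio backends torchaudio could use."""
    report = {"soundfile_installed": module_available("soundfile")}
    if module_available("torchaudio"):
        import torchaudio

        # Guarded lookup: older torchaudio releases may not expose this function.
        lister = getattr(torchaudio, "list_audio_backends", None)
        if lister is not None:
            report["torchaudio_backends"] = list(lister())
    return report


# Usage (run with the virtualenv's python):
#   print(describe_audio_backends())
# An empty "torchaudio_backends" list would be consistent with the error above.
```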
This is the code I tried to use:
import ChatTTS
import torch
import torchaudio
from dotenv import load_dotenv
load_dotenv('sha256.env')
chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["PUT YOUR 1st TEXT HERE", "PUT YOUR 2nd TEXT HERE"]
wavs = chat.infer(texts)
torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)
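The "found invalid characters: {'1'}" and "{'2'}" lines in the log appear to come from the digits in the placeholder texts ("1st", "2nd"). They look like warnings rather than the cause of the crash, but spelling numbers out before calling chat.infer should avoid them. The sketch below is a deliberately small, hypothetical helper: it only knows the ordinals first, second, and third, and spells any other digit one at a time.

```python
import re

# Hypothetical, intentionally tiny lookup tables for illustration only.
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spell_out_numbers(text: str) -> str:
    """Replace known ordinals and any remaining digits with English words."""

    def replace_ordinal(match):
        token = match.group(0)
        # Unknown ordinals (e.g. "4th") are left for the digit pass below.
        return ORDINALS.get(token.lower(), token)

    text = re.sub(r"\b\d+(?:st|nd|rd|th)\b", replace_ordinal, text,
                  flags=re.IGNORECASE)
    # Spell each remaining digit separately, padded with spaces.
    text = re.sub(r"\d", lambda m: " " + DIGITS[int(m.group(0))] + " ", text)
    # Collapse the extra whitespace introduced by the padding.
    return re.sub(r"\s+", " ", text).strip()


# Usage with the script above (not executed here):
#   texts = [spell_out_numbers(t) for t in texts]
#   wavs = chat.infer(texts)
```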
I tried my code as posted at the beginning of this issue thread, with the suggested modification, and I received the same error. It attempts to work and does its thing, but after a bit it hits the same error again.
import ChatTTS
from IPython.display import Audio
import torchaudio
import torch
from dotenv import load_dotenv
load_dotenv('sha256.env')
###################################
# Sample a speaker from Gaussian.
chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["Hello my name is Alex your Artificial Intelligent assistant here to help you with your everyday needs and wants. I am here to try to assist you throughout your journey as you work on developing me to be the best assistant I can be.",]
rand_spk = chat.sample_random_speaker()
params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=rand_spk,  # add sampled speaker
    temperature=0.3,   # using custom temperature
    top_P=0.7,         # top P decode
    top_K=20,          # top K decode
)
###################################
# For sentence level manual control.
# use oral_(0-9), laugh_(0-2), break_(0-7)
# to generate special token in text to synthesize.
params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)
wavs = chat.infer(
    texts,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)
###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wavs = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text, params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)
(voice_synth) PS D:\AI\Alex\ChatTTS> python .\speak_boy.py
no GPU found, use CPU instead
text: 20%|███████████████▍ | 78/384(max) [00:10, 7.43it/s]
code: 34%|█████████████████████████▎ | 699/2048(max) [01:00, 11.48it/s]
found invalid characters: {'?'}
code: 8%|██████▏ | 172/2048(max) [00:12, 14.22it/s]
Traceback (most recent call last):
File "D:\AI\Alex\ChatTTS\speak_boy.py", line 42, in <module>
torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)
File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 312, in save
backend = dispatcher(uri, format, backend)
File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 222, in dispatcher
raise RuntimeError(f"Couldn't find appropriate backend to handle uri {uri} and format {format}.")
RuntimeError: Couldn't find appropriate backend to handle uri output2.wav and format None.
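Since the traceback shows the audio was generated and only the final torchaudio.save call failed, one workaround that avoids torchaudio backends entirely is to write the file with Python's standard wave module. The sketch below assumes wavs[0] holds mono float samples in the range [-1, 1] at 24000 Hz, as the script's save call implies; the function name save_wav_pcm16 is hypothetical.

```python
import struct
import wave


def save_wav_pcm16(path, samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        s = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


# Usage with the script above (not executed here); reshape(-1) flattens the
# NumPy array in case it has a leading channel dimension:
#   save_wav_pcm16("output2.wav", wavs[0].reshape(-1))
```

The per-sample loop is slow for long clips but has no dependencies, which makes it useful for confirming that the generated audio itself is fine.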
I cloned the repo into my folder and created a virtualenv for this project. I then used the following script in an attempt to get this to work.
When I run it, I get that error. When I run the run.py file it works fine, and I can run webui.py and have it all work via the web UI, but I cannot get it to work like this. Please help. I am also using Python 3.10.5.