2noise / ChatTTS

A generative speech model for daily dialogue.
https://2noise.com

KeyError: 'sha256_config_decoder_yaml' #438

Closed: kathuluman closed this issue 1 week ago

kathuluman commented 1 week ago
(voice_synth) PS D:\AI\Alex\ChatTTS> .\voice_synth\Scripts\python.exe .\speak_boy.py
Traceback (most recent call last):
  File "D:\AI\Alex\ChatTTS\speak_boy.py", line 10, in <module>
    chat.load(compile=False) # Set to True for better performance
  File "D:\AI\Alex\ChatTTS\ChatTTS\core.py", line 101, in load
    download_path = self.download_models(source, force_redownload, custom_path)
  File "D:\AI\Alex\ChatTTS\ChatTTS\core.py", line 63, in download_models
    if not check_all_assets(update=True) or force_redownload:
  File "D:\AI\Alex\ChatTTS\ChatTTS\utils\dl.py", line 76, in check_all_assets
    current_dir, model, os.environ[f"sha256_config_{menv}"], update
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'sha256_config_decoder_yaml'

I cloned the repo into my folder and created a virtualenv for this project. I then used the following script in an attempt to get this to work.

import ChatTTS
from IPython.display import Audio
import torchaudio
import torch

###################################
# Sample a speaker from Gaussian.

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["Hello my name is Alex your Artificial Intelligent assistant here to help you with your everyday needs and wants.  I am here to try to assist you throughout your journey as you work on developing me to be the best assistant I can be.",]
rand_spk = chat.sample_random_speaker()

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb = rand_spk, # add sampled speaker 
    temperature = .3,   # using custom temperature
    top_P = 0.7,        # top P decode
    top_K = 20,         # top K decode
)

###################################
# For sentence level manual control.

# use oral_(0-9), laugh_(0-2), break_(0-7) 
# to generate special token in text to synthesize.
params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

wavs = chat.infer(
    texts,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)

###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wavs = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text,  params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)

When I run it, I get the error above. Running run.py works fine, and I can run webui.py and have everything work through the web UI, but I cannot get it to work from a script like this. Please help. I am using Python 3.10.5.

TianduoWang commented 1 week ago

Adding the following code can help:

from dotenv import load_dotenv
load_dotenv('sha256.env')
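
The traceback above fails at `os.environ[f"sha256_config_{menv}"]`, which suggests the `sha256_*` variables were never put into the process environment before `chat.load()` ran. As a rough illustration of what loading the file accomplishes, here is a minimal standard-library sketch. It assumes `sha256.env` contains plain `KEY=VALUE` lines, and the helper names `load_env_file` and `missing_env` are hypothetical, not part of ChatTTS or python-dotenv.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Hypothetical stand-in for dotenv.load_dotenv: copy KEY=VALUE lines into os.environ.

    Assumes a simple file format: one KEY=VALUE pair per line,
    with blank lines and '#' comments ignored.
    """
    loaded = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in the environment are left untouched.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def missing_env(keys):
    """Return the names from `keys` that are not set in the environment."""
    return [k for k in keys if k not in os.environ]
```

Calling `load_env_file("sha256.env")` before `chat.load()` and then checking `missing_env(["sha256_config_decoder_yaml"])` would show whether the key from the traceback is now present. The relative path is resolved against the current working directory, so the script is assumed to be run from the repository root, as in the report.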

fumiama commented 1 week ago

Will add the load_dotenv call to the README later.

kathuluman commented 5 days ago

How can I use a specific speaker? How can I specify that in this Python program?

import ChatTTS
from IPython.display import Audio
import torchaudio
import torch
from dotenv import load_dotenv
load_dotenv('sha256.env')

###################################
# Sample a speaker from Gaussian.

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["Hello my name is Alex your Artificial Intelligent assistant here to help you with your everyday needs and wants.  I am here to try to assist you throughout your journey as you work on developing me to be the best assistant I can be.",]
rand_spk = chat.sample_random_speaker()

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb = rand_spk, # add sampled speaker 
    temperature = .3,   # using custom temperature
    top_P = 0.7,        # top P decode
    top_K = 20,         # top K decode
)

###################################
# For sentence level manual control.

# use oral_(0-9), laugh_(0-2), break_(0-7) 
# to generate special token in text to synthesize.
params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

wavs = chat.infer(
    texts,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)

###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wavs = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text,  params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)

If that could be added to the usage section as well, it would mean a lot, because I would like to use a specific speaker.

kathuluman commented 5 days ago

Example 1

(voice_synth) PS D:\AI\Alex\ChatTTS> python .\speak_boy.py
no GPU found, use CPU instead
found invalid characters: {'1'}
found invalid characters: {'2'}
text:   3%|█▉                                                                          | 10/384(max) [00:01,  8.84it/s]
code:   3%|██▍                                                                        | 67/2048(max) [00:05, 13.12it/s]
Traceback (most recent call last):
  File "D:\AI\Alex\ChatTTS\speak_boy.py", line 14, in <module>
    torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)
  File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 312, in save
    backend = dispatcher(uri, format, backend)
  File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 222, in dispatcher
    raise RuntimeError(f"Couldn't find appropriate backend to handle uri {uri} and format {format}.")
RuntimeError: Couldn't find appropriate backend to handle uri output1.wav and format None.

The code I tried:

import ChatTTS
import torch
import torchaudio
from dotenv import load_dotenv
load_dotenv('sha256.env')

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance

texts = ["PUT YOUR 1st TEXT HERE", "PUT YOUR 2nd TEXT HERE"]

wavs = chat.infer(texts)

torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)

Example 2

I then tried the code I posted at the beginning of this issue thread, with the suggested modification, and received the same error as in Example 1. It starts working and generates for a while, but then it fails with the same error.

import ChatTTS
from IPython.display import Audio
import torchaudio
import torch
from dotenv import load_dotenv
load_dotenv('sha256.env')

###################################
# Sample a speaker from Gaussian.

chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
texts = ["Hello my name is Alex your Artificial Intelligent assistant here to help you with your everyday needs and wants.  I am here to try to assist you throughout your journey as you work on developing me to be the best assistant I can be.",]
rand_spk = chat.sample_random_speaker()

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb = rand_spk, # add sampled speaker 
    temperature = .3,   # using custom temperature
    top_P = 0.7,        # top P decode
    top_K = 20,         # top K decode
)

###################################
# For sentence level manual control.

# use oral_(0-9), laugh_(0-2), break_(0-7) 
# to generate special token in text to synthesize.
params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

wavs = chat.infer(
    texts,
    params_refine_text=params_refine_text,
    params_infer_code=params_infer_code,
)

###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wavs = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text,  params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)

Error

(voice_synth) PS D:\AI\Alex\ChatTTS> python .\speak_boy.py
no GPU found, use CPU instead
text:  20%|███████████████▍                                                            | 78/384(max) [00:10,  7.43it/s]
code:  34%|█████████████████████████▎                                                | 699/2048(max) [01:00, 11.48it/s]
found invalid characters: {'?'}
code:   8%|██████▏                                                                   | 172/2048(max) [00:12, 14.22it/s]
Traceback (most recent call last):
  File "D:\AI\Alex\ChatTTS\speak_boy.py", line 42, in <module>
    torchaudio.save("output2.wav", torch.from_numpy(wavs[0]), 24000)
  File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 312, in save
    backend = dispatcher(uri, format, backend)
  File "D:\AI\Alex\ChatTTS\voice_synth\lib\site-packages\torchaudio\_backend\utils.py", line 222, in dispatcher
    raise RuntimeError(f"Couldn't find appropriate backend to handle uri {uri} and format {format}.")
RuntimeError: Couldn't find appropriate backend to handle uri output2.wav and format None.
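
Both tracebacks end in torchaudio's backend dispatcher, which typically means torchaudio could not find an installed audio I/O backend able to write `.wav` files in this environment. One possible workaround, sketched below, writes the samples with Python's standard `wave` module instead of `torchaudio.save`. It assumes the audio is a one-dimensional sequence of floats in the range [-1, 1] at 24000 Hz (the rate used in the scripts above); the helper name `save_wav_int16` is hypothetical and not part of ChatTTS or torchaudio.

```python
import array
import sys
import wave


def save_wav_int16(path, samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file.

    Hypothetical helper: values outside [-1, 1] are clipped before conversion.
    Returns the number of frames written.
    """
    pcm = array.array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return len(pcm)
```

With the scripts above, this might be called as `save_wav_int16("output2.wav", wavs[0].ravel())`, assuming `wavs[0]` is a NumPy array as the `torch.from_numpy(wavs[0])` call implies. Installing an audio backend for torchaudio is the other route; this sketch only sidesteps the dispatcher.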