Open LivinLuo1993 opened 3 weeks ago
Now in the dev branch, `spk_stat.pt` has been removed and embedded into the package config, and `tokenizer.pt` has been removed and replaced by the JSON files. You can refer to the HuggingFace repo for more details.
@LivinLuo1993 You didn't pass the speaker-embedding parameter when calling infer, did you?
```python
import torch

import ChatTTS
from ChatTTS.model import Tokenizer

# ... some code omitted ...

spk = torch.load(
    '声模.pt',  # speaker-embedding file
    map_location='cpu',
    weights_only=True,
)
spk = Tokenizer._encode_spk_emb(spk)  # serialize to a string
params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=spk,
)
wavs = chat.infer(texts, params_infer_code=params_infer_code)
```
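For context on the `_encode_spk_emb` step above: the embedding tensor has to be serialized into a plain string before it is passed as `InferCodeParams.spk_emb`. Below is a minimal stdlib sketch of that general idea (pack the floats, compress, base64-encode). The helper names and the exact encoding are illustrative assumptions, not ChatTTS's actual format.

```python
import array
import base64
import lzma

def encode_emb(values: list[float]) -> str:
    """Illustrative only: pack float32 values, compress, and
    base64-encode so the embedding can travel as a plain string."""
    raw = array.array('f', values).tobytes()
    return base64.b64encode(lzma.compress(raw)).decode('ascii')

def decode_emb(text: str) -> list[float]:
    """Reverse of encode_emb: base64-decode, decompress, unpack."""
    raw = lzma.decompress(base64.b64decode(text.encode('ascii')))
    vals = array.array('f')
    vals.frombytes(raw)
    return list(vals)

emb = [0.25, -1.5, 3.0]  # values exactly representable in float32
s = encode_emb(emb)
assert decode_emb(s) == emb  # round-trips exactly
```

The point is just that the string form is lossless for the tensor it wraps, so encoding once and reusing the string gives the same speaker every run.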
I am having the same issue when I try to capture an audio sample and use it as a fixed speaker. The voice stays the same for the first three tries, but then suddenly changes from a girl's voice to a man's.
```python
import torch
from IPython.display import Audio

from tools.audio import load_audio

spk_smp = chat.sample_audio_speaker(
    torch.tensor(load_audio("sample.mp3", 24000)).to('cuda')
)
params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_smp=spk_smp,
    txt_smp="hands off the keyboard",
    prompt="[speed_5]",
    temperature=0.01,
    top_P=0.01,  # nucleus (top-p) decoding
    top_K=1,
)

text = "You really think thats real [laugh]"
refined_text = chat.infer(text, refine_text_only=True)
print(refined_text)

params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_1][laugh_1][break_4]',
)
wav = chat.infer(
    refined_text,
    # text,
    params_refine_text=params_refine_text,
    skip_refine_text=True,
    params_infer_code=params_infer_code,
)
Audio(wav[0], rate=24_000, autoplay=True)
```
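About the sampling parameters above: with `temperature=0.01`, `top_P=0.01`, and `top_K=1`, decoding is effectively greedy — only the single most likely token survives the filters. A minimal sketch of how the three knobs interact (an illustrative helper, not ChatTTS's actual sampler):

```python
import math

def sample_filter(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Illustrative sketch of temperature + top-k + top-p filtering.
    Returns the renormalized probabilities left after filtering."""
    # Temperature scaling: low temperature sharpens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens from most to least likely, keeping until either
    # top_k tokens are kept or cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, cum = set(), 0.0
    for rank, i in enumerate(order):
        keep.add(i)
        cum += probs[i]
        if top_k and rank + 1 >= top_k:
            break
        if cum >= top_p:
            break
    # Renormalize over the kept tokens; everything else gets 0.
    z = sum(probs[i] for i in keep)
    return [probs[i] / z if i in keep else 0.0 for i in range(len(probs))]

# With top_k=1 (or a tiny temperature / top_p), only the argmax survives:
print(sample_filter([2.0, 1.0, 0.5], temperature=0.01, top_k=1, top_p=0.01))
# → [1.0, 0.0, 0.0]
```

So with these settings any remaining voice drift must come from the speaker conditioning itself, not from sampling randomness.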
When using the new version, I ran into problems: some things can no longer be used, and I hit the assertion failure
```
assert self.has_loaded(use_decoder=use_decoder)
```
when loading the model with the following code. Also, the output audio is the same as with
```python
rand_spk = chat.sample_random_speaker()
```