import torch
import torchaudio
import numpy as np
import soundfile as sf
torch._dynamo.config.cache_size_limit = 64
torch._dynamo.config.suppress_errors = True
torch.set_float32_matmul_precision('high')
import ChatTTS
from IPython.display import Audio
# Initialize and load the model:
chat = ChatTTS.Chat()
chat.load(compile=False) # Set to True for better performance
# Define the text inputs for inference (batching is supported)
texts = [
"So we found being competitive and collaborative was a huge way of staying motivated towards our goals, so one person to call when you fall off, one person who gets you back on then one person to actually do the activity with.",
"海信小聚啊海信小聚啊海信小聚", "共青团爸爸海信小聚啊哈哈"]
# Perform inference (returns one waveform per input text)
wavs = chat.infer(texts)
# Save the generated audio
sf.write('/data_hdd/test_syth.wav', np.squeeze(wavs[0]), 24000, 'PCM_16')
sf.write('/data_hdd/test_syth_2.wav', np.squeeze(wavs[1]), 24000, 'PCM_16')
sf.write('/data_hdd/test_syth_3.wav', np.squeeze(wavs[2]), 24000, 'PCM_16')
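1. With a text of 12 Chinese characters, the generated audio contains disfluent filler words such as “什么” ("what") or “就” ("just") at random positions in the middle of the sentence.
2. The end of the sentence is truncated: the last character is dropped, or most of its sound is cut off so that only the first small part is pronounced. The text is again 12 Chinese characters.
3. With compile set to True, inference is too slow: 3 seconds of audio takes more than 5 minutes to generate.
Could the author please take a look at these problems? The GPU used is an A100.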
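To put a number on the slowdown reported with compile=True, a small timing wrapper like the one below might help. It is only a sketch: `real_time_factor` is a hypothetical helper, not part of ChatTTS, and it assumes the inference function returns one mono waveform per input text at 24000 Hz (the rate passed to `sf.write` in the script).

```python
import time

import numpy as np

SAMPLE_RATE = 24000  # assumed: same rate as used in the sf.write calls


def real_time_factor(infer_fn, texts, sample_rate=SAMPLE_RATE):
    """Time one call to infer_fn(texts) and relate it to the audio length.

    Hypothetical helper. Returns (rtf, elapsed_seconds, audio_seconds),
    where rtf = elapsed_seconds / audio_seconds.
    """
    start = time.perf_counter()
    wavs = infer_fn(texts)
    elapsed = time.perf_counter() - start

    # Assumes each returned waveform is mono, so its element count
    # equals its number of samples.
    total_samples = sum(np.asarray(w).size for w in wavs)
    audio_seconds = total_samples / sample_rate
    return elapsed / audio_seconds, elapsed, audio_seconds
```

Calling `real_time_factor(chat.infer, texts)` twice in a row, once with compile=False and once with compile=True, would show whether the time is spent only on the first call (which may include compilation warm-up) or on every call.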
Code: