PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0
10.91k stars 1.83k forks

Problems with mixed Chinese-English TTS #3493

Open lilongwei5054 opened 1 year ago

lilongwei5054 commented 1 year ago

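With mixed Chinese-English TTS, the English is either spoken as gibberish or skipped entirely, and some models fail to generate any audio at all.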

linuxonly801 commented 1 year ago

See: https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/demos/text_to_speech/README.md

Chinese-English mixed, multi-speaker. You can change spk_id here.

# The `am` must be `fastspeech2_mix`!
# The `lang` must be `mix`!
# The `voc` must currently be a vocoder trained on a Chinese dataset!
# spk 174 is csmsc, spk 175 is ljspeech
paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!" --spk_id 174 --output mix_spk174.wav
paddlespeech tts --am fastspeech2_mix --voc hifigan_aishell3 --lang mix --input "热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!" --spk_id 174 --output mix_spk174_aishell3.wav
paddlespeech tts --am fastspeech2_mix --voc pwgan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175_pwgan.wav
paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --spk_id 175 --output mix_spk175.wav

Chinese-English mixed, single male speaker

# male mix tts
# The `lang` must be `mix`!
paddlespeech tts --am fastspeech2_male --voc pwgan_male --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --output male_mix_fs2_pwgan.wav
paddlespeech tts --am fastspeech2_male --voc hifigan_male --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --output male_mix_fs2_hifigan.wav

cywjava commented 1 year ago

Why is there no female voice to choose from for mixed Chinese-English synthesis?

bardenthenry commented 5 months ago

I ran the mixed Chinese-English model:

import paddle
from paddlespeech.cli.tts import TTSExecutor

tts_executor = TTSExecutor()

wav_file = tts_executor(
    text='imeeting 專案進度需要延期',
    output='output.wav',
    am='fastspeech2_mix',
    am_config=None,
    am_ckpt='/STATIC_FOLDER/models',
    am_stat=None,
    spk_id=175,
    phones_dict=None,
    tones_dict=None,
    speaker_dict=None,
    voc='hifigan_aishell3',
    voc_config=None,
    voc_ckpt=None,
    voc_stat=None,
    lang='mix',
    device=paddle.get_device())

and hit this error:

ValueError: (InvalidArgument) Attr(axis) value should be in range [-R, R-1], R is the rank of Input(X). But received axis: 1, R: 1. Current Input(X)'s shape is=[256].
  [Hint: Expected axis < x_rank, but received axis:1 >= x_rank:1.] (at /paddle/paddle/phi/infermeta/unary.cc:2763)

Is there any way to fix this?

chaosact commented 3 months ago

I tried it, and setting use_onnx=True resolves this.
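For reference, a minimal sketch of what that change might look like when applied to the call above. The helper names `build_mix_tts_kwargs` and `synthesize` are hypothetical, not part of PaddleSpeech. Dropping the custom `am_ckpt` path and relying on the default pretrained models is also an assumption, since the comment above does not say which other arguments were used.

```python
def build_mix_tts_kwargs(text, output="output.wav", spk_id=175, use_onnx=True):
    """Collect keyword arguments for a mixed Chinese-English TTSExecutor call."""
    return {
        "text": text,
        "output": output,
        "am": "fastspeech2_mix",    # acoustic model required for lang="mix"
        "voc": "hifigan_aishell3",  # vocoder trained on a Chinese dataset
        "lang": "mix",
        "spk_id": spk_id,           # 174 is csmsc, 175 is ljspeech
        "use_onnx": use_onnx,       # the workaround reported in this thread
    }


def synthesize(text, **overrides):
    """Run synthesis; hypothetical wrapper, assumes default pretrained models."""
    # Imported lazily so the helper above can be inspected without PaddleSpeech.
    from paddlespeech.cli.tts import TTSExecutor

    kwargs = build_mix_tts_kwargs(text, **overrides)
    return TTSExecutor()(**kwargs)


# Usage (requires PaddleSpeech; pretrained models are downloaded on first run):
# synthesize('imeeting 專案進度需要延期', output='output.wav')
```

Whether this avoids the axis error may depend on the installed PaddlePaddle and PaddleSpeech versions, so treat it as a starting point, not a confirmed fix.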