PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0

[TTS] Unable to generate audio with multi-speaker #3478

Open kaka1909 opened 1 year ago

kaka1909 commented 1 year ago

I followed the multi-speaker command from https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos/text_to_speech, but ran into the error below.

Describe the bug

(venv) λ paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!" --spk_id 174 --output mix_spk174.wav
I0816 02:24:36.038482 38216 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0816 02:24:36.039484 38216 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify 'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
C:\Users\Administrator\Desktop\sd\paddle\venv\lib\site-packages\paddle\nn\layer\layers.py:1897: UserWarning: Skip loading for encoder.embed.1.alpha. encoder.embed.1.alpha receives a shape [1], but the expected shape is [].
  warnings.warn(f"Skip loading for {key}. " + str(err))
C:\Users\Administrator\Desktop\sd\paddle\venv\lib\site-packages\paddle\nn\layer\layers.py:1897: UserWarning: Skip loading for decoder.embed.0.alpha. decoder.embed.0.alpha receives a shape [1], but the expected shape is [].
  warnings.warn(f"Skip loading for {key}. " + str(err))
ValueError: (InvalidArgument) Attr(axis) value should be in range [-R, R-1], R is the rank of Input(X). But received axis: 1, R: 1. Current Input(X)'s shape is=[256].
  [Hint: Expected axis < x_rank, but received axis:1 >= x_rank:1.] (at ..\paddle\phi\infermeta\unary.cc:2763)
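For readers unfamiliar with the final ValueError: it states that an operation was asked to work along axis 1 of a tensor whose shape is [256], i.e. a rank-1 tensor, for which only axes -1 and 0 are valid. The sketch below restates that range rule in plain Python. `check_axis` is a hypothetical helper written for illustration; it is not Paddle or PaddleSpeech code, and it says nothing about which operation inside the model triggered the error.

```python
def check_axis(axis: int, shape: tuple) -> int:
    """Validate `axis` against the rule quoted in the error message.

    For a tensor of rank R, the axis must lie in [-R, R-1].
    Returns the equivalent non-negative axis when valid.
    """
    rank = len(shape)
    if not -rank <= axis <= rank - 1:
        raise ValueError(
            f"axis {axis} is out of range [{-rank}, {rank - 1}] "
            f"for a tensor of shape {list(shape)}"
        )
    # Map a negative axis to its non-negative equivalent.
    return axis % rank


# A rank-2 tensor of shape [1, 256] accepts axis 1 ...
print(check_axis(1, (1, 256)))

# ... but a rank-1 tensor of shape [256] does not, as in the log above.
try:
    check_axis(1, (256,))
except ValueError as err:
    print(err)
```

Read this way, the message suggests that somewhere a tensor arrived with one dimension fewer than the code expected (shape [256] rather than something like [1, 256]); that interpretation is an inference from the message, not something confirmed in this thread.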

To Reproduce

Command: paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --input "热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!" --spk_id 174 --output mix_spk174.wav

Expected behavior

The audio should be generated without errors.

Environment (please complete the following information):

kaka1909 commented 1 year ago

It works when use_onnx = True is set.
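A minimal sketch of how this workaround might look on the command line. It assumes the CLI accepts `--use_onnx True`, as the text_to_speech demo README shows for ONNXRuntime inference; the helper name `build_tts_command` is hypothetical, and the other flags are copied from the command in the report.

```python
import shlex


def build_tts_command(text, output, spk_id=174, use_onnx=False):
    """Build the argv list for the multi-speaker `paddlespeech tts` call."""
    argv = [
        "paddlespeech", "tts",
        "--am", "fastspeech2_mix",
        "--voc", "hifigan_csmsc",
        "--lang", "mix",
        "--input", text,
        "--spk_id", str(spk_id),
        "--output", output,
    ]
    if use_onnx:
        # Assumption: flag name and value format follow the demo README.
        argv += ["--use_onnx", "True"]
    return argv


cmd = build_tts_command("Paddle Speech 的开发", "output.wav", use_onnx=True)
# Printed with POSIX-style quoting, for inspection only.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with the mixed Chinese and English input, which matters on Windows shells in particular.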

name2023well commented 12 months ago

Was this issue ever resolved?

whtwhtw commented 10 months ago

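The following runs fine with the defaults:

tts_executor = TTSExecutor()
wav_file = tts_executor(
    text="热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!",
    output='output.wav')

However, adding the parameter am='fastspeech2_mix' triggers the problem described by the original poster, namely: (InvalidArgument) Attr(axis) value should be in range [-R, R-1], R is the rank of Input(X). Using one of the other -zh models for am does not trigger it, but those models cannot handle English text.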
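For the Python API, the earlier use_onnx observation could be tried along the following lines. This is a sketch under assumptions: the keyword names (`am`, `voc`, `lang`, `spk_id`, `use_onnx`) mirror the CLI flags used in this thread and are assumed to be accepted by `TTSExecutor.__call__`, and the import path `paddlespeech.cli.tts.infer` is assumed as well. The helper names are hypothetical.

```python
def mix_tts_kwargs(text, output="output.wav", spk_id=174, use_onnx=False):
    """Collect keyword arguments for a mixed Chinese/English synthesis call."""
    return {
        "text": text,
        "output": output,
        "am": "fastspeech2_mix",   # acoustic model from the report
        "voc": "hifigan_csmsc",    # vocoder from the report
        "lang": "mix",
        "spk_id": spk_id,
        "use_onnx": use_onnx,      # True is the workaround reported above
    }


def synthesize(**kwargs):
    """Run TTS; requires PaddleSpeech to be installed."""
    # Imported lazily so this file loads even without PaddleSpeech.
    from paddlespeech.cli.tts.infer import TTSExecutor  # assumed import path
    return TTSExecutor()(**kwargs)


kwargs = mix_tts_kwargs("欢迎参与 Paddle Speech 的开发", use_onnx=True)
print(kwargs)
# synthesize(**kwargs)  # uncomment to run; downloads models on first use
```

Keeping the arguments in a plain dict makes it easy to toggle `use_onnx` and compare the two code paths with otherwise identical settings.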

mrzjl commented 10 months ago

Running with the latest develop branch does not raise the error.