PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0
11.01k stars 1.83k forks

[TTS] Error when running the speech-to-text example code #3488

Open sixTiger opened 1 year ago

sixTiger commented 1 year ago

Running the example code fails immediately:

from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="zh.wav")
print(result)

The error output is as follows:

/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/bin/python /Users/xiaobing5/Documents/code/gitlab/AIGC_GPT_2_Video/AIGC_GPT_2_Video/voice2Text/aigc_voice_to_text.py 
2023-08-23 12:52:29.369 | INFO     | paddlespeech.s2t.modules.embedding:__init__:150 - max len: 5000
[2023-08-23 12:52:30,605] [   ERROR] - list index out of range
Traceback (most recent call last):
  File "/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/lib/python3.10/site-packages/paddlespeech/cli/asr/infer.py", line 314, in infer
    result_transcripts = self.model.decode(
  File "<decorator-gen-493>", line 2, in decode
  File "/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/lib/python3.10/site-packages/paddle/fluid/dygraph/base.py", line 347, in _decorate_function
    return func(*args, **kwargs)
  File "/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/lib/python3.10/site-packages/paddlespeech/s2t/models/u2/u2.py", line 818, in decode
    hyp = self.attention_rescoring(
  File "/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/lib/python3.10/site-packages/paddlespeech/s2t/models/u2/u2.py", line 532, in attention_rescoring
    assert speech.shape[0] == speech_lengths.shape[0]
IndexError: list index out of range
Traceback (most recent call last):
  File "/Users/xiaobing5/Documents/code/gitlab/AIGC_GPT_2_Video/AIGC_GPT_2_Video/voice2Text/aigc_voice_to_text.py", line 36, in <module>
    result = asr(audio_file="/Users/xiaobing5/Documents/code/gitlab/AIGC_GPT_2_Video/AIGC_GPT_2_Video/voice2Text/zh.wav")
  File "/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/lib/python3.10/site-packages/paddlespeech/cli/utils.py", line 328, in _warpper
    return executor_func(self, *args, **kwargs)
  File "/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/lib/python3.10/site-packages/paddlespeech/cli/asr/infer.py", line 512, in __call__
    res = self.postprocess()  # Retrieve result of asr.
  File "/Users/xiaobing5/Documents/Developer/Workspace/AIGC_GPT_2_Video/lib/python3.10/site-packages/paddlespeech/cli/asr/infer.py", line 335, in postprocess
    return self._outputs["result"]
KeyError: 'result'

Process finished with exit code 1

My environment:

 ~/Documents/code/gitlab/AIGC_GPT_2_Video/AIGC_GPT_2_Video/models/voice_to_text  $ pip list
Package                     Version
--------------------------- ---------
aiohttp                     3.8.5
aiosignal                   1.3.1
annotated-types             0.5.0
anyio                       3.7.1
astor                       0.8.1
async-timeout               4.0.2
attrs                       23.1.0
audioread                   3.0.0
Babel                       2.12.1
bce-python-sdk              0.8.90
beautifulsoup4              4.12.2
blinker                     1.6.2
Bottleneck                  1.3.7
braceexpand                 0.1.7
bs4                         0.0.1
certifi                     2023.5.7
cffi                        1.15.1
charset-normalizer          3.2.0
click                       8.1.7
colorama                    0.4.6
coloredlogs                 15.0.1
colorlog                    6.7.0
contourpy                   1.1.0
cycler                      0.11.0
Cython                      3.0.0
datasets                    2.14.4
decorator                   4.4.2
dill                        0.3.4
Distance                    0.1.3
editdistance                0.6.2
einops                      0.6.1
et-xmlfile                  1.1.0
exceptiongroup              1.1.2
fastapi                     0.101.1
filelock                    3.12.2
Flask                       2.3.3
flask-babel                 3.1.0
flatbuffers                 23.5.26
fonttools                   4.42.1
frozenlist                  1.4.0
fsspec                      2023.6.0
ftfy                        6.1.1
future                      0.18.3
g2p-en                      2.1.0
g2pM                        0.1.2.5
gensim                      4.3.1
gradio_client               0.3.0
h11                         0.14.0
h5py                        3.9.0
httpcore                    0.17.3
httpx                       0.24.1
huggingface-hub             0.16.4
humanfriendly               10.0
HyperPyYAML                 1.2.1
idna                        3.4
imageio                     2.31.1
imageio-ffmpeg              0.4.8
inflect                     7.0.0
install                     1.3.5
itsdangerous                2.1.2
jieba                       0.42.1
Jinja2                      3.1.2
joblib                      1.3.1
jsonlines                   3.1.0
kaldiio                     2.18.0
kiwisolver                  1.4.4
lazy_loader                 0.3
librosa                     0.8.1
llvmlite                    0.40.1
loguru                      0.7.0
lxml                        4.9.3
markdown-it-py              3.0.0
MarkupSafe                  2.1.3
matplotlib                  3.7.2
mdurl                       0.1.2
mock                        5.1.0
moviepy                     1.0.3
mpmath                      1.3.0
multidict                   6.0.4
multiprocess                0.70.12.2
nara-wpe                    0.0.9
networkx                    3.1
nltk                        3.8.1
numba                       0.57.1
numpy                       1.23.5
onnx                        1.14.0
onnxruntime                 1.15.1
openai                      0.27.8
OpenCC                      0.2
opencc-python-reimplemented 0.1.7
opencv-python               4.8.0.74
openpyxl                    3.1.2
opt-einsum                  3.3.0
packaging                   23.1
paddle-bfloat               0.1.7
paddle2onnx                 1.0.9
paddleaudio                 1.1.0
paddlefsl                   1.1.0
paddlenlp                   2.6.0
paddlepaddle                2.5.1
paddleslim                  2.4.1
paddlespeech                1.4.1
paddlespeech-ctcdecoders    0.2.0
paddlespeech-feat           0.1.0
pandas                      2.0.3
parameterized               0.9.0
pathos                      0.2.8
pattern-singleton           1.2.0
Pillow                      10.0.0
pip                         23.1.2
platformdirs                3.10.0
pooch                       1.7.0
portalocker                 2.7.0
pox                         0.3.3
ppdiffusers                 0.16.3
ppft                        1.7.6.7
praatio                     5.1.1
prettytable                 3.8.0
proglog                     0.1.10
protobuf                    3.20.2
psutil                      5.9.5
pyarrow                     12.0.1
pybind11                    2.11.1
pycparser                   2.21
pycryptodome                3.18.0
pydantic                    2.2.1
pydantic_core               2.6.1
Pygments                    2.16.1
pygtrie                     2.5.0
pyparsing                   3.0.9
pypinyin                    0.44.0
pypinyin-dict               0.6.0
pytest-runner               6.0.0
python-dateutil             2.8.2
pytz                        2023.3
PyWavelets                  1.4.1
pyworld                     0.3.4
PyYAML                      6.0.1
pyzmq                       25.1.1
rarfile                     4.0
regex                       2023.6.3
requests                    2.31.0
resampy                     0.4.2
rich                        13.5.2
ruamel.yaml                 0.17.28
ruamel.yaml.clib            0.2.7
sacrebleu                   2.3.1
safetensors                 0.3.1
scikit-image                0.21.0
scikit-learn                1.3.0
scipy                       1.11.1
scs-sdk                     1.1.6
sentencepiece               0.1.99
seqeval                     1.2.2
setuptools                  67.8.0
six                         1.16.0
smart-open                  6.3.0
sniffio                     1.3.0
soundfile                   0.12.1
soupsieve                   2.4.1
starlette                   0.27.0
swig                        4.1.1
sympy                       1.12
tabulate                    0.9.0
TextGrid                    1.5
threadpoolctl               3.2.0
tifffile                    2023.7.18
timer                       0.2.2
ToJyutping                  0.2.3
tokenizers                  0.13.3
torch                       2.0.1
tqdm                        4.65.0
transformers                4.31.0
typeguard                   2.13.3
typer                       0.9.0
typing_extensions           4.7.1
tzdata                      2023.3
urllib3                     2.0.4
uvicorn                     0.23.2
visualdl                    2.5.3
wcwidth                     0.2.6
webrtcvad                   2.0.10
websockets                  11.0.3
Werkzeug                    2.3.7
wheel                       0.40.0
xxhash                      3.3.0
yacs                        0.1.8
yarl                        1.9.2
zhon                        2.0.2

HorseArcher567 commented 1 year ago

I ran into the same problem:

2023-09-07 19:45:00.950 | INFO | paddlespeech.s2t.modules.embedding:__init__:150 - max len: 5000
[2023-09-07 19:45:04,539] [ ERROR] - list index out of range
Traceback (most recent call last):
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/paddlespeech/cli/asr/infer.py", line 314, in infer
    result_transcripts = self.model.decode(
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/paddle/fluid/dygraph/base.py", line 347, in _decorate_function
    return func(*args, **kwargs)
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/paddlespeech/s2t/models/u2/u2.py", line 818, in decode
    hyp = self.attention_rescoring(
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/paddlespeech/s2t/models/u2/u2.py", line 532, in attention_rescoring
    assert speech.shape[0] == speech_lengths.shape[0]
IndexError: list index out of range
Traceback (most recent call last):
  File "/Users/yangpeng/PyCharmProjects/Foo/foo/main.py", line 41, in <module>
    main()
  File "/Users/yangpeng/PyCharmProjects/Foo/foo/main.py", line 37, in main
    audio_executor('./docs/en.wav')
  File "/Users/yangpeng/PyCharmProjects/Foo/foo/main.py", line 31, in audio_executor
    result = asr(audio_file=file_path)
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/paddlespeech/cli/utils.py", line 328, in _warpper
    return executor_func(self, *args, **kwargs)
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/paddlespeech/cli/asr/infer.py", line 512, in __call__
    res = self.postprocess()  # Retrieve result of asr.
  File "/Users/yangpeng/anaconda3/envs/python_3_9/lib/python3.9/site-packages/paddlespeech/cli/asr/infer.py", line 335, in postprocess
    return self._outputs["result"]
KeyError: 'result'

diaojinlong commented 1 year ago

Setting model to conformer_wenetspeech avoids the error. The default model appears to be conformer_u2pp_online_wenetspeech. Oddly, the command-line mode does not raise the error.

from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="zh.wav",model="conformer_wenetspeech")
print(result)

wudenggang commented 1 year ago

Using model="conformer_online_wenetspeech" raises the same error, and conformer_wenetspeech does not work either. Environment: Ubuntu 20, paddlespeech 1.4.1.
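
Since different model names work for different people in this thread, one way to narrow things down is to try several names in turn and record which ones fail. The sketch below is only an illustration: first_working_model and the transcribe callable are hypothetical helpers, not part of paddlespeech, and it assumes that a failed run raises an exception, as the KeyError in the tracebacks above suggests.

```python
def first_working_model(transcribe, audio_file, models):
    """Try each model name in order; return (name, result, errors).

    `transcribe` is a hypothetical callable taking (audio_file, model_name).
    `errors` maps each model name that failed to the repr of its exception.
    """
    errors = {}
    for name in models:
        try:
            return name, transcribe(audio_file, name), errors
        except Exception as exc:  # broad on purpose: this is a diagnostic aid
            errors[name] = repr(exc)
    return None, None, errors


# Possible wiring with the executor from this thread (not run here):
#   from paddlespeech.cli.asr.infer import ASRExecutor
#   asr = ASRExecutor()
#   transcribe = lambda path, model: asr(audio_file=path, model=model)
#   print(first_working_model(transcribe, "zh.wav", [
#       "conformer_u2pp_online_wenetspeech",
#       "conformer_online_wenetspeech",
#       "conformer_wenetspeech",
#   ]))
```

If every model fails with the same error, the model choice is probably not the cause, which points toward the environment or the input file instead.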

wudenggang commented 1 year ago

@zxcd

yaleimeng commented 1 year ago

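In my case this error was caused by an unsupported sample rate setting. I had set 24000 and then found that only 0, 8000, and 16000 are supported. At first I assumed the runtime environment was misconfigured and spent a long time trying different combinations of library versions before finding the root cause. Posting this for reference.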
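
A quick way to rule out the sample rate cause is to read the rate from the WAV header before calling the executor. This is a minimal sketch using only the standard-library wave module, which handles uncompressed PCM WAV files. The helper names are made up for illustration, and the supported set is taken from the comment above rather than checked against the library; the 0 value is left out on the guess that it is a placeholder rather than a real file rate.

```python
import wave

# Rates reported as accepted in this thread (assumption, not verified
# against paddlespeech itself).
SUPPORTED_RATES = (8000, 16000)


def wav_sample_rate(path):
    """Return the sample rate stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_supported_rate(path, supported=SUPPORTED_RATES):
    """Return True if the file's sample rate is one of the supported rates."""
    return wav_sample_rate(path) in supported
```

A file recorded at 24000 Hz would make is_supported_rate return False, in which case resampling the audio to 16000 Hz before transcription is the obvious next thing to try.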

sunfan1997 commented 10 months ago

Setting model to conformer_wenetspeech avoids the error. The default model appears to be conformer_u2pp_online_wenetspeech. Oddly, the command-line mode does not raise the error.

from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="zh.wav",model="conformer_wenetspeech")
print(result)

Adding model="conformer_wenetspeech" still did not work for me. I was on paddlepaddle-gpu 2.5.2; after changing to 2.4.2 it ran normally.

getting107 commented 2 months ago

Setting model to conformer_wenetspeech avoids the error. The default model appears to be conformer_u2pp_online_wenetspeech. Oddly, the command-line mode does not raise the error.

from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="zh.wav",model="conformer_wenetspeech")
print(result)

Adding model="conformer_wenetspeech" still did not work for me. I was on paddlepaddle-gpu 2.5.2; after changing to 2.4.2 it ran normally.

This looks like a problem caused by paddlepaddle-gpu. Switching to version 2.4.2 resolved it for me.
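
Because the reports above differ by Paddle version, it helps to include the exact installed versions when comparing notes. The following sketch uses the standard-library importlib.metadata to look them up; the function names and the default package list are chosen here for illustration, based on the package names mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("paddlepaddle", "paddlepaddle-gpu", "paddlespeech")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name}: {ver or 'not installed'}")
```

Per the comments above, paddlepaddle 2.5.1 and paddlepaddle-gpu 2.5.2 showed the error with paddlespeech 1.4.1, while paddlepaddle-gpu 2.4.2 worked for two users.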