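Installation following the documentation finishes without any errors, but the speech recognition command fails; reinstalling several times gives the same result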

LjPro commented 7 months ago

General Question

pip list: `Package Version

absl-py 2.1.0 aiohttp 3.9.3 aiosignal 1.3.1 annotated-types 0.6.0 antlr4-python3-runtime 4.9.3 anyio 4.3.0 astor 0.8.1 asttokens 2.4.1 async-timeout 4.0.3 attrs 23.2.0 audioread 3.0.1 Babel 2.14.0 bce-python-sdk 0.9.4 blinker 1.7.0 bokeh 3.3.4 boltons 23.1.1 Bottleneck 1.3.8 braceexpand 0.1.7 certifi 2024.2.2 cffi 1.16.0 charset-normalizer 3.3.2 click 8.1.7 colorama 0.4.6 coloredlogs 15.0.1 colorlog 6.8.2 contourpy 1.2.0 cycler 0.12.1 Cython 3.0.8 datasets 2.18.0 decorator 5.1.1 dill 0.3.4 Distance 0.1.3 editdistance 0.8.1 einops 0.7.0 exceptiongroup 1.2.0 executing 2.0.1 fastapi 0.110.0 filelock 3.13.1 Flask 3.0.2 flask-babel 4.0.0 flatbuffers 23.5.26 fonttools 4.49.0 frozenlist 1.4.1 fsspec 2024.2.0 ftfy 6.1.3 future 1.0.0 g2p-en 2.1.0 g2pM 0.1.2.5 h11 0.14.0 h5py 3.10.0 httpcore 1.0.4 httpx 0.27.0 huggingface-hub 0.21.3 humanfriendly 10.0 HyperPyYAML 1.2.2 idna 3.6 inflect 7.0.0 intervaltree 3.1.0 ipython 8.22.1 itsdangerous 2.1.2 jedi 0.19.1 jieba 0.42.1 Jinja2 3.1.3 joblib 1.3.2 jsonlines 4.0.0 kaldiio 2.18.0 kiwisolver 1.4.5 librosa 0.8.1 llvmlite 0.42.0 loguru 0.7.2 lxml 5.1.0 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.8.3 matplotlib-inline 0.1.6 mdurl 0.1.2 mido 1.3.2 mock 5.1.0 mpmath 1.3.0 multidict 6.0.5 multiprocess 0.70.12.2 nara-wpe 0.0.9 nltk 3.8.1 note-seq 0.0.3 numba 0.59.0 numpy 1.23.5 omegaconf 2.3.0 onnx 1.15.0 onnxruntime 1.17.1 OpenCC 1.1.7 opencc-python-reimplemented 0.1.7 opencv-python 4.6.0.66 opt-einsum 3.3.0 packaging 23.2 paddle2onnx 1.0.6 paddleaudio 1.1.0 paddlefsl 1.1.0 paddlenlp 2.6.1 paddlepaddle-gpu 2.6.0 paddlesde 0.2.5 paddleslim 2.6.0 paddlespeech 0.0.0 paddlespeech-feat 0.1.0 pandas 2.2.1 parameterized 0.9.0 parso 0.8.3 pathos 0.2.8 pattern-singleton 1.2.0 pillow 10.2.0 pip 23.3.1 platformdirs 4.2.0 pooch 1.8.1 portalocker 2.8.2 pox 0.3.4 ppdiffusers 0.19.4 ppft 1.7.6.8 praatio 5.1.1 pretty-midi 0.2.10 prettytable 3.10.0 prompt-toolkit 3.0.43 protobuf 3.20.2 psutil 5.9.8 pure-eval 0.2.2 pyarrow 15.0.0 pyarrow-hotfix 0.6 
pybind11 2.11.1 pycparser 2.21 pycryptodome 3.20.0 pydantic 2.6.3 pydantic_core 2.16.3 pydub 0.25.1 Pygments 2.17.2 pygtrie 2.5.0 pyparsing 3.1.1 pypinyin 0.44.0 pypinyin-dict 0.7.0 pyreadline3 3.4.1 pytest-runner 6.0.1 python-dateutil 2.9.0.post0 pytz 2024.1 pywin32 306 pyworld 0.3.4 PyYAML 6.0.1 pyzmq 25.1.2 rarfile 4.1 regex 2023.12.25 requests 2.31.0 requests-mock 1.11.0 resampy 0.4.2 rich 13.7.1 ruamel.yaml 0.18.6 ruamel.yaml.clib 0.2.8 sacrebleu 2.4.0 safetensors 0.4.2 scikit-learn 1.4.1.post1 scipy 1.12.0 sentencepiece 0.2.0 seqeval 1.2.2 setuptools 68.2.2 six 1.16.0 sniffio 1.3.1 sortedcontainers 2.4.0 soundfile 0.12.1 stack-data 0.6.3 starlette 0.36.3 swig 4.2.1 sympy 1.12 tabulate 0.9.0 TextGrid 1.6.1 threadpoolctl 3.3.0 timer 0.2.2 ToJyutping 0.2.1 tornado 6.4 tqdm 4.66.2 traitlets 5.14.1 trampoline 0.1.2 typeguard 2.13.3 typer 0.9.0 typing_extensions 4.10.0 tzdata 2024.1 urllib3 1.26.18 uvicorn 0.27.1 visualdl 2.5.3 wcwidth 0.2.13 webrtcvad 2.0.10 websockets 12.0 Werkzeug 3.0.1 wheel 0.41.2 win32-setctime 1.1.0 xxhash 3.4.1 xyzservices 2023.10.1 yacs 0.1.8 yarl 1.9.4 zhon 2.0.2`

Run in PowerShell: paddlespeech asr --lang zh --input zh.wav

`(paddle_test) PS E:\AI_WorkSpace> paddlespeech asr --lang zh --input zh.wav
C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddleaudio\_extension.py:141: UserWarning: paddleaudio C++ extension is not available.
  warnings.warn("paddleaudio C++ extension is not available.")
C:\Users\an\.conda\envs\paddle_test\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
2024-03-03 10:39:49.952 | INFO | paddlespeech.s2t.modules.ctc:<module>:45 - paddlespeech_ctcdecoders not installed!
W0303 10:39:49.955922 31980 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.3, Runtime API Version: 11.8
W0303 10:39:49.970871 31980 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
2024-03-03 10:39:50.355 | INFO | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2024-03-03 10:39:52,263] [ ERROR] - (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 1, 0, 498] and the shape of Y = [1, 123, 123]. Received [498] in X is not equal to [123] in Y at i:3.
  [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ..\paddle/phi/kernels/funcs/common_shape.h:86)
Traceback (most recent call last):
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddlespeech\cli\asr\infer.py", line 314, in infer
    result_transcripts = self.model.decode(
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddle\base\dygraph\base.py", line 352, in _decorate_function
    return func(*args, **kwargs)
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddlespeech\s2t\models\u2\u2.py", line 818, in decode
    hyp = self.attention_rescoring(
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddlespeech\s2t\models\u2\u2.py", line 543, in attention_rescoring
    hyps, encoder_out = self._ctc_prefix_beam_search(
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddlespeech\s2t\models\u2\u2.py", line 424, in _ctc_prefix_beam_search
    encoder_out, encoder_mask = self._forward_encoder(
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddlespeech\s2t\models\u2\u2.py", line 229, in _forward_encoder
    encoder_out, encoder_mask = self.encoder(
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddle\nn\layer\layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddlespeech\s2t\modules\encoder.py", line 184, in forward
    chunk_masks = add_optional_chunk_mask(
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddlespeech\s2t\modules\mask.py", line 202, in add_optional_chunk_mask
    chunk_masks = masks.logical_and(chunk_masks)  # (B, L, L)
  File "C:\Users\an\.conda\envs\paddle_test\lib\site-packages\paddle\tensor\logic.py", line 143, in logical_and
    return _C_ops.logical_and(x, y)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 1, 0, 498] and the shape of Y = [1, 123, 123]. Received [498] in X is not equal to [123] in Y at i:3.
  [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ..\paddle/phi/kernels/funcs/common_shape.h:86)
KeyError: 'result'`
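
The hint in the error message spells out the compatibility rule that failed: on each axis the two sizes must be equal, or one of them must be at most 1. Below is a minimal pure-Python sketch of that rule, applied to the two shapes from the log. The helper name `first_broadcast_mismatch` is hypothetical and not part of Paddle, and the sketch assumes the shorter shape is right-aligned and padded with 1s, as in NumPy-style broadcasting.

```python
def first_broadcast_mismatch(x_shape, y_shape):
    """Return the first axis where the shapes cannot be broadcast, or None.

    Hypothetical helper for illustration; it mirrors the rule quoted in the
    error hint: sizes must be equal, or one of them must be <= 1.
    """
    rank = max(len(x_shape), len(y_shape))
    # Assumption: right-align the shapes by left-padding the shorter one with 1s.
    x = [1] * (rank - len(x_shape)) + list(x_shape)
    y = [1] * (rank - len(y_shape)) + list(y_shape)
    for axis, (a, b) in enumerate(zip(x, y)):
        if not (a == b or a <= 1 or b <= 1):
            return axis
    return None


# Shapes taken verbatim from the log above.
x_shape = [1, 1, 0, 498]
y_shape = [1, 123, 123]
print(first_broadcast_mismatch(x_shape, y_shape))
```

Under these assumptions the check fails on the last axis (498 versus 123), which agrees with the `at i:3` in the log. The zero-sized third axis of X passes only because of the `<= 1` clause.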

18721688783 commented 7 months ago

Same here. I tried all sorts of versions; installation succeeds, but I always end up with this same error: Broadcast dimension mismatch.

Ray961123 commented 7 months ago

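Hello developer, thank you for your interest in the PaddleSpeech open-source project, and apologies for the poor development experience. Maintainer capacity for the open-source project is currently limited; we suggest referring to: https://github.com/PaddlePaddle/PaddleSpeech/issues/3246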

ljh-coder commented 7 months ago

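You could take a look at #3697 and see whether it helps. At the bottom of that issue I posted a link to a blog post with a detailed installation walkthrough and fixes for some of the errors. Your problem is mainly a version issue.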
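
Since the suggestion here points to versions as the cause, it can help to record the exact installed versions of the key packages before comparing against a known-good setup. The following is a small sketch using the standard-library `importlib.metadata`. The package selection is an assumption, picked from the pip list in the original post, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: a handful of packages chosen from the pip list in this thread.
PACKAGES = ["paddlepaddle-gpu", "paddlespeech", "paddleaudio", "numpy", "librosa"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def as_requirements(versions):
    """Format the installed entries as pinned `name==version` lines."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    print("\n".join(as_requirements(installed_versions(PACKAGES))))
```

Run inside the affected environment, this prints one pinned line per installed package, which can be pasted into a bug report or compared with a working environment.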

ljh-coder commented 7 months ago

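You could take a look at https://github.com/PaddlePaddle/PaddleSpeech/issues/3697 and see whether it helps. At the bottom of that issue I posted a link to a blog post with a detailed installation walkthrough and fixes for some of the errors.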

18721688783 commented 6 months ago

Hello, the problem is described in the screenshot above. Thank you.


hbjhyhb commented 3 months ago

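The package version management in this project is a complete mess. requirement.txt could have settled the package compatibility problem in one go, yet it simply does not pin package versions, as if to give everyone a hard time on purpose. Unbelievable.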

PaddlePaddle / PaddleSpeech

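Installation following the documentation finishes without any errors, but the speech recognition command fails; reinstalling several times gives the same result #3692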
