PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0

[S2T] Problem with the input at the final prediction stage when using the open-source S2 model #3398

Open hebin665 opened 11 months ago

hebin665 commented 11 months ago

For support and discussions, please use our Discourse forums.

If you've found a bug then please create an issue with the following information:

Describe the bug
When using the open-source DeepSpeech2 model, the following error is raised at the final prediction stage:

Traceback (most recent call last):
  File "/Users/hebin/PycharmProjects/asrmodeltest/work/workspace_asr_ds2/testdemo.py", line 84, in <module>
    result_transcripts = model.decode(
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/fluid/dygraph/base.py", line 347, in _decorate_function
    return func(*args, **kwargs)
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddlespeech/s2t/models/ds2/deepspeech2.py", line 299, in decode
    eouts, eouts_len, final_state_h_box, final_state_c_box = self.encoder(
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddlespeech/s2t/models/ds2/deepspeech2.py", line 130, in forward
    x, final_state = self.rnn[i](x, init_state_list[i],
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/rnn.py", line 1585, in forward
    return self._cudnn_impl(inputs, initial_states, sequence_length)
  File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/rnn.py", line 1473, in _cudnn_impl
    out, _, state = _C_ops.rnn(
ValueError: (InvalidArgument) The size of SequenceLength has to equal the batch_size. But received batch_size is 1 and the size of SequenceLength is 0.

To Reproduce
Steps to reproduce the behavior:

  1. Reference URL: https://aistudio.baidu.com/bd-gpu-01/user/79593/6547194/notebooks/6547194.ipynb
  2. Code path: (the screenshot did not finish uploading)
  3. The test code is as follows:

      import paddle
      import warnings
      warnings.filterwarnings('ignore')

      from yacs.config import CfgNode

      from paddlespeech.s2t.frontend.speech import SpeechSegment
      from paddlespeech.s2t.frontend.normalizer import FeatureNormalizer
      from paddlespeech.s2t.frontend.featurizer.audio_featurizer import AudioFeaturizer
      from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
      from paddlespeech.s2t.io.collator import SpeechCollator
      from paddlespeech.s2t.models.ds2 import DeepSpeech2Model

      from matplotlib import pyplot as plt

      # Jupyter/IPython magic; only valid inside a notebook
      %matplotlib inline

      # Set the paths of the pretrained model
      config_path = "conf/deepspeech2.yaml"
      checkpoint_path = "./exp/deepspeech2/checkpoints/avg_1.pdparams"
      audio_file = "data/demo_01_03.wav"

      # Read the conf file into a structured config
      ds2_config = CfgNode(new_allowed=True)
      ds2_config.merge_from_file(config_path)
      print(ds2_config)

      # Build the audio feature extractor
      feat_config = ds2_config.collator
      audio_featurizer = AudioFeaturizer(
          spectrum_type=feat_config.spectrum_type,
          feat_dim=feat_config.feat_dim,
          delta_delta=feat_config.delta_delta,
          stride_ms=feat_config.stride_ms,
          window_ms=feat_config.window_ms,
          n_fft=feat_config.n_fft,
          max_freq=feat_config.max_freq,
          target_sample_rate=feat_config.target_sample_rate,
          use_dB_normalization=feat_config.use_dB_normalization,
          target_dB=feat_config.target_dB,
          dither=feat_config.dither)
      feature_normalizer = FeatureNormalizer(feat_config.mean_std_filepath) if feat_config.mean_std_filepath else None

      # Extract the audio features
      # 'None' is only a placeholder, because no reference transcript is needed at prediction time
      speech_segment = SpeechSegment.from_file(audio_file, "None")
      audio_feature = audio_featurizer.featurize(speech_segment)
      audio_feature_i = feature_normalizer.apply(audio_feature)

      audio_len = audio_feature_i.shape[0]
      audio_len = paddle.to_tensor(audio_len)
      audio_feature = paddle.to_tensor(audio_feature_i, dtype='float32')
      audio_feature = paddle.unsqueeze(audio_feature, axis=0)
      print(f"shape: {audio_feature.shape}")

      plt.figure()
      plt.imshow(audio_feature_i.T, origin='lower')
      plt.show()

      # Build the DeepSpeech2 model
      model_conf = ds2_config.model
      # input_dim is the feature size
      model_conf.input_dim = 161
      # output_dim is the vocabulary size
      model_conf.output_dim = 4301
      model = DeepSpeech2Model.from_config(model_conf)

      # Load the pretrained model
      model_dict = paddle.load(checkpoint_path)
      model.set_state_dict(model_dict)

      # Run prediction
      decoding_config = ds2_config.decoding
      decode_batch_size = 1
      print(decoding_config)
      text_feature = TextFeaturizer(unit_type='char', vocab=ds2_config.collator.vocab_filepath)
      vocab_list = text_feature.vocab_list
      model.decoder.init_decoder(
          decode_batch_size, vocab_list, decoding_config.decoding_method,
          decoding_config.lang_model_path, decoding_config.alpha, decoding_config.beta,
          decoding_config.beam_size, decoding_config.cutoff_prob,
          decoding_config.cutoff_top_n, decoding_config.num_proc_bsearch)
      result_transcripts = model.decode(
          audio_feature,
          audio_len)

      print("Prediction result:")
      print(result_transcripts[0])

  4. Run the file
  5. See error
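In the script above, `audio_len` is built with `paddle.to_tensor(audio_len)` from a bare Python int, while the error asks for one sequence length per batch element. The thread does not confirm that this is the cause (the reporter later resolved it by changing the Paddle version), but the intended shape contract can be sketched as below. It uses NumPy only so it runs without Paddle, and `to_single_item_batch` is a hypothetical helper, not part of PaddleSpeech. In Paddle, the analogous length tensor would be `paddle.to_tensor([audio_len], dtype='int64')`, which has shape [1].

```python
import numpy as np


def to_single_item_batch(feature):
    """Hypothetical helper: wrap one [T, D] feature matrix as a batch of one.

    Returns (features of shape [1, T, D], lengths of shape [1]).
    Keeping the lengths 1-D guarantees one length entry per batch element.
    """
    feature = np.asarray(feature, dtype=np.float32)
    if feature.ndim != 2:
        raise ValueError(f"expected a [T, D] matrix, got shape {feature.shape}")
    batch = feature[np.newaxis, :, :]                       # [1, T, D]
    lengths = np.array([feature.shape[0]], dtype=np.int64)  # [1], not a 0-D scalar
    return batch, lengths


# Placeholder features standing in for audio_feature_i (300 frames x 161 dims).
feat = np.zeros((300, 161), dtype=np.float32)
batch, lengths = to_single_item_batch(feat)
print(batch.shape, lengths.shape)       # (1, 300, 161) (1,)
print(np.asarray(feat.shape[0]).shape)  # ()  (a bare int becomes a 0-D array)
```

Building the length from an explicit list ties its size to the batch size by construction, instead of relying on how a particular Paddle version converts a Python scalar.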

Expected behavior
The corresponding transcription should be printed.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

Additional context
Feel free to contact me at any time if you need anything further from me.

hebin665 commented 11 months ago

[Screenshot]

hebin665 commented 11 months ago

[Screenshot]

zxcd commented 11 months ago

From what I can see, the error is:

File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/rnn.py", line 1473, in _cudnn_impl
out, _, state = _C_ops.rnn(
ValueError: (InvalidArgument) The size of SequenceLength has to equal the batch_size. But received batch_size is 1 and the size of SequenceLength is 0.
[Hint: Expected in_dims[1] == seq_dims[0], but received in_dims[1]:1 != seq_dims[0]:0.] (at /Users/paddle/xly/workspace/9fc77989-de12-406f-9c25-c7ddd992fc3c/Paddle/paddle/phi/infermeta/multiary.cc:2690)

I suggest checking whether the input audio is being read correctly.
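The hint in the error spells out the condition being checked: the batch dimension of the RNN input (`in_dims[1]`, which suggests a time-major [time, batch, feature] layout at that point) must equal the number of entries in the sequence-length tensor (`seq_dims[0]`). A minimal pre-flight check under that reading, written as a hypothetical helper on plain shape tuples (not a Paddle API), could look like this:

```python
def check_sequence_length(input_shape, seq_len_shape):
    """Hypothetical check mirroring `in_dims[1] == seq_dims[0]`.

    input_shape:   RNN input shape indexed as in the hint, where
                   input_shape[1] is the batch size.
    seq_len_shape: shape of the sequence-length tensor.
    Raises ValueError unless there is exactly one length per batch element.
    """
    batch_size = input_shape[1]
    if len(seq_len_shape) != 1:
        raise ValueError(
            f"sequence lengths must be 1-D, got shape {tuple(seq_len_shape)}")
    if seq_len_shape[0] != batch_size:
        raise ValueError(
            f"expected {batch_size} sequence lengths, got {seq_len_shape[0]}")


# Illustrative shapes only: T=300 time steps, batch of 1, D=8 features.
check_sequence_length((300, 1, 8), (1,))  # passes silently
for bad in [(), (0,)]:
    try:
        check_sequence_length((300, 1, 8), bad)
    except ValueError as err:
        print(err)
# prints:
# sequence lengths must be 1-D, got shape ()
# expected 1 sequence lengths, got 0
```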

hebin665 commented 11 months ago

Thanks for the reply. The audio is read correctly; the problem was resolved after switching PaddlePaddle to version 2.5.0rc0.

hebin665 commented 11 months ago

pip install paddlepaddle==2.5.0rc0 -i https://pypi.tuna.tsinghua.edu.cn/simple