Open hebin665 opened 11 months ago
As far as I can tell, the error is as follows:
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/rnn.py", line 1473, in _cudnn_impl
out, _, state = _C_ops.rnn(
ValueError: (InvalidArgument) The size of SequenceLength has to equal the batch_size. But received batch_size is 1 and the size of SequenceLength is 0.
[Hint: Expected in_dims[1] == seq_dims[0], but received in_dims[1]:1 != seq_dims[0]:0.] (at /Users/paddle/xly/workspace/9fc77989-de12-406f-9c25-c7ddd992fc3c/Paddle/paddle/phi/infermeta/multiary.cc:2690)
I suggest checking whether the input audio is being read correctly.
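One quick way to rule out a broken or empty input file is to inspect the WAV header before it reaches the featurizer. The following is only a minimal sketch using the Python standard library; `describe_wav` is a hypothetical helper, not part of PaddleSpeech, and it assumes the file is uncompressed PCM WAV.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "frames": frames,
            # Guard against a zero sample rate in a malformed header.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("data/demo_01_03.wav")` and seeing zero frames, or a sample rate different from the one the model config expects, would point to an input problem rather than a framework problem.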
Thanks for the reply. The audio is read correctly. The problem was resolved after switching paddlepaddle to version 2.5.0rc0:
pip install paddlepaddle==2.5.0rc0 -i https://pypi.tuna.tsinghua.edu.cn/simple
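Since the fix here came down to the installed version, it may help to confirm which version is actually active in the environment before re-running the script. This is a small sketch using `importlib.metadata` from the standard library; the helper names are hypothetical, and the distribution name `paddlepaddle` is taken from the pip command above (a GPU build may be installed under a different distribution name).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(package, expected):
    """True only if `package` is installed at exactly the `expected` version."""
    return installed_version(package) == expected
```

For example, `matches_expected("paddlepaddle", "2.5.0rc0")` should return `True` in the environment where the problem no longer occurs.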
Describe the bug
When using the open-source DeepSpeech2 model, the final prediction step fails with the following error:
Traceback (most recent call last):
File "/Users/hebin/PycharmProjects/asrmodeltest/work/workspace_asr_ds2/testdemo.py", line 84, in <module>
result_transcripts = model.decode(
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/fluid/dygraph/base.py", line 347, in _decorate_function
return func(*args, **kwargs)
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddlespeech/s2t/models/ds2/deepspeech2.py", line 299, in decode
eouts, eouts_len, final_state_h_box, final_state_c_box = self.encoder(
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddlespeech/s2t/models/ds2/deepspeech2.py", line 130, in forward
x, final_state = self.rnn[i](x, init_state_list[i],
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/rnn.py", line 1585, in forward
return self._cudnn_impl(inputs, initial_states, sequence_length)
File "/Users/hebin/opt/anaconda3/envs/speechtest/lib/python3.10/site-packages/paddle/nn/layer/rnn.py", line 1473, in _cudnn_impl
out, _, state = _C_ops.rnn(
ValueError: (InvalidArgument) The size of SequenceLength has to equal the batch_size. But received batch_size is 1 and the size of SequenceLength is 0.
To Reproduce
Steps to reproduce the behavior:
import paddle
from yacs.config import CfgNode
from paddlespeech.s2t.frontend.speech import SpeechSegment
from paddlespeech.s2t.frontend.normalizer import FeatureNormalizer
from paddlespeech.s2t.frontend.featurizer.audio_featurizer import AudioFeaturizer
from paddlespeech.s2t.frontend.featurizer.text_featurizer import TextFeaturizer
from paddlespeech.s2t.io.collator import SpeechCollator
from paddlespeech.s2t.models.ds2 import DeepSpeech2Model
from matplotlib import pyplot as plt
%matplotlib inline
# Set the paths to the pretrained model
config_path = "conf/deepspeech2.yaml"
checkpoint_path = "./exp/deepspeech2/checkpoints/avg_1.pdparams"
audio_file = "data/demo_01_03.wav"
# Read the conf file into a structured config
ds2_config = CfgNode(new_allowed=True)
ds2_config.merge_from_file(config_path)
print(ds2_config)
# Build the audio feature extractor
feat_config = ds2_config.collator
audio_featurizer = AudioFeaturizer(
    spectrum_type=feat_config.spectrum_type,
    feat_dim=feat_config.feat_dim,
    delta_delta=feat_config.delta_delta,
    stride_ms=feat_config.stride_ms,
    window_ms=feat_config.window_ms,
    n_fft=feat_config.n_fft,
    max_freq=feat_config.max_freq,
    target_sample_rate=feat_config.target_sample_rate,
    use_dB_normalization=feat_config.use_dB_normalization,
    target_dB=feat_config.target_dB,
    dither=feat_config.dither)
feature_normalizer = FeatureNormalizer(feat_config.mean_std_filepath) if feat_config.mean_std_filepath else None
# Extract the audio features
# 'None' is only a placeholder, because no reference is needed at prediction time
speech_segment = SpeechSegment.from_file(audio_file, "None")
audio_feature = audio_featurizer.featurize(speech_segment)
audio_feature_i = feature_normalizer.apply(audio_feature)
audio_len = audio_feature_i.shape[0]
audio_len = paddle.to_tensor(audio_len)
audio_feature = paddle.to_tensor(audio_feature_i, dtype='float32')
audio_feature = paddle.unsqueeze(audio_feature, axis=0)
print(f"shape: {audio_feature.shape}")
plt.figure()
plt.imshow(audio_feature_i.T, origin='lower')
plt.show()
# Build the DeepSpeech2 model
model_conf = ds2_config.model
# input_dim is the feature size
model_conf.input_dim = 161
# output_dim is the vocab size
model_conf.output_dim = 4301
model = DeepSpeech2Model.from_config(model_conf)
# Load the pretrained model
model_dict = paddle.load(checkpoint_path)
model.set_state_dict(model_dict)
# Run prediction
decoding_config = ds2_config.decoding
decode_batch_size = 1
print(decoding_config)
text_feature = TextFeaturizer(unit_type='char', vocab=ds2_config.collator.vocab_filepath)
vocab_list = text_feature.vocab_list
model.decoder.init_decoder(
    decode_batch_size, vocab_list, decoding_config.decoding_method,
    decoding_config.lang_model_path, decoding_config.alpha, decoding_config.beta,
    decoding_config.beam_size, decoding_config.cutoff_prob,
    decoding_config.cutoff_top_n, decoding_config.num_proc_bsearch)
result_transcripts = model.decode(audio_feature, audio_len)
print("Prediction result:")
print(result_transcripts[0])
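The error message states the constraint that failed: `in_dims[1] == seq_dims[0]`, i.e. the sequence-length tensor needs one entry per batch item. In the script above, `audio_len` is built from a single Python integer, so its shape may be worth checking. Whether that is the actual cause here is not confirmed in this thread, which only reports that changing the Paddle version fixed it. The sketch below mimics the check in plain Python on shape tuples; both helpers are hypothetical and are not Paddle's code, and treating a 0-D shape `()` as zero entries is an assumption.

```python
def check_sequence_length(input_shape, seq_len_shape):
    """Mimic the constraint from the error message: in_dims[1] == seq_dims[0].

    input_shape: shape of the time-major RNN input, (time, batch, feature)
    seq_len_shape: shape of the sequence-length tensor; () stands for a 0-D scalar
    """
    batch_size = input_shape[1]
    # Assumption: a 0-D (scalar) tensor counts as having zero entries.
    n_lengths = seq_len_shape[0] if len(seq_len_shape) > 0 else 0
    if n_lengths != batch_size:
        raise ValueError(
            f"sequence_length has {n_lengths} entries, expected {batch_size}"
        )
    return True


def as_length_batch(length):
    """Wrap a single integer length into a one-element list (a batch of 1)."""
    return [int(length)]
```

With a batch of one utterance, a shape of `(1,)` passes the check while `()` raises `ValueError`. Passing `as_length_batch(audio_feature_i.shape[0])` to the tensor constructor instead of a bare integer is one way to get the one-entry form.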
Expected behavior
The script should output the transcription of the input audio.
Additional context
Feel free to contact me at any time if you need anything further from my side.