PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning models, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won the NAACL 2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0
10.57k stars · 1.81k forks

Is there an example of running inference with the pdmodel and pdiparams files exported from an ASR model? #2450

Closed lichuanqi closed 1 year ago

lichuanqi commented 1 year ago

Is there an example of running inference with the exported pdmodel and pdiparams files? I don't know why ASR prediction with the config and pdparams files on the GPU is so slow, about 10 s (roughly the same as CPU prediction on my laptop). GPU memory usage is a bit over 1000 MB; I'm not sure whether that is normal. Following the paddle.inference material I found via Baidu, I tried to run inference with the pdmodel and pdiparams files exported from deepspeech2_online_aishell_fbank161, but I couldn't figure out what the inputs and outputs should be, and I'm stuck on a strange error.

input_names:  ['audio_chunk', 'audio_chunk_lens', 'chunk_state_h_box', 'chunk_state_c_box']
output_names: ['softmax_0.tmp_0', 'tmp_5', 'concat_0.tmp_0', 'concat_1.tmp_0']

The code is as follows:

# Inference with the pdmodel and pdiparams files exported
# from deepspeech2_online_aishell_fbank161

import argparse
import numpy as np
from yacs.config import CfgNode

import paddle
# Paddle Inference prediction library
import paddle.inference as paddle_infer
from paddlespeech.s2t.frontend.speech import SpeechSegment
from paddlespeech.s2t.frontend.normalizer import FeatureNormalizer
from paddlespeech.s2t.frontend.featurizer.audio_featurizer import AudioFeaturizer

def main():
    args = parse_args()

    cfg = CfgNode(new_allowed=True)
    cfg.merge_from_file(args.cfg_path)
    cfg.freeze()

    # Create the inference config
    config = paddle_infer.Config(args.model_file, args.params_file)
    config.disable_gpu()

    print(f'Model loaded\n'
          f'model_file: {args.model_file}\n'
          f'params_file: {args.params_file}')

    # Create the predictor from the config
    predictor = paddle_infer.create_predictor(config)

    # Get input names
    input_names = predictor.get_input_names()
    print('input_names:', input_names)
    # Get output names
    output_names = predictor.get_output_names()
    print('output_names:', output_names)

    # Feature extraction
    audio_featurizer = AudioFeaturizer(spectrum_type='fbank',
                                       feat_dim=161,
                                       delta_delta=False,
                                       stride_ms=10.0,
                                       window_ms=20.0,
                                       n_fft=None,
                                       max_freq=None,
                                       target_sample_rate=16000,
                                       use_dB_normalization=True,
                                       target_dB=-20,
                                       dither=1.0)
    speech_segment = SpeechSegment.from_file(args.audio_path, "None")
    audio_feature = audio_featurizer.featurize(speech_segment)
    # Normalization (currently disabled)
    # feature_normalizer = FeatureNormalizer(mean_std_filepath) if feat_config.mean_std_filepath else None
    # audio_feature_i = feature_normalizer.apply(audio_feature)

    audio_feature = np.array(audio_feature).astype(np.float32)[np.newaxis, :]
    audio_len = np.array(audio_feature.shape[2]).astype(np.int64)
    print(f"feature shape: {audio_feature.shape}")

    # Get the input handles
    input_handle_audio = predictor.get_input_handle(input_names[0])
    input_handle_audio_len = predictor.get_input_handle(input_names[1])
    # Set the inputs
    input_handle_audio.reshape([audio_feature.shape[0], audio_feature.shape[1], audio_feature.shape[2]])
    input_handle_audio.copy_from_cpu(audio_feature)
    input_handle_audio_len.reshape([audio_len])
    input_handle_audio_len.copy_from_cpu(audio_len)

    # Run the predictor
    predictor.run()

    output_handle = predictor.get_output_handle(output_names[0])
    output_data = output_handle.copy_to_cpu()  # numpy.ndarray
    print("Output data is {}".format(output_data))

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfg_path",
            default='configs/deepspeech2_online_aishell_fbank161/conf/deepspeech2_online.yaml',
            type=str, help="*.yaml filepath")
    parser.add_argument("--model_file", 
            default='configs/deepspeech2_online_aishell_fbank161/export/avg_1.jit.pdmodel',
            type=str, help="*.pdmodel filepath")
    parser.add_argument("--params_file",
            default='configs/deepspeech2_online_aishell_fbank161/export/avg_1.jit.pdiparams',
            type=str, help="*.pdiparams filepath")
    parser.add_argument("--audio_path",
            default='D:/Data/Speech/short/001.wav',
            type=str, help="*.wav filepath")
    parser.add_argument("--batch_size", type=int, default=1, help="batch size")

    return parser.parse_args()

if __name__ == "__main__":
    main()

The current terminal output and error are as follows:

PS D:\Code\PADDLE\PaddleSpeech-develop> & D:/Program/miniconda3/envs/paddle/python.exe d:/Code/PADDLE/PaddleSpeech-develop/demos/speech_recognition/infer_ds2.py
Model loaded
model_file: configs/deepspeech2_online_aishell_fbank161/export/avg_1.jit.pdmodel
params_file: configs/deepspeech2_online_aishell_fbank161/export/avg_1.jit.pdiparams
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [layer_norm_fuse_pass]
---    Fused 0 subgraphs into layer_norm op.
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
---    fused 0 pairs of fc gru patterns
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [matmul_v2_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0925 18:00:40.682015 14972 fuse_pass_base.cc:57] ---  detected 1 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0925 18:00:40.697669 14972 fuse_pass_base.cc:57] ---  detected 1 subgraphs
--- Running IR pass [repeated_fc_relu_fuse_pass]
--- Running IR pass [squared_mat_sub_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [conv_transpose_bn_fuse_pass]
--- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I0925 18:00:40.809091 14972 analysis_predictor.cc:1035] ======= optimize end =======
I0925 18:00:40.809091 14972 naive_executor.cc:102] ---  skip [feed], feed -> chunk_state_c_box
I0925 18:00:40.810307 14972 naive_executor.cc:102] ---  skip [feed], feed -> chunk_state_h_box
I0925 18:00:40.810307 14972 naive_executor.cc:102] ---  skip [feed], feed -> audio_chunk_lens
I0925 18:00:40.811797 14972 naive_executor.cc:102] ---  skip [feed], feed -> audio_chunk
I0925 18:00:40.813787 14972 naive_executor.cc:102] ---  skip [softmax_0.tmp_0], fetch -> fetch
I0925 18:00:40.814805 14972 naive_executor.cc:102] ---  skip [tmp_5], fetch -> fetch
I0925 18:00:40.815081 14972 naive_executor.cc:102] ---  skip [concat_0.tmp_0], fetch -> fetch
I0925 18:00:40.815081 14972 naive_executor.cc:102] ---  skip [concat_1.tmp_0], fetch -> fetch
input_names:  ['audio_chunk', 'audio_chunk_lens', 'chunk_state_h_box', 'chunk_state_c_box']
output_names: ['softmax_0.tmp_0', 'tmp_5', 'concat_0.tmp_0', 'concat_1.tmp_0']
feature shape: (1, 476, 161)
Traceback (most recent call last):
  File "d:/Code/PADDLE/PaddleSpeech-develop/demos/speech_recognition/infer_ds2.py", line 98, in <module>
    main()
  File "d:/Code/PADDLE/PaddleSpeech-develop/demos/speech_recognition/infer_ds2.py", line 73, in main
    predictor.run()
ValueError: In user code:

    File "/home/huangyuxin/workspace/PaddleSpeech_for_align_develop/PaddleSpeech_align_dist_fusion_init/paddlespeech/s2t/exps/deepspeech2/bin/export.py", line 56, in <module>
      main(config, args)
    File "/home/huangyuxin/workspace/PaddleSpeech_for_align_develop/PaddleSpeech_align_dist_fusion_init/paddlespeech/s2t/exps/deepspeech2/bin/export.py", line 30, in main
      main_sp(config, args)
    File "/home/huangyuxin/workspace/PaddleSpeech_for_align_develop/PaddleSpeech_align_dist_fusion_init/paddlespeech/s2t/exps/deepspeech2/bin/export.py", line 26, in main_sp
      exp.run_export()
    File "/home/huangyuxin/workspace/PaddleSpeech_for_align_develop/PaddleSpeech_align_dist_fusion_init/paddlespeech/s2t/training/trainer.py", line 365, in run_export
      self.export()
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/decorator.py", line 232, in fun
      return caller(func, *(extras + args), **kw)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 351, in _decorate_function
      return func(*args, **kwargs)
    File "/home/huangyuxin/workspace/PaddleSpeech_for_align_develop/PaddleSpeech_align_dist_fusion_init/paddlespeech/s2t/exps/deepspeech2/model.py", line 434, in export
      paddle.jit.save(static_model, self.args.export_path)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/decorator.py", line 232, in fun
      return caller(func, *(extras + args), **kw)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
      return wrapped_func(*args, **kwargs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
      return func(*args, **kwargs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/jit.py", line 744, in save 
      inner_input_spec)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 517, in concrete_program_specify_input_spec
      *desired_input_spec)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 427, in get_concrete_program
      concrete_program, partial_program_layer = self._program_cache[cache_key]
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 723, in __getitem__
      self._caches[item] = self._build_once(item)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 714, in _build_once
      **cache_key.kwargs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/decorator.py", line 232, in fun
      return caller(func, *(extras + args), **kw)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
      return wrapped_func(*args, **kwargs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
      return func(*args, **kwargs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 662, in from_func_spec
      outputs = static_func(*inputs)
    File "/home/huangyuxin/workspace/PaddleSpeech_for_align_develop/PaddleSpeech_align_dist_fusion_init/paddlespeech/s2t/models/ds2_online/deepspeech2.py", line 378, in forward
      audio_chunk, audio_chunk_lens, chunk_state_h_box, chunk_state_c_box)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/tmp/tmpktxfs6wq.py", line 59, in forward
      init_state_list))
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/convert_operators.py", line 211, in convert_ifelse
      out = _run_py_ifelse(pred, true_fn, false_fn, true_args, false_args)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/convert_operators.py", line 257, in _run_py_ifelse
      return true_fn(*true_args) if pred else false_fn(*false_args)
    File "/tmp/tmpktxfs6wq.py", line 50, in true_fn_1
      init_state_list)))
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/convert_operators.py", line 211, in convert_ifelse
      out = _run_py_ifelse(pred, true_fn, false_fn, true_args, false_args)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/convert_operators.py", line 257, in _run_py_ifelse
      return true_fn(*true_args) if pred else false_fn(*false_args)
    File "/home/huangyuxin/workspace/PaddleSpeech_for_align_develop/PaddleSpeech_align_dist_fusion_init/paddlespeech/s2t/models/ds2_online/deepspeech2.py", line 119, in forward
      init_state_h_box, self.num_rnn_layers, axis=0)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/tensor/manipulation.py", line 850, in split
      input=x, num_or_sections=num_or_sections, dim=axis, name=name)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 5029, in split 
      type='split', inputs=inputs, outputs={'Out': outs}, attrs=attrs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
      return self.main_program.current_block().append_op(*args, **kwargs)
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3184, in append_op
      attrs=kwargs.get("attrs", None))
    File "/home/huangyuxin/miniconda3/envs/py37/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2224, in __init__
      for frame in traceback.extract_stack():

    InvalidArgumentError: The split Op's Input Variable `X` contains uninitialized Tensor.
      [Hint: Expected t->IsInitialized() == true, but received t->IsInitialized():0 != true:1.] (at C:\home\workspace\Paddle_release\paddle\fluid\framework\operator.cc:2094)
      [operator < split > error]
yt605155624 commented 1 year ago

Duplicate of https://github.com/PaddlePaddle/PaddleSpeech/issues/2192

lichuanqi commented 1 year ago

Duplicate of #2192

OK, thanks. Found it in paddlespeech\s2t\exps\deepspeech2\bin\test_export.py.
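For anyone landing here later: the traceback shows the `split` op receiving an uninitialized tensor, which matches the fact that the script above feeds only `audio_chunk` and `audio_chunk_lens` and never sets `chunk_state_h_box` / `chunk_state_c_box`, the two RNN state inputs of the streaming model. A minimal numpy sketch of preparing all four feeds, assuming (not confirmed here) the usual deepspeech2_online dimensions `num_rnn_layers=5` and `rnn_layer_size=1024` — check the values in `deepspeech2_online.yaml`:

```python
import numpy as np

# Assumed dimensions -- read the real values from deepspeech2_online.yaml.
num_rnn_layers = 5   # config key: num_rnn_layers
rnn_size = 1024      # config key: rnn_layer_size
batch = 1

# Stand-in for the fbank features computed above: (batch, frames, feat_dim).
audio_chunk = np.zeros((batch, 476, 161), dtype=np.float32)

# The chunk length is the number of frames (shape[1]), not the feature
# dimension (shape[2]) as in the script above.
audio_chunk_lens = np.array([audio_chunk.shape[1]], dtype=np.int64)

# The streaming model carries RNN state between chunks; for the first chunk
# the state boxes must still be fed, zero-initialized -- leaving them unset
# is what makes `split` see an uninitialized tensor.
chunk_state_h_box = np.zeros((num_rnn_layers, batch, rnn_size), dtype=np.float32)
chunk_state_c_box = np.zeros((num_rnn_layers, batch, rnn_size), dtype=np.float32)

feeds = {
    'audio_chunk': audio_chunk,
    'audio_chunk_lens': audio_chunk_lens,
    'chunk_state_h_box': chunk_state_h_box,
    'chunk_state_c_box': chunk_state_c_box,
}
for name, value in feeds.items():
    print(name, value.shape, value.dtype)
```

With a predictor, each entry would go through `predictor.get_input_handle(name).copy_from_cpu(value)` before `predictor.run()`; the two `concat_*.tmp_0` outputs are presumably the updated h/c states to feed back for the next chunk. To actually run on the GPU, use `config.enable_use_gpu(1000, 0)` instead of `config.disable_gpu()`.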