PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 『飞桨』 (PaddlePaddle) core framework: high-performance single-machine and distributed training for deep learning & machine learning, plus cross-platform deployment)
http://www.paddlepaddle.org/
Apache License 2.0

Model inference error, asking about the cause: ValueError: (InvalidArgument) Deserialize to tensor failed #66047

Closed lovychen closed 1 month ago

lovychen commented 1 month ago

Model inference raises an error; I'd like to understand what causes it.

I am using paddlepaddle-gpu 2.6.1. I converted a model via torch2paddle and got the following model file directory:

├── inference_model
│   ├── model.pdiparams
│   ├── model.pdiparams.info
│   └── model.pdmodel
├── layer_model
│   ├── __model__ -> ../inference_model/model.pdmodel
│   ├── __params__ -> ../inference_model/model.pdiparams
│   ├── sentence_transformer_0.w_0
│   └── sentence_transformer_0.w_1
├── model.pdparams
├── __pycache__
│   └── x2paddle_code.cpython-311.pyc
└── x2paddle_code.py
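
For context, the two symlinks inside layer_model can be created like this (a minimal sketch, assuming the layout above and run from the model root directory):

import os

os.makedirs("layer_model", exist_ok=True)
# Relative targets, matching the "->" entries in the tree above
os.symlink("../inference_model/model.pdmodel", "layer_model/__model__")
os.symlink("../inference_model/model.pdiparams", "layer_model/__params__")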

Currently, inference done this way works and gives correct results:

import argparse
import numpy as np
from transformers import AutoTokenizer
# Import the Paddle Inference prediction library
import paddle.inference as paddle_infer

def get_embedding():
    loaded_vector = np.load('query_doc_test.npy')
    return loaded_vector[0], loaded_vector[1]

def main():
    args = parse_args()
    # Create the config
    config = paddle_infer.Config(args.model_file, args.params_file)
    # Create the predictor from the config
    predictor = paddle_infer.create_predictor(config)
    # Get the input names
    input_names = predictor.get_input_names()
    input_handle_query = predictor.get_input_handle(input_names[0])
    input_handle_doc = predictor.get_input_handle(input_names[1])
    print("input_names", input_names)
    # Load the input vectors
    fake_query_emb, fake_doc_emb = get_embedding()
    # Set the inputs (the random inputs below are left over from testing)
    fake_input1 = np.random.randn(args.batch_size, 768).astype("float32")
    fake_input2 = np.random.randn(args.batch_size, 768).astype("float32")
    input_handle_query.reshape([args.batch_size, 768])
    input_handle_doc.reshape([args.batch_size, 768])
    #input_handle_query.copy_from_cpu(fake_input1)
    #input_handle_doc.copy_from_cpu(fake_input2)
    input_handle_query.copy_from_cpu(fake_query_emb)
    input_handle_doc.copy_from_cpu(fake_doc_emb)
    # Run the predictor
    predictor.run()
    # Fetch the output
    output_names = predictor.get_output_names()
    output_handle = predictor.get_output_handle(output_names[0])
    output_data = output_handle.copy_to_cpu()  # numpy.ndarray
    print("Output data is {}".format(output_data))
    print("Output data size is {}".format(output_data.size))
    print("Output data shape is {}".format(output_data.shape))

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_file", type=str, help="model filename")
    parser.add_argument("--params_file", type=str, help="parameter filename")
    parser.add_argument("--batch_size", type=int, default=1, help="batch size")
    return parser.parse_args()

if __name__ == "__main__":
    main()
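
Note that get_embedding() above expects query_doc_test.npy to hold the precomputed query and doc embeddings stacked along axis 0. A hypothetical stand-in file could be generated like this (random data; the real file contains embeddings exported from the original torch model):

import numpy as np

# Row 0: query embedding, row 1: doc embedding, each shaped [batch_size, 768]
fake = np.random.randn(2, 1, 768).astype("float32")
np.save("query_doc_test.npy", fake)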

However, the current online service is an old inference service: it requires model.pdiparams to be split into a per-layer structure, i.e.:

│   ├── __model__ -> ../inference_model/model.pdmodel (symlinked from the original)
│   ├── sentence_transformer_0.w_0
│   └── sentence_transformer_0.w_1

After splitting, I load it with the code below. The loader automatically looks for __model__, then matches the per-layer file names and reads the tensors:

import paddle
import sys
# import paddle.fluid
import paddle.inference as paddle_infer

model_file_path = "./layer_model"
config = paddle_infer.Config(model_file_path)

Error:

File "paddle_print.py", line 46, in <module>
    predictor = paddle_infer.create_predictor(config)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: (InvalidArgument) Deserialize to tensor failed, maybe the loaded file is not a paddle model(expected file format: 0, but 2929001600 found).
  [Hint: Expected version == 0U, but received version:2929001600 != 0U:0.] (at /paddle/paddle/fluid/framework/lod_tensor.cc:301)
  [operator < load > error]
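
The large number in the message is just the first 4 bytes of the loaded file interpreted as the tensor-format version: the old load op expects Paddle's raw binary tensor format, whose leading version field is 0. 2929001600 is 0xAE950480, whose little-endian bytes 80 04 95 AE match the start of a pickle protocol-4 stream, i.e. a file written by a plain paddle.save(). A quick hedged check (the file path is an example):

import struct

def peek_version(path):
    # Read the leading uint32 that Paddle's tensor deserializer checks;
    # it must be 0 for the raw binary tensor format.
    with open(path, "rb") as f:
        (version,) = struct.unpack("<I", f.read(4))
    return version

print(peek_version("./layer_model/sentence_transformer_0.w_0"))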

The other approach works, with no exception:

import paddle
import sys
# import paddle.fluid
import paddle.inference as paddle_infer

model_file = "./layer_model/__model__"
params_file = "./layer_model/__params__"
config = paddle_infer.Config(model_file, params_file)

For reference, the splitting code:

import paddle

model_path_main = "./pd_model_0710_with_softmax_nn/"
model = paddle.load(model_path_main + "/inference_model/model.pdiparams.info")
param_dict = {}
param_dict_revert = {}
param_name_list = []
for each in model:  # Read the parameter names stored in the model
    #print(each, model[each]["structured_name"])
    param_dict[each] = model[each]["structured_name"]
    param_dict_revert[model[each]["structured_name"]] = each
    param_name_list.append(each)
#print(model)

print(len(param_name_list))
run_pdparams = True
if run_pdparams:  # Save each tensor under its corresponding name
    model_2 = paddle.load(model_path_main + "/model.pdparams")
    value_all = []
    for each in model_2:
        new_dict = {}
        new_name = param_dict_revert[each]
        print(each, " -> ", new_name)
        value = model_2[each].numpy()
        new_dict[new_name] = value
        value_all.append(value)
        paddle.save(new_dict, model_path_main + "/layer_model/" + new_name)
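
To check whether the split preserves the parameter values (question 2 below), one can reload a per-layer file and compare it with the original parameter (a sketch, reusing the names above):

import numpy as np

name = param_name_list[0]
# reloaded is the single-entry dict saved in the loop above
reloaded = paddle.load(model_path_main + "/layer_model/" + name)
original = model_2[param_dict[name]].numpy()
print(np.allclose(reloaded[name], original))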

My questions:
1. Is the __model__ file saved by old Paddle versions equivalent to the new model.pdmodel, i.e. can it be used simply by renaming/symlinking?
2. Is this way of splitting correct? I have tried several variants (with and without converting to numpy, saving as a dict), and none of them work; they all raise the same error.
3. Is the error caused by a model version mismatch, or by the split tensors being saved incorrectly?

vivienfanghuagood commented 1 month ago

Hi, only this way of inference is recommended now:

import paddle
import sys
# import paddle.fluid
import paddle.inference as paddle_infer

model_file = "./layer_model/model"
params_file = "./layer_model/params"
config = paddle_infer.Config(model_file, params_file)

lovychen commented 1 month ago

Thanks for the reply. The issue is now solved; we will later upgrade to the inference approach you described.

After re-reading the official documentation in light of the error, I found that the binary format of the converted parameters did not match the binary format the inference loader expects. From the paddle.save documentation:

configs (dict, optional) – other configuration options. Currently the following is supported: (1) use_binary_format (bool) – if the object being saved is a static-graph Tensor, you can specify this parameter. If set to True, the Tensor is saved in a binary-format file defined by paddle; otherwise the Tensor is saved in pickle format. Defaults to False.
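
In other words, the default pickle save is exactly what produced the non-zero version field in the error above. A minimal illustration of the two modes (paths are examples):

import paddle

t = paddle.to_tensor([1.0, 2.0, 3.0])
# Default: pickle format -- the old layered loader cannot deserialize this
paddle.save(t, "/tmp/w_pickle")
# Paddle's raw binary tensor format -- the leading version field is 0
paddle.save(t, "/tmp/w_binary", use_binary_format=True)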

The original splitting code:

new_dict[new_name] = value
value_all.append(value)
paddle.save(new_dict, model_path_main + "/layer_model/" + new_name)

The modified code: no need to convert to numpy; save the tensor directly and add the use_binary_format=True option. This solved the problem:

paddle.save(model_2[each], model_path_main + "/layer_model/" + new_name, use_binary_format=True)
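
With the per-layer files re-saved this way, the directory-based loading from the failing snippet above should now deserialize them (a sketch, assuming the layer_model layout shown earlier):

import paddle.inference as paddle_infer

config = paddle_infer.Config("./layer_model")
predictor = paddle_infer.create_predictor(config)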