PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle ("飞桨") core framework: high-performance single-machine and distributed deep learning / machine learning training, and cross-platform deployment)
http://www.paddlepaddle.org/
Apache License 2.0

Models exported by Paddle 2.5.2 and later fail to load in ONNX Runtime after conversion to ONNX #67710

Open David-dotcom666 opened 2 months ago

David-dotcom666 commented 2 months ago

Describe the Bug

Using the ppdet pretrained model rtdetr_r50vd_6x_coco.pdparams (or other models), I export it directly and convert it to ONNX with paddle2onnx. Inference code:

```python
def get_test_images(infer_dir):
    """Get image path list in TEST mode."""
    assert infer_dir is None or os.path.isdir(infer_dir), \
        "{} is not a directory".format(infer_dir)

    # infer_img has a higher priority
    images = set()
    infer_dir = os.path.abspath(infer_dir)
    assert os.path.isdir(infer_dir), \
        "infer_dir {} is not a directory".format(infer_dir)
    exts = ['jpg', 'jpeg', 'png', 'bmp']
    exts += [ext.upper() for ext in exts]
    for ext in exts:
        images.update(glob.glob('{}/*.{}'.format(infer_dir, ext)))
    images = list(images)

    assert len(images) > 0, "no image found in {}".format(infer_dir)
    print("Found {} inference images in total.".format(len(images)))

    return images
```

```python
class PredictConfig(object):
    """Set config of preprocess, postprocess and visualize.

    Args:
        infer_config (str): path of infer_cfg.yml
    """

    def __init__(self, infer_config):
        # parsing YAML config for Preprocess
        d1 = read_enfile('infer_cfg.yml')
        yml_conf = yaml.safe_load(d1)
        # with open(infer_config) as f:
        #     yml_conf = yaml.safe_load(f)
        self.check_model(yml_conf)
        self.arch = yml_conf['arch']
        self.preprocess_infos = yml_conf['Preprocess']
        self.min_subgraph_size = yml_conf['min_subgraph_size']
        self.label_list = yml_conf['label_list']
        self.use_dynamic_shape = yml_conf['use_dynamic_shape']
        self.draw_threshold = yml_conf.get("draw_threshold", 0.5)
        self.mask = yml_conf.get("mask", False)
        self.tracker = yml_conf.get("tracker", None)
        self.nms = yml_conf.get("NMS", None)
        self.fpn_stride = yml_conf.get("fpn_stride", None)
        if self.arch == 'RCNN' and yml_conf.get('export_onnx', False):
            print(
                'The RCNN export model is used for ONNX and it only supports batch_size = 1'
            )
        self.print_config()

    def check_model(self, yml_conf):
        """
        Raises:
            ValueError: loaded model not in supported model type
        """
        for support_model in SUPPORT_MODELS:
            if support_model in yml_conf['arch']:
                return True
        raise ValueError("Unsupported arch: {}, expect {}".format(
            yml_conf['arch'], SUPPORT_MODELS))

    def print_config(self):
        print('-----------  Model Configuration -----------')
        print('%s: %s' % ('Model Arch', self.arch))
        print('%s: ' % ('Transform Order'))
        for op_info in self.preprocess_infos:
            print('--%s: %s' % ('transform op', op_info['type']))
        print('--------------------------------------------')
```

```python
def predict_image(infer_config, predictor, img_list):
    # load preprocess transforms
    transforms = Compose(infer_config.preprocess_infos)
    # predict image
    for img_path in img_list:
        start_time = time.perf_counter()

        inputs = transforms(img_path)
        inputs_name = [var.name for var in predictor.get_inputs()]
        inputs = {k: inputs[k][None, ] for k in inputs_name}

        outputs = predictor.run(output_names=None, input_feed=inputs)

        print("ONNXRuntime predict: ")
        if infer_config.arch in ["HRNet"]:
            print(np.array(outputs[0]))
        else:
            bboxes = np.array(outputs[0])
            for bbox in bboxes:
                if bbox[0] > -1 and bbox[1] > infer_config.draw_threshold:
                    print(f"{int(bbox[0])} {bbox[1]} "
                          f"{bbox[2]} {bbox[3]} {bbox[4]} {bbox[5]}")
        end_time = time.perf_counter()
        elapsed_time = round((end_time - start_time) * 1000, 4)
        print(f"cost: {elapsed_time} ms")
```

```python
def read_enfile(file_path, en=True):
    if en:
        with open(file_path, 'rb') as f:
            # key_file_path = os.path.join('C:\\', 'unique.key')
            # loaded_key = load_unique_key(key_file_path)
            cpu_serial = get_cpu_serial_windows()
            unique_key = generate_unique_key(cpu_serial)
            encrypted_data = f.read()
            decrypted_data = decrypt_data(encrypted_data, unique_key)
            return decrypted_data
    else:
        with open(file_path, 'rb') as f:
            encrypted_data = f.read()
            return encrypted_data
```

```python
def findImageFiles(folderPath, tarform=None):
    if tarform:
        extensions = [f'.{tarform}']
    else:
        extensions = [
            ".%s" % fmt.data().decode().lower()
            for fmt in QtGui.QImageReader.supportedImageFormats()
        ]

    images = []
    for file in os.listdir(folderPath):
        file_path = os.path.join(folderPath, file)
        if os.path.isfile(file_path) and file.lower().endswith(tuple(extensions)):
            images.append(file_path)
    images = natsort.os_sorted(images)
    return images
```

```python
if __name__ == '__main__':
    FLAGS = parser.parse_args()
    # load image list
    img_list = get_test_images('C:\\Users\\Metr\\Desktop\\字符分类\\imgdatas')
    # img_list = get_test_images("./", 'bus.jpg')

    # load predictor
    sess_options = onnxruntime.SessionOptions()
    sess_options.graph_optimization_level = \
        onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
    predictor = InferenceSession('15.onnx', sess_options=sess_options,
                                 providers=['CUDAExecutionProvider'])

    # load infer config
    infer_config = PredictConfig('infer_cfg.yml')

    # start_time = time.perf_counter()
    predict_image(infer_config, predictor, img_list)
    # end_time = time.perf_counter()
    # elapsed_time = round((end_time - start_time) * 1000, 4)
    # print(f"inference time: {elapsed_time} ms")
```
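For reference, the export-and-convert step described above normally looks like the following. This is a sketch only: the config path, weights file, output directory, and opset version are illustrative and not taken from the report.

```shell
# Export the ppdet model to a static inference model (paths are examples).
python tools/export_model.py \
    -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
    -o weights=rtdetr_r50vd_6x_coco.pdparams \
    --output_dir=output_inference

# Convert the exported model to ONNX with paddle2onnx.
paddle2onnx \
    --model_dir output_inference/rtdetr_r50vd_6x_coco \
    --model_filename model.pdmodel \
    --params_filename model.pdiparams \
    --save_file rtdetr_r50vd_6x_coco.onnx \
    --opset_version 16
```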

Running it produces:

```
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Tile node. Name:'p2o.Tile.3' Status Message: the tensor to be tiled using Tile OP must be atleast 1 dimensional
```
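The failing check can be looked at in isolation: ONNX Runtime's Tile kernel rejects a data input of rank 0 (a scalar), which is what a shape-related subgraph in the newer exports apparently feeds it. Below is a minimal, self-contained sketch of that rank check; the node list is illustrative, and in a real diagnosis you would populate it from the exported model, e.g. via `onnx.shape_inference.infer_shapes`.

```python
def find_rank0_tile_inputs(nodes):
    """Return names of Tile nodes whose data input is 0-dimensional.

    `nodes` is a list of dicts: {"name": str, "op_type": str,
    "input_rank": int}, where input_rank is the rank of the tensor
    being tiled (the first input of the Tile op).
    """
    bad = []
    for node in nodes:
        # ONNX Runtime requires the tiled tensor to be at least 1-D;
        # a scalar (rank 0) triggers INVALID_ARGUMENT at session run.
        if node["op_type"] == "Tile" and node["input_rank"] == 0:
            bad.append(node["name"])
    return bad


# Illustrative graph contents; node names mirror the error message.
nodes = [
    {"name": "p2o.Tile.0", "op_type": "Tile", "input_rank": 2},
    {"name": "p2o.Tile.3", "op_type": "Tile", "input_rank": 0},  # the failing node
]
print(find_rank0_tile_inputs(nodes))  # → ['p2o.Tile.3']
```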

I also reported this at https://github.com/lyuwenyu/RT-DETR/issues/428, but in my tests, models exported with Paddle 2.4.2 convert to ONNX and run fine, while Paddle 2.5.2 and 2.6.1 both fail.

Additional Supplementary Information

No response

Zheng-Bicheng commented 2 months ago

@David-dotcom666 Hello, which Paddle2ONNX version are you using? Does the problem still appear if you only replace Paddle and keep Paddle2ONNX unchanged?

David-dotcom666 commented 2 months ago

@Zheng-Bicheng Paddle2ONNX 1.0.5. Replacing only Paddle is enough to reproduce it: at the moment only Paddle 2.4.2 works, while 2.5.2, 2.6.1, and 3.0 (Paddle2ONNX doesn't support 3.0) all fail. But training under Paddle 2.4.2 uses far more GPU memory than 3.0, which is bad for a production environment.

David-dotcom666 commented 2 months ago

Any update?

Zheng-Bicheng commented 2 months ago

> Any update?

It's being fixed, but we haven't tracked down the root cause yet.

Zheng-Bicheng commented 2 months ago

We've upgraded Paddle2ONNX to 1.2.x; could you update and try again?
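In a pip-based environment the upgrade is a one-liner; the version constraint below is illustrative of "any 1.2.x release":

```shell
pip install --upgrade "paddle2onnx>=1.2.0"
```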

David-dotcom666 commented 2 months ago

@Zheng-Bicheng Paddle2ONNX 1.2.x requires Paddle 2.6, and our environment is currently pinned to 2.4.2... other library versions are involved as well, so we can't try Paddle 2.6 for now. I've been trying Paddle 3.0 and it's a lot better than 2.x: faster, much lower GPU memory use, and the code structure is cleaner than before; changing the export model's I/O used to require edits in several places, now just one. I had wanted to go straight to 3.0 this time, but then found paddle2onnx doesn't support Paddle 3 yet. If we deployed with native inference I could use it directly, but since we have to convert to TensorRT for deployment, I can only hope paddle2onnx and the other frameworks adapt to Paddle 3.0 soon.

Zheng-Bicheng commented 2 months ago

> @Zheng-Bicheng Paddle2ONNX 1.2.x requires Paddle 2.6, and our environment is currently pinned to 2.4.2... I had wanted to go straight to 3.0 this time, but then found paddle2onnx doesn't support Paddle 3 yet.

What problems do you actually hit with 3.0? In theory it should work; it just hasn't been fully tested yet.

David-dotcom666 commented 2 months ago

@Zheng-Bicheng It probably needs changes. From what I saw, paddle2onnx imports some Paddle packages using the 2.x module layout; as I recall, something like fluid io raised an error. The interfaces are presumably somewhat different too. The ppdet and ppseg models I've tried can all be trained, exported, and run for inference directly.

Zheng-Bicheng commented 2 months ago

> @Zheng-Bicheng It probably needs changes. From what I saw, paddle2onnx imports some Paddle packages using the 2.x module layout; as I recall, something like fluid io raised an error.

It shouldn't need changes. Please test it again, and if you hit a problem, open an issue at Paddle2ONNX and I'll help you handle it. The models I've tested on my side all work fine.

David-dotcom666 commented 2 months ago

> @Zheng-Bicheng It probably needs changes. From what I saw, paddle2onnx imports some Paddle packages using the 2.x module layout; as I recall, something like fluid io raised an error.

> It shouldn't need changes. Please test it again, and if you hit a problem, open an issue at Paddle2ONNX and I'll help you handle it.

Just to confirm: the models you're testing were exported with Paddle 3.0, right? And which paddle2onnx version? I'd still like to use 3.0 for future optimization.

Zheng-Bicheng commented 2 months ago

> Just to confirm: the models you're testing were exported with Paddle 3.0, right? And which paddle2onnx version?

The latest P2O release is 1.2.8.