FeiYull / TensorRT-Alpha


Error when converting ONNX to a TensorRT engine with trtexec #69

Open feiyibandeganjue opened 9 months ago

feiyibandeganjue commented 9 months ago

Following the two steps given in the README:

1. `yolo mode=export model=/cv/model/test/yolov8s.pt format=onnx dynamic=True`
2. `./trtexec --onnx=/cv/model/test/yolov8s_80.onnx --saveEngine=/cv/model/test/yolov8s.engine --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:2x3x640x640 --maxShapes=images:4x3x640x640`

Running the second step fails with:

```
[01/05/2024-14:42:23] [E] [TRT] ModelImporter.cpp:769: While parsing node number 258 [Range -> "/model.22/Range_output_0"]:
[01/05/2024-14:42:23] [E] [TRT] ModelImporter.cpp:770: --- Begin node ---
[01/05/2024-14:42:23] [E] [TRT] ModelImporter.cpp:771: input: "/model.22/Constant_14_output_0" input: "/model.22/Cast_output_0" input: "/model.22/Constant_15_output_0" output: "/model.22/Range_output_0" name: "/model.22/Range" op_type: "Range"
[01/05/2024-14:42:23] [E] [TRT] ModelImporter.cpp:772: --- End node ---
[01/05/2024-14:42:23] [E] [TRT] ModelImporter.cpp:775: ERROR: builtin_op_importers.cpp:3347 In function importRange:
[8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
[01/05/2024-14:42:23] [E] Failed to parse onnx file
[01/05/2024-14:42:23] [I] Finish parsing network model
[01/05/2024-14:42:23] [E] Parsing model failed
[01/05/2024-14:42:23] [E] Failed to create engine from model.
[01/05/2024-14:42:23] [E] Engine set up failed
```

FeiYull commented 9 months ago

@feiyibandeganjue Is your GPU a 3060?

feiyibandeganjue commented 9 months ago

Yes, the GPU is a 3060.

FeiYull commented 9 months ago

@feiyibandeganjue The 3060 is a special case: don't build the ONNX model with trtexec, build the engine from code instead.

feiyibandeganjue commented 9 months ago

So that means writing a separate piece of code to build the TensorRT engine, right? Is there an implementation in this project?

FeiYull commented 9 months ago

It isn't much code, maybe 30 lines? I'll find time to add it.

Sencc commented 9 months ago

Same error here with a 3060 GPU.

FeiYull commented 9 months ago

@feiyibandeganjue @Sencc

There seems to be no official fix: https://github.com/NVIDIA/TensorRT/issues/3576

Workaround: build the ONNX model into an engine with code: https://github.com/FeiYull/TensorRT-Alpha/blob/main/tools/onnx2trt.cpp
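For anyone who needs it before the repo update, here is a minimal sketch of what building the engine from code can look like with the TensorRT C++ API. It follows the same idea as the linked tools/onnx2trt.cpp but is not a copy of it; the file names and the min/opt/max shapes are placeholders taken from the trtexec command above.

```cpp
// Minimal sketch: build a TensorRT engine from an ONNX file via the C++ API
// instead of trtexec. Paths and shape ranges below are illustrative placeholders.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <fstream>
#include <memory>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

int main() {
    auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(gLogger));
    const auto flags =
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flags));
    auto parser  = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, gLogger));

    // Parse the ONNX model exported by ultralytics.
    if (!parser->parseFromFile("yolov8s.onnx",
                               static_cast<int>(nvinfer1::ILogger::Severity::kWARNING))) {
        std::printf("failed to parse onnx file\n");
        return -1;
    }

    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());

    // Dynamic-shape profile, mirroring trtexec's --minShapes/--optShapes/--maxShapes.
    auto* profile = builder->createOptimizationProfile();
    profile->setDimensions("images", nvinfer1::OptProfileSelector::kMIN, nvinfer1::Dims4(1, 3, 640, 640));
    profile->setDimensions("images", nvinfer1::OptProfileSelector::kOPT, nvinfer1::Dims4(2, 3, 640, 640));
    profile->setDimensions("images", nvinfer1::OptProfileSelector::kMAX, nvinfer1::Dims4(4, 3, 640, 640));
    config->addOptimizationProfile(profile);

    // Build and serialize the engine, then write it to disk.
    auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    if (!serialized) {
        std::printf("engine build failed\n");
        return -1;
    }
    std::ofstream out("yolov8s.engine", std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()), serialized->size());
    return 0;
}
```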

scteam1994 commented 8 months ago

Is this the reason 杜佬 modifies the view() function and removes the dynamic axes?

scteam1994 commented 8 months ago

After a day of work I got it changed following 杜佬's approach. I'm using the pose model; if you only use detect, changing tal.py is enough.

First, in `site-packages\ultralytics\utils\tal.py`, the `make_anchors` function:

```python
_, _, h, w = feats[i].shape
sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset  # shift x
```

becomes:

```python
_, _, h, w = feats[i].shape
h, w = int(h), int(w)
sx = torch.arange(end=w, device=device, dtype=dtype) + grid_cell_offset  # shift x
```

Then in `ultralytics\nn\modules\head.py`:

```python
else:
    y = kpts.clone()
    if ndim == 3:
        y[:, 2::3] = y[:, 2::3].sigmoid()  # sigmoid (WARNING: inplace .sigmoid() Apple MPS bug)
    y[:, 0::ndim] = (y[:, 0::ndim] * 2.0 + (self.anchors[0] - 0.5)) * self.strides
    y[:, 1::ndim] = (y[:, 1::ndim] * 2.0 + (self.anchors[1] - 0.5)) * self.strides
    return y
```

is changed to match the if branch:

```python
else:
    y = kpts.view(bs, *self.kpt_shape, -1)
    a = (y[:, :, :2] * 2.0 + (self.anchors - 0.5)) * self.strides
    if ndim == 3:
        a = torch.cat((a, y[:, :, 2:3].sigmoid()), 2)
    return a.view(bs, nk, -1)
```

There is also a dynamic-axes definition in the export method that I can't locate any more; I reduced the dynamic axes to batch only and removed W and H.

Conversion to TRT succeeds. I haven't written test code yet, but it shouldn't affect the inference results. With this change it works the same way as 杜佬's repo: changing the input image size requires hooking the export and rebuilding.

If you use the ultralytics pip package, change the export code to:

```python
from ultralytics import YOLO

model = YOLO('yolov8n-pose.yaml').load('yolov8n-pose.pt')
model.export(format='onnx', opset=12, dynamic=True, imgsz=(xxx, xxx))
```

scteam1994 commented 8 months ago

First of all, thanks to the author for all the decode kernels and the complete inference pipeline. With my own limited CUDA skills, I see two things that could improve inference speed:

1. The four kernels inside `YOLOv8Pose::preprocess` would be better fused into a single kernel; for example, rgb2bgr and hwc2chw could become device functions. That would save some memory I/O and improve cache hit rates.
2. Using CUDA streams. I admit I haven't fully figured streams out yet: is launching a kernel with a null stream pointer, `<<<grid, block, size, nullptr>>>`, equivalent to `<<<grid, block, size>>>`? If so, this repo isn't using streams yet. I only have a shallow understanding of what streams do, but in principle they should improve throughput when multiple tasks run in parallel (see the sketch below).
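On the stream question: passing a null pointer as the fourth launch parameter targets the same default stream as omitting it, so the two launch forms are equivalent. A rough sketch of what an explicit stream could look like follows; the kernel and buffers here are made-up placeholders, not the repo's actual preprocessing code.

```cu
#include <cuda_runtime.h>

// Placeholder kernel standing in for a fused preprocess (resize/bgr2rgb/hwc2chw/normalize).
__global__ void preprocess_kernel(const unsigned char* src, float* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i] / 255.0f;
}

void preprocess_on_stream(const unsigned char* h_src, unsigned char* d_src, float* d_dst, int n) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Work issued on one stream still executes in order, but it can overlap with
    // copies/kernels issued on other streams (host memory must be pinned for the
    // copy to truly overlap).
    cudaMemcpyAsync(d_src, h_src, n, cudaMemcpyHostToDevice, stream);
    int block = 256;
    int grid = (n + block - 1) / block;
    preprocess_kernel<<<grid, block, 0, stream>>>(d_src, d_dst, n);
    // Passing nullptr (or omitting the last two launch arguments) would target
    // the default stream instead, which is what the repo currently does.

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}
```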

FeiYull commented 8 months ago

@scteam1994 It could be changed; it is written this way just to lower the barrier to entry.

feiyibandeganjue commented 8 months ago

It's not only the 3060 that fails: a 2070 card gives the same error, but a 4060 works. Could the author say which GPUs, at minimum, support dynamic ONNX-to-TensorRT conversion? Is the cutoff really exactly at the 3060, i.e. the 3060 and below unsupported and anything above supported?

Bigfishering commented 6 months ago

I changed TensorRT from 8.4.2 to 8.4.3, and that solved it.

zhangzhenyu-pony commented 6 months ago

> I changed TensorRT from 8.4.2 to 8.4.3, and that solved it.

How do I change the TensorRT version on a Jetson? Mine is 8.2, and I also get an error when converting to the TRT format.

zhangzhenyu-pony commented 6 months ago

> @feiyibandeganjue @Sencc
>
> There seems to be no official fix: NVIDIA/TensorRT#3576
>
> Workaround: build the ONNX model into an engine with code: https://github.com/FeiYull/TensorRT-Alpha/blob/main/tools/onnx2trt.cpp

How do I do the build from code? When I run it on a Jetson Xavier NX it still fails, apparently because of missing package dependencies. My TensorRT version is 8.2.

QHQ-cloud commented 1 month ago

I used another approach. First, change the export to use only a dynamic batch dimension. In ultralytics/engine/exporter.py, around line 400:

```python
output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output0"]

# Original dynamic-axes block, commented out:
# dynamic = self.args.dynamic
# if dynamic:
#     dynamic = {"images": {0: "batch", 2: "height", 3: "width"}}  # shape(1,3,640,640)
#     if isinstance(self.model, SegmentationModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 116, 8400)
#         dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
#     elif isinstance(self.model, DetectionModel):
#         dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)

# Replacement: dynamic batch dimension only
output_names = ['output0', 'output1'] if isinstance(self.model, SegmentationModel) else ['output0']
dynamic = self.args.dynamic
if dynamic:
    dynamic = {'images': {0: 'batch'}}  # shape(1,3,640,640)
    if isinstance(self.model, SegmentationModel):
        dynamic['output0'] = {0: 'batch'}  # shape(1, 116, 8400)
        dynamic['output1'] = {0: 'batch'}  # shape(1,32,160,160)
    elif isinstance(self.model, DetectionModel):
        dynamic['output0'] = {0: 'batch', 2: 'anchors'}  # shape(1, 84, 8400)
```

Then export from code:

```python
model = YOLO(r"D:\PythonCode\ultralytics-main\ultralytics-main\ultralytics\cfg\models\v8\yolov8-seg.yaml")
model = YOLO(r"D:\PythonCode\ultralytics-main\ultralytics-main\runs\segment\train\weights\best.pt")
model.export(format='onnx', dynamic=True, opset=12)
```

Next, open an Anaconda PowerShell prompt, activate the environment, and cd into the folder containing the exported ONNX file. Run `python -m onnxsim <input.onnx> <simplified.onnx>` to simplify the model, and watch the report for unsupported operators such as Range: if the Range count drops to 0, the export is correct.

Then I used TensorRT 8.4.2.4, as the repo author suggests, and built the TRT engine from code, with the line `//config->setFlag(nvinfer1::BuilderFlag::kFP16);` commented out.

I tried the official n and s models; the TRT engines converted from their ONNX files both work.
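For reference, a small hypothetical sketch of where that FP16 flag lives in the TensorRT builder configuration; this is illustrative only, not the repo's exact onnx2trt.cpp, and `configure_precision` is a made-up helper name.

```cpp
#include <NvInfer.h>

// Hypothetical helper: choose FP32 vs FP16 when building the engine.
// Calling it with use_fp16 = false mirrors commenting out the kFP16 line.
void configure_precision(nvinfer1::IBuilder& builder, nvinfer1::IBuilderConfig& config, bool use_fp16) {
    if (use_fp16 && builder.platformHasFastFp16()) {
        config.setFlag(nvinfer1::BuilderFlag::kFP16);
    }
}
```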