FeiYull / TensorRT-Alpha

🔥🔥🔥TensorRT for YOLOv8、YOLOv8-Pose、YOLOv8-Seg、YOLOv8-Cls、YOLOv7、YOLOv6、YOLOv5、YOLONAS......🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎
GNU General Public License v2.0

TX2 deployment method [Resolved]: I'm trying to deploy the YOLOv8 project on a TX2, but TensorRT on the TX2 reports an unsupported-INT64 error when converting the model. Is there a way to convert it to INT32? #15

Closed lyb36524 closed 1 year ago

lyb36524 commented 1 year ago

Problem solved, many thanks to FeiYull for all the help. The solution:

TX2 system version: JetPack 4.6.

Key steps:
1. Export a static ONNX model on a PC or on the TX2.
2. Build the ONNX into a trt engine on the TX2 with TensorRT 8.2.

Note: the TensorRT version used when compiling the tensorrt-alpha code must match the one used for the trt conversion.

Key commands:
1. Export the static ONNX on a PC or on the TX2 (note this differs from the export command used on x86 Ubuntu): `yolo mode=export model=yolov8n.pt format=onnx batch=1`
2. Copy the ONNX file to the TX2 and build the trt engine there: `../../TensorRT-8.2.1.8/bin/trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly` (the TX2's own trtexec binary is in /usr/src/tensorrt/bin; adjust the path in the command accordingly).
3. Run the test: `./app_yolov8 --model=../../data/yolov8/yolov8n.trt --size=640 --batch_size=1 --img=../../data/6406407.jpg --show`


I'm trying to deploy the YOLOv8 project on a TX2, but TensorRT on the TX2 reports an unsupported-INT64 error when converting the model. Is there a way to convert it to INT32, or otherwise get the model converted correctly?

```
nvidia@ubuntu:~/TensorRT-Alpha-main/data/yolov8$ ./trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640
[02/28/2023-01:55:01] [I] === Model Options ===
[02/28/2023-01:55:01] [I] Format: ONNX
[02/28/2023-01:55:01] [I] Model: yolov8n.onnx
[02/28/2023-01:55:01] [I] Output:
[02/28/2023-01:55:01] [I] === Build Options ===
[02/28/2023-01:55:01] [I] Max batch: explicit batch
[02/28/2023-01:55:01] [I] Workspace: 16 MiB
[02/28/2023-01:55:01] [I] minTiming: 1
[02/28/2023-01:55:01] [I] avgTiming: 8
[02/28/2023-01:55:01] [I] Precision: FP32
[02/28/2023-01:55:01] [I] Calibration:
[02/28/2023-01:55:01] [I] Refit: Disabled
[02/28/2023-01:55:01] [I] Sparsity: Disabled
[02/28/2023-01:55:01] [I] Safe mode: Disabled
[02/28/2023-01:55:01] [I] DirectIO mode: Disabled
[02/28/2023-01:55:01] [I] Restricted mode: Disabled
[02/28/2023-01:55:01] [I] Save engine: yolov8n.trt
[02/28/2023-01:55:01] [I] Load engine:
[02/28/2023-01:55:01] [I] Profiling verbosity: 0
[02/28/2023-01:55:01] [I] Tactic sources: Using default tactic sources
[02/28/2023-01:55:01] [I] timingCacheMode: local
[02/28/2023-01:55:01] [I] timingCacheFile:
[02/28/2023-01:55:01] [I] Input(s)s format: fp32:CHW
[02/28/2023-01:55:01] [I] Output(s)s format: fp32:CHW
[02/28/2023-01:55:01] [I] Input build shape: images=1x3x640x640+4x3x640x640+8x3x640x640
[02/28/2023-01:55:01] [I] Input calibration shapes: model
[02/28/2023-01:55:01] [I] === System Options ===
[02/28/2023-01:55:01] [I] Device: 0
[02/28/2023-01:55:01] [I] DLACore:
[02/28/2023-01:55:01] [I] Plugins:
[02/28/2023-01:55:01] [I] === Inference Options ===
[02/28/2023-01:55:01] [I] Batch: Explicit
[02/28/2023-01:55:01] [I] Input inference shape: images=4x3x640x640
[02/28/2023-01:55:01] [I] Iterations: 10
[02/28/2023-01:55:01] [I] Duration: 3s (+ 200ms warm up)
[02/28/2023-01:55:01] [I] Sleep time: 0ms
[02/28/2023-01:55:01] [I] Idle time: 0ms
[02/28/2023-01:55:01] [I] Streams: 1
[02/28/2023-01:55:01] [I] ExposeDMA: Disabled
[02/28/2023-01:55:01] [I] Data transfers: Enabled
[02/28/2023-01:55:01] [I] Spin-wait: Disabled
[02/28/2023-01:55:01] [I] Multithreading: Disabled
[02/28/2023-01:55:01] [I] CUDA Graph: Disabled
[02/28/2023-01:55:01] [I] Separate profiling: Disabled
[02/28/2023-01:55:01] [I] Time Deserialize: Disabled
[02/28/2023-01:55:01] [I] Time Refit: Disabled
[02/28/2023-01:55:01] [I] Skip inference: Enabled
[02/28/2023-01:55:01] [I] Inputs:
[02/28/2023-01:55:01] [I] === Reporting Options ===
[02/28/2023-01:55:01] [I] Verbose: Disabled
[02/28/2023-01:55:01] [I] Averages: 10 inferences
[02/28/2023-01:55:01] [I] Percentile: 99
[02/28/2023-01:55:01] [I] Dump refittable layers:Disabled
[02/28/2023-01:55:01] [I] Dump output: Disabled
[02/28/2023-01:55:01] [I] Profile: Disabled
[02/28/2023-01:55:01] [I] Export timing to JSON file:
[02/28/2023-01:55:01] [I] Export output to JSON file:
[02/28/2023-01:55:01] [I] Export profile to JSON file:
[02/28/2023-01:55:01] [I]
[02/28/2023-01:55:01] [I] === Device Information ===
[02/28/2023-01:55:01] [I] Selected Device: NVIDIA Tegra X2
[02/28/2023-01:55:01] [I] Compute Capability: 6.2
[02/28/2023-01:55:01] [I] SMs: 2
[02/28/2023-01:55:01] [I] Compute Clock Rate: 1.3 GHz
[02/28/2023-01:55:01] [I] Device Global Memory: 7850 MiB
[02/28/2023-01:55:01] [I] Shared Memory per SM: 64 KiB
[02/28/2023-01:55:01] [I] Memory Bus Width: 128 bits (ECC disabled)
[02/28/2023-01:55:01] [I] Memory Clock Rate: 1.3 GHz
[02/28/2023-01:55:01] [I]
[02/28/2023-01:55:01] [I] TensorRT version: 8.2.1
[02/28/2023-01:55:03] [I] [TRT] [MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 285, GPU 6703 (MiB)
[02/28/2023-01:55:03] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 285 MiB, GPU 6704 MiB
[02/28/2023-01:55:03] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 314 MiB, GPU 6732 MiB
[02/28/2023-01:55:03] [I] Start parsing network model
[02/28/2023-01:55:03] [I] [TRT] ----------------------------------------------------------------
[02/28/2023-01:55:03] [I] [TRT] Input filename: yolov8n.onnx
[02/28/2023-01:55:03] [I] [TRT] ONNX IR version: 0.0.8
[02/28/2023-01:55:03] [I] [TRT] Opset version: 17
[02/28/2023-01:55:03] [I] [TRT] Producer name: pytorch
[02/28/2023-01:55:03] [I] [TRT] Producer version: 1.13.1
[02/28/2023-01:55:03] [I] [TRT] Domain:
[02/28/2023-01:55:03] [I] [TRT] Model version: 0
[02/28/2023-01:55:03] [I] [TRT] Doc string:
[02/28/2023-01:55:03] [I] [TRT] ----------------------------------------------------------------
[02/28/2023-01:55:03] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[02/28/2023-01:55:04] [E] [TRT] ModelImporter.cpp:773: While parsing node number 239 [Range -> "/model.22/Range_output_0"]:
[02/28/2023-01:55:04] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[02/28/2023-01:55:04] [E] [TRT] ModelImporter.cpp:775: input: "/model.22/Constant_8_output_0" input: "/model.22/Cast_output_0" input: "/model.22/Constant_9_output_0" output: "/model.22/Range_output_0" name: "/model.22/Range" op_type: "Range"
[02/28/2023-01:55:04] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[02/28/2023-01:55:04] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3352 In function importRange: [8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
[02/28/2023-01:55:04] [E] Failed to parse onnx file
[02/28/2023-01:55:04] [I] Finish parsing network model
[02/28/2023-01:55:04] [E] Parsing model failed
[02/28/2023-01:55:04] [E] Failed to create engine from model.
[02/28/2023-01:55:04] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # ./trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640
nvidia@ubuntu:~/TensorRT-Alpha-main/data/yolov8$
```

I also tried the method below, but it didn't help; there are still INT64 nodes: https://blog.csdn.net/dou3516/article/details/124577344
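For reference, a minimal sketch of the initializer-downcast approach that posts like the one linked describe, using the onnx Python package (assuming the INT64 values actually fit in INT32). Note that it only rewrites stored weights; graph nodes that produce INT64 tensors at runtime (such as the Cast feeding the Range node in the log above) are untouched, which is consistent with INT64 nodes remaining after the rewrite:

```python
import onnx
import numpy as np
from onnx import numpy_helper

model = onnx.load("yolov8n.onnx")
for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT64:
        arr = numpy_helper.to_array(init)
        # Downcasting is only safe when every value fits in int32.
        if arr.size and np.iinfo(np.int32).min <= arr.min() and arr.max() <= np.iinfo(np.int32).max:
            init.CopyFrom(numpy_helper.from_array(arr.astype(np.int32), init.name))
onnx.save(model, "yolov8n_int32.onnx")
```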

FeiYull commented 1 year ago

@lyb36524 With TensorRT 8.4.2.4, this error does not occur.

lyb36524 commented 1 year ago

> @lyb36524 With TensorRT 8.4.2.4, this error does not occur.

Right, on x86 TensorRT only emits a warning, doesn't stop, and produces the file. But the highest TensorRT available on the TX2 is 8.2.1, and the conversion cannot finish there. I tried converting the model on x86, but inference on the TX2 then fails with an engine-version mismatch. I went through many TensorRT versions, including the matching 8.2.1, and the problem persists. So it seems I have to do the conversion on the TX2?

FeiYull commented 1 year ago

@lyb36524 For TensorRT 8.2, you can follow these steps:

(demo screenshot; the steps it shows are quoted in the reply below)

lyb36524 commented 1 year ago

> @lyb36524 For TensorRT 8.2, you can follow these steps:
>
>   • Export the ONNX: `yolo mode=export model=yolov8n.pt format=onnx batch=1`
>   • Build the ONNX: `../../TensorRT-8.2.1.8/bin/trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly`
>   • The success message from the build: `[02/28/2023-11:35:37] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +13, now: CPU 0, GPU 13 (MiB) [02/28/2023-11:35:37] [I] Engine built in 128.87 sec. &&&& PASSED TensorRT.trtexec [TensorRT v8201] # ../../TensorRT-8.2.1.8/bin/trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly`
>   • Inference: `./app_yolov8 --model=../../data/yolov8/yolov8n_bs4.trt --size=640 --batch_size=1 --img=../../data/6406407.jpg --show`
>
> (demo screenshot)
>
>   • Note: for multi-batch inference you only need to change the batch parameters above, though you may also need to modify yolov8.cpp when running inference, since the code supports dynamic batching.
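For comparison, the dynamic-batch build attempted at the top of this thread would look roughly like the following (shape values copied from the reporter's original command). Per the earlier comment, this parse is only expected to succeed on TensorRT 8.4.2.4 or newer; on the TX2's TensorRT 8.2.1 it aborts on the Range node:

```bash
# Dynamic-batch engine build; needs TensorRT >= 8.4, since the 8.2 ONNX
# parser rejects YOLOv8's Range node with dynamic INT64 inputs.
trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n_dynamic.trt --buildOnly \
        --minShapes=images:1x3x640x640 \
        --optShapes=images:4x3x640x640 \
        --maxShapes=images:8x3x640x640
```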

Thanks for the reply. On x86 the conversion completes with every TensorRT version I tried, but on the TX2 it fails with the INT64 error, possibly something ARM-specific. I then tried to match the TX2's TensorRT version on x86 and convert there; I went through all the 8.0-series versions and the conversion completes, but inference on the TX2 always fails with an error like this:

```
nvidia@ubuntu:~/TensorRT-Alpha-main/yolov8/build$ ./app_yolov8 --model=../../data/yolov8/yolov8n.trt --size=640 --batch_size=1 --img=../../data/6406407.jpg --show --savePath
[02/28/2023-02:33:47] [I] model_path = ../../data/yolov8/yolov8n.trt
[02/28/2023-02:33:47] [I] size = 640
[02/28/2023-02:33:47] [I] batch_size = 1
[02/28/2023-02:33:47] [I] image_path = ../../data/6406407.jpg
[02/28/2023-02:33:47] [I] is_show = 1
[02/28/2023-02:33:47] [I] save_path = true
[02/28/2023-02:33:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +261, GPU +0, now: CPU 302, GPU 6756 (MiB)
[02/28/2023-02:33:48] [I] [TRT] Loaded engine size: 16 MiB
[02/28/2023-02:33:48] [E] [TRT] 1: [stdArchiveReader.cpp::StdArchiveReader::40] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 205, Serialized Engine Version: 232)
[02/28/2023-02:33:48] [E] [TRT] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
[02/28/2023-02:33:48] [E] initEngine() ocur errors!
runtime error /home/nvidia/TensorRT-Alpha-main/yolov8/yolov8.cpp:10 cudaFree(m_output_src_transpose_device) failed. code = cudaErrorInvalidValue, message = invalid argument
nvidia@ubuntu:
```

So I'm wondering whether the ONNX-to-trt conversion has to be done on the TX2 itself. Is there any way to convert the ONNX to INT32 format, or some other way to export the trt engine?

FeiYull commented 1 year ago

@lyb36524 According to your error message, two suggestions:

  1. Build the engine from the ONNX and run app_yolov8 on the same device.
  2. The TensorRT version used to build the ONNX must match the version app_yolov8 links against.
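A hedged way to verify point 2 on the TX2 itself: JetPack installs TensorRT as Debian packages, so the installed build can be read from dpkg and compared with the TensorRT-x.y.z.w path used when compiling:

```bash
# List the TensorRT packages JetPack installed on the TX2
dpkg -l | grep -i tensorrt
# An engine must be deserialized by the same TensorRT build that serialized it;
# the "Version tag does not match" error above is the symptom of a mismatch.
```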
lyb36524 commented 1 year ago

> @lyb36524 According to your error message, two suggestions:
>
>   1. Build the engine from the ONNX and run app_yolov8 on the same device.
>   2. The TensorRT version used to build the ONNX must match the version app_yolov8 links against.

Yes, but I can't seem to get around TensorRT 8.2.1 on the TX2's ARM platform: if I compile app_yolov8 on another platform, it still won't run on the TX2. So it seems I have to convert the ONNX INT64 values to INT32 and then convert to trt, but I don't know of a way to do that conversion, or of any other way to convert the model on the TX2.

FeiYull commented 1 year ago

@lyb36524 Why do you copy trt files between different devices? TensorRT does not allow this! https://github.com/FeiYull/TensorRT-Alpha/issues/15#issuecomment-1447555138

lyb36524 commented 1 year ago

> @lyb36524 Why do you copy trt files between different devices? TensorRT does not allow this! #15 (comment)

Because I cannot complete the ONNX-to-trt conversion on the TX2; it reports:

```
[02/28/2023-12:49:19] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3352 In function importRange: [8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
```

I don't know what else would let me produce the trt file, or how to modify the ONNX so the trt conversion succeeds. Thanks.

FeiYull commented 1 year ago

@lyb36524

lyb36524 commented 1 year ago

> @lyb36524

On your side, an ONNX copied to the TX2 converts successfully? On mine it does not; I get:

```
[02/28/2023-01:55:04] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3352 In function importRange: [8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
```

exactly as in the first screenshot. My TX2 runs JetPack 4.6.3, which is already the highest version the TX2 supports.

FeiYull commented 1 year ago

@lyb36524 Did you try the command above, with batch=1? Also, YOLOv8 may have a bug where exporting ONNX with batch=4 still produces batch 1.
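A quick hedged check of what batch dimension an export actually produced, sketched with the onnx Python package:

```python
import onnx

model = onnx.load("yolov8n.onnx")
# Print each declared input and its shape; the first entry of "images"
# is the batch dimension the export actually baked in.
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
```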

lyb36524 commented 1 year ago

> @lyb36524 Did you try the command above, with batch=1? Also, YOLOv8 may have a bug where exporting ONNX with batch=4 still produces batch 1.

I did; I followed exactly these steps on the TX2 from the start: https://blog.csdn.net/m0_72734364/article/details/128758544?app_version=5.12.1&csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22128758544%22%2C%22source%22%3A%22m0_72734364%22%7D&utm_source=app

until I hit this problem:

```
[02/28/2023-01:55:04] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3352 In function importRange: [8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!"
```
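A quick hedged way to see whether a given export still contains the dynamic Range node this parser rejects (with the static batch=1 export that eventually resolved the issue, the expectation is that nothing prints):

```python
import onnx

model = onnx.load("yolov8n.onnx")
# TensorRT 8.2's parser aborts on Range ops with non-INT32 dynamic inputs,
# so list any Range nodes that survived the export.
for node in model.graph.node:
    if node.op_type == "Range":
        print(node.name, list(node.input))
```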

FeiYull commented 1 year ago

@lyb36524 That shouldn't happen; two people have reported getting it running on the TX2. Add me on QQ: 732369616.

lyb36524 commented 1 year ago

> @lyb36524 That shouldn't happen; two people have reported getting it running on the TX2. Add me on QQ: 732369616.

Thanks so much. I have to head out for a bit, so I'll add you on QQ first.

lyb36524 commented 1 year ago

> @lyb36524 That shouldn't happen; two people have reported getting it running on the TX2. Add me on QQ: 732369616.

Problem solved, many thanks to FeiYull for all the help. The solution:

TX2 system version: JetPack 4.6.

Key steps:
1. Export a static ONNX model on a PC or on the TX2.
2. Build the ONNX into a trt engine on the TX2 with TensorRT 8.2.

Note: the TensorRT version used when compiling the tensorrt-alpha code must match the one used for the trt conversion.

Key commands:
1. Export the static ONNX on a PC or on the TX2 (note this differs from the export command used on x86 Ubuntu): `yolo mode=export model=yolov8n.pt format=onnx batch=1`
2. Copy the ONNX file to the TX2 and build the trt engine there: `../../TensorRT-8.2.1.8/bin/trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly` (the TX2's own trtexec binary is in /usr/src/tensorrt/bin; adjust the path in the command accordingly).
3. Run the test: `./app_yolov8 --model=../../data/yolov8/yolov8n.trt --size=640 --batch_size=1 --img=../../data/6406407.jpg --show`

FeiYull commented 1 year ago

> @lyb36524 That shouldn't happen; two people have reported getting it running on the TX2. Add me on QQ: 732369616.
>
> Problem solved, many thanks to FeiYull for all the help. (Full solution steps as in the comment above.)

On top of this, the data-copy code needs a small modification for TX2 devices; see: https://github.com/FeiYull/TensorRT-Alpha/issues/16#issue-1603238240

Phoenix8215 commented 10 months ago

You finally pushed the author into replying in Chinese 😄. Thanks, I've hit a similar problem.

FeiYull commented 10 months ago

@Phoenix8215 Feel free to open a new issue.

FeiYull commented 10 months ago

@Phoenix8215 As shown in the screenshot: in yolo.cpp, change the 0 to 1, and comment out the code at the bottom.

(screenshot of the yolo.cpp change)