Yutong-gannis / ETSAuto

🚚 ETSAuto is an advanced driver-assistance system (ADAS) for Euro Truck Simulator 2, providing Lane Centering Control (LCC) and Auto Lane Change (ALC).
MIT License

Error when building the TensorRT engine file for CLRNet #35

Closed CrazyMustard-404 closed 1 year ago

CrazyMustard-404 commented 1 year ago

The `llamas_dla34_tmp.onnx` file was generated successfully, but building the engine file fails with the following error:

```
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
[6] Invalid Node - Pad_237
[shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-16:55:56] [E] Failed to parse onnx file
[03/29/2023-16:55:56] [I] Finish parsing network model
[03/29/2023-16:55:56] [E] Parsing model failed
[03/29/2023-16:55:56] [E] Failed to create engine from model or file.
[03/29/2023-16:55:56] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
```

Yutong-gannis commented 1 year ago

Is this all of the error output?

CrazyMustard-404 commented 1 year ago

> Is this all of the error output?

Sorry, I only pasted part of it. Here is the complete output:

```
&&&& RUNNING TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
[03/29/2023-16:55:55] [I] === Model Options ===
[03/29/2023-16:55:55] [I] Format: ONNX
[03/29/2023-16:55:55] [I] Model: ./engines/llamas_dla34_tmp.onnx
[03/29/2023-16:55:55] [I] Output:
[03/29/2023-16:55:55] [I] === Build Options ===
[03/29/2023-16:55:55] [I] Max batch: explicit batch
[03/29/2023-16:55:55] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/29/2023-16:55:55] [I] minTiming: 1
[03/29/2023-16:55:55] [I] avgTiming: 8
[03/29/2023-16:55:55] [I] Precision: FP32
[03/29/2023-16:55:55] [I] LayerPrecisions:
[03/29/2023-16:55:55] [I] Calibration:
[03/29/2023-16:55:55] [I] Refit: Disabled
[03/29/2023-16:55:55] [I] Sparsity: Disabled
[03/29/2023-16:55:55] [I] Safe mode: Disabled
[03/29/2023-16:55:55] [I] DirectIO mode: Disabled
[03/29/2023-16:55:55] [I] Restricted mode: Disabled
[03/29/2023-16:55:55] [I] Build only: Disabled
[03/29/2023-16:55:55] [I] Save engine: ./engines/llamas_dla34.engine
[03/29/2023-16:55:55] [I] Load engine:
[03/29/2023-16:55:55] [I] Profiling verbosity: 0
[03/29/2023-16:55:55] [I] Tactic sources: Using default tactic sources
[03/29/2023-16:55:55] [I] timingCacheMode: local
[03/29/2023-16:55:55] [I] timingCacheFile:
[03/29/2023-16:55:55] [I] Input(s)s format: fp32:CHW
[03/29/2023-16:55:55] [I] Output(s)s format: fp32:CHW
[03/29/2023-16:55:55] [I] Input build shapes: model
[03/29/2023-16:55:55] [I] Input calibration shapes: model
[03/29/2023-16:55:55] [I] === System Options ===
[03/29/2023-16:55:55] [I] Device: 0
[03/29/2023-16:55:55] [I] DLACore:
[03/29/2023-16:55:55] [I] Plugins:
[03/29/2023-16:55:55] [I] === Inference Options ===
[03/29/2023-16:55:55] [I] Batch: Explicit
[03/29/2023-16:55:55] [I] Input inference shapes: model
[03/29/2023-16:55:55] [I] Iterations: 10
[03/29/2023-16:55:55] [I] Duration: 3s (+ 200ms warm up)
[03/29/2023-16:55:55] [I] Sleep time: 0ms
[03/29/2023-16:55:55] [I] Idle time: 0ms
[03/29/2023-16:55:55] [I] Streams: 1
[03/29/2023-16:55:55] [I] ExposeDMA: Disabled
[03/29/2023-16:55:55] [I] Data transfers: Enabled
[03/29/2023-16:55:55] [I] Spin-wait: Disabled
[03/29/2023-16:55:55] [I] Multithreading: Disabled
[03/29/2023-16:55:55] [I] CUDA Graph: Disabled
[03/29/2023-16:55:55] [I] Separate profiling: Disabled
[03/29/2023-16:55:55] [I] Time Deserialize: Disabled
[03/29/2023-16:55:55] [I] Time Refit: Disabled
[03/29/2023-16:55:55] [I] Inputs:
[03/29/2023-16:55:55] [I] === Reporting Options ===
[03/29/2023-16:55:55] [I] Verbose: Disabled
[03/29/2023-16:55:55] [I] Averages: 10 inferences
[03/29/2023-16:55:55] [I] Percentile: 99
[03/29/2023-16:55:55] [I] Dump refittable layers: Disabled
[03/29/2023-16:55:55] [I] Dump output: Disabled
[03/29/2023-16:55:55] [I] Profile: Disabled
[03/29/2023-16:55:55] [I] Export timing to JSON file:
[03/29/2023-16:55:55] [I] Export output to JSON file:
[03/29/2023-16:55:55] [I] Export profile to JSON file:
[03/29/2023-16:55:55] [I]
[03/29/2023-16:55:55] [I] === Device Information ===
[03/29/2023-16:55:55] [I] Selected Device: NVIDIA GeForce RTX 3090
[03/29/2023-16:55:55] [I] Compute Capability: 8.6
[03/29/2023-16:55:55] [I] SMs: 82
[03/29/2023-16:55:55] [I] Compute Clock Rate: 1.785 GHz
[03/29/2023-16:55:55] [I] Device Global Memory: 24575 MiB
[03/29/2023-16:55:55] [I] Shared Memory per SM: 100 KiB
[03/29/2023-16:55:55] [I] Memory Bus Width: 384 bits (ECC disabled)
[03/29/2023-16:55:55] [I] Memory Clock Rate: 9.751 GHz
[03/29/2023-16:55:55] [I]
[03/29/2023-16:55:55] [I] TensorRT version: 8.4.2
[03/29/2023-16:55:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +492, GPU +0, now: CPU 7429, GPU 1441 (MiB)
[03/29/2023-16:55:56] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +365, GPU +104, now: CPU 7984, GPU 1545 (MiB)
[03/29/2023-16:55:56] [I] Start parsing network model
[03/29/2023-16:55:56] [I] [TRT] ----------------------------------------------------------------
[03/29/2023-16:55:56] [I] [TRT] Input filename:   ./engines/llamas_dla34_tmp.onnx
[03/29/2023-16:55:56] [I] [TRT] ONNX IR version:  0.0.6
[03/29/2023-16:55:56] [I] [TRT] Opset version:    11
[03/29/2023-16:55:56] [I] [TRT] Producer name:    pytorch
[03/29/2023-16:55:56] [I] [TRT] Producer version: 1.9
[03/29/2023-16:55:56] [I] [TRT] Domain:
[03/29/2023-16:55:56] [I] [TRT] Model version:    0
[03/29/2023-16:55:56] [I] [TRT] Doc string:
[03/29/2023-16:55:56] [I] [TRT] ----------------------------------------------------------------
[03/29/2023-16:55:56] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/29/2023-16:55:56] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[03/29/2023-16:55:56] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[03/29/2023-16:55:56] [E] Error[4]: [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:773: While parsing node number 237 [Pad -> "496"]:
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:775: input: "313" input: "494" input: "495" output: "496" name: "Pad_237" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING }

[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[03/29/2023-16:55:56] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
[6] Invalid Node - Pad_237
[shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-16:55:56] [E] Failed to parse onnx file
[03/29/2023-16:55:56] [I] Finish parsing network model
[03/29/2023-16:55:56] [E] Parsing model failed
[03/29/2023-16:55:56] [E] Failed to create engine from model or file.
[03/29/2023-16:55:56] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
```
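In logs this long, the two names that matter are the node trtexec rejects (`Pad_237`) and the node named in the internal-error detail (`Reshape_226`). A small helper (purely illustrative, not part of ETSAuto or TensorRT) can pull them out of a pasted log:

```python
import re

def failing_nodes(trtexec_log: str) -> list[str]:
    """Extract the ONNX node names that trtexec reports as failing.

    Matches the 'Invalid Node - <name>' line and the node named in the
    'Internal Error (<name>: ...' detail, de-duplicated in order.
    """
    names = re.findall(r"Invalid Node - (\w+)", trtexec_log)
    names += re.findall(r"Internal Error \((\w+):", trtexec_log)
    return list(dict.fromkeys(names))  # de-duplicate, preserve order

# The error lines from the log above:
log = ("[6] Invalid Node - Pad_237 ... Error Code 4: Internal Error "
       "(Reshape_226: IShuffleLayer applied to shape tensor must have "
       "0 or 1 reshape dimensions: dimensions were [-1,2])")
print(failing_nodes(log))  # → ['Pad_237', 'Reshape_226']
```

Those names can then be looked up in the ONNX graph (e.g. in Netron) to see which part of the CLRNet head produced them.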

```
(ADAS) D:\Project\Self-driving-Truck-in-Euro-Truck-Simulator2-main>trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
[03/29/2023-17:17:43] [I] === Model Options ===
[03/29/2023-17:17:43] [I] Format: ONNX
[03/29/2023-17:17:43] [I] Model: ./engines/llamas_dla34_tmp.onnx
[03/29/2023-17:17:43] [I] Output:
[03/29/2023-17:17:43] [I] === Build Options ===
[03/29/2023-17:17:43] [I] Max batch: explicit batch
[03/29/2023-17:17:43] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/29/2023-17:17:43] [I] minTiming: 1
[03/29/2023-17:17:43] [I] avgTiming: 8
[03/29/2023-17:17:43] [I] Precision: FP32
[03/29/2023-17:17:43] [I] LayerPrecisions:
[03/29/2023-17:17:43] [I] Calibration:
[03/29/2023-17:17:43] [I] Refit: Disabled
[03/29/2023-17:17:43] [I] Sparsity: Disabled
[03/29/2023-17:17:43] [I] Safe mode: Disabled
[03/29/2023-17:17:43] [I] DirectIO mode: Disabled
[03/29/2023-17:17:43] [I] Restricted mode: Disabled
[03/29/2023-17:17:43] [I] Build only: Disabled
[03/29/2023-17:17:43] [I] Save engine: ./engines/llamas_dla34.engine
[03/29/2023-17:17:43] [I] Load engine:
[03/29/2023-17:17:43] [I] Profiling verbosity: 0
[03/29/2023-17:17:43] [I] Tactic sources: Using default tactic sources
[03/29/2023-17:17:43] [I] timingCacheMode: local
[03/29/2023-17:17:43] [I] timingCacheFile:
[03/29/2023-17:17:43] [I] Input(s)s format: fp32:CHW
[03/29/2023-17:17:43] [I] Output(s)s format: fp32:CHW
[03/29/2023-17:17:43] [I] Input build shapes: model
[03/29/2023-17:17:43] [I] Input calibration shapes: model
[03/29/2023-17:17:43] [I] === System Options ===
[03/29/2023-17:17:43] [I] Device: 0
[03/29/2023-17:17:43] [I] DLACore:
[03/29/2023-17:17:43] [I] Plugins:
[03/29/2023-17:17:43] [I] === Inference Options ===
[03/29/2023-17:17:43] [I] Batch: Explicit
[03/29/2023-17:17:43] [I] Input inference shapes: model
[03/29/2023-17:17:43] [I] Iterations: 10
[03/29/2023-17:17:43] [I] Duration: 3s (+ 200ms warm up)
[03/29/2023-17:17:43] [I] Sleep time: 0ms
[03/29/2023-17:17:43] [I] Idle time: 0ms
[03/29/2023-17:17:43] [I] Streams: 1
[03/29/2023-17:17:43] [I] ExposeDMA: Disabled
[03/29/2023-17:17:43] [I] Data transfers: Enabled
[03/29/2023-17:17:43] [I] Spin-wait: Disabled
[03/29/2023-17:17:43] [I] Multithreading: Disabled
[03/29/2023-17:17:43] [I] CUDA Graph: Disabled
[03/29/2023-17:17:43] [I] Separate profiling: Disabled
[03/29/2023-17:17:43] [I] Time Deserialize: Disabled
[03/29/2023-17:17:43] [I] Time Refit: Disabled
[03/29/2023-17:17:43] [I] Inputs:
[03/29/2023-17:17:43] [I] === Reporting Options ===
[03/29/2023-17:17:43] [I] Verbose: Disabled
[03/29/2023-17:17:43] [I] Averages: 10 inferences
[03/29/2023-17:17:43] [I] Percentile: 99
[03/29/2023-17:17:43] [I] Dump refittable layers: Disabled
[03/29/2023-17:17:43] [I] Dump output: Disabled
[03/29/2023-17:17:43] [I] Profile: Disabled
[03/29/2023-17:17:43] [I] Export timing to JSON file:
[03/29/2023-17:17:43] [I] Export output to JSON file:
[03/29/2023-17:17:43] [I] Export profile to JSON file:
[03/29/2023-17:17:43] [I]
[03/29/2023-17:17:43] [I] === Device Information ===
[03/29/2023-17:17:43] [I] Selected Device: NVIDIA GeForce RTX 3090
[03/29/2023-17:17:43] [I] Compute Capability: 8.6
[03/29/2023-17:17:43] [I] SMs: 82
[03/29/2023-17:17:43] [I] Compute Clock Rate: 1.785 GHz
[03/29/2023-17:17:43] [I] Device Global Memory: 24575 MiB
[03/29/2023-17:17:43] [I] Shared Memory per SM: 100 KiB
[03/29/2023-17:17:43] [I] Memory Bus Width: 384 bits (ECC disabled)
[03/29/2023-17:17:43] [I] Memory Clock Rate: 9.751 GHz
[03/29/2023-17:17:43] [I]
[03/29/2023-17:17:43] [I] TensorRT version: 8.4.2
[03/29/2023-17:17:43] [I] [TRT] [MemUsageChange] Init CUDA: CPU +494, GPU +0, now: CPU 7658, GPU 1441 (MiB)
[03/29/2023-17:17:44] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +364, GPU +104, now: CPU 8216, GPU 1545 (MiB)
[03/29/2023-17:17:44] [I] Start parsing network model
[03/29/2023-17:17:44] [I] [TRT] ----------------------------------------------------------------
[03/29/2023-17:17:44] [I] [TRT] Input filename:   ./engines/llamas_dla34_tmp.onnx
[03/29/2023-17:17:44] [I] [TRT] ONNX IR version:  0.0.6
[03/29/2023-17:17:44] [I] [TRT] Opset version:    11
[03/29/2023-17:17:44] [I] [TRT] Producer name:    pytorch
[03/29/2023-17:17:44] [I] [TRT] Producer version: 1.9
[03/29/2023-17:17:44] [I] [TRT] Domain:
[03/29/2023-17:17:44] [I] [TRT] Model version:    0
[03/29/2023-17:17:44] [I] [TRT] Doc string:
[03/29/2023-17:17:44] [I] [TRT] ----------------------------------------------------------------
[03/29/2023-17:17:44] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/29/2023-17:17:44] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[03/29/2023-17:17:44] [W] [TRT] onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped
[03/29/2023-17:17:44] [E] Error[4]: [shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-17:17:44] [E] [TRT] ModelImporter.cpp:773: While parsing node number 237 [Pad -> "496"]:
[03/29/2023-17:17:44] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[03/29/2023-17:17:44] [E] [TRT] ModelImporter.cpp:775: input: "313" input: "494" input: "495" output: "496" name: "Pad_237" op_type: "Pad" attribute { name: "mode" s: "constant" type: STRING }

[03/29/2023-17:17:44] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[03/29/2023-17:17:44] [E] [TRT] ModelImporter.cpp:779: ERROR: ModelImporter.cpp:180 In function parseGraph:
[6] Invalid Node - Pad_237
[shuffleNode.cpp::nvinfer1::builder::ShuffleNode::symbolicExecute::392] Error Code 4: Internal Error (Reshape_226: IShuffleLayer applied to shape tensor must have 0 or 1 reshape dimensions: dimensions were [-1,2])
[03/29/2023-17:17:44] [E] Failed to parse onnx file
[03/29/2023-17:17:44] [I] Finish parsing network model
[03/29/2023-17:17:44] [E] Parsing model failed
[03/29/2023-17:17:44] [E] Failed to create engine from model or file.
[03/29/2023-17:17:44] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=./engines/llamas_dla34_tmp.onnx --saveEngine=./engines/llamas_dla34.engine
```

Yutong-gannis commented 1 year ago

@CrazyMustard-404 First run a diagnostic on the onnx file:

```
polygraphy surgeon sanitize your_path/tusimple_r18.onnx --fold-constants --output your_path/tusimple_r18.onnx
```

CrazyMustard-404 commented 1 year ago

It looks normal after running the diagnostic:

```
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] RUNNING | Command: D:\anaconda\envs\ADAS\Scripts\polygraphy surgeon sanitize engines/llamas_dla34.onnx --fold-constants --output output/34.onnx
[I] Inferring shapes in the model with onnxruntime.tools.symbolic_shape_infer.
    Note: To force Polygraphy to use onnx.shape_inference instead, set allow_onnxruntime=False or use the --no-onnxruntime-shape-inference command-line option.
[I] Loading model: D:\Project\Self-driving-Truck-in-Euro-Truck-Simulator2-main\engines\llamas_dla34.onnx
[I] Original Model:
    Name: torch-jit-export | ONNX Opset: 11

    ---- 1 Graph Input(s) ----
    {input [dtype=float32, shape=(1, 3, 320, 800)]}

    ---- 1 Graph Output(s) ----
    {3076 [dtype=float32, shape=(1, 192, 78)]}

    ---- 222 Initializer(s) ----

    ---- 2603 Node(s) ----

[I] Folding Constants | Pass 1
[E] Module: 'onnx_graphsurgeon' version '0.3.12' is installed, but version '>=0.3.21' is required.
    Please install the required version or set POLYGRAPHY_AUTOINSTALL_DEPS=1 in your environment variables to allow Polygraphy to do so automatically. Attempting to continue with the currently installed version of this module, but note that this may cause errors!
[W] Constant folding pass failed. Skipping subsequent passes.
    Note: Error was: fold_constants() got an unexpected keyword argument 'size_threshold'
[I] Saving ONNX model to: output/34.onnx
[I] New Model:
    Name: torch-jit-export | ONNX Opset: 11

    ---- 1 Graph Input(s) ----
    {input [dtype=float32, shape=(1, 3, 320, 800)]}

    ---- 1 Graph Output(s) ----
    {3076 [dtype=float32, shape=(1, 192, 78)]}

    ---- 222 Initializer(s) ----

    ---- 2603 Node(s) ----

[I] PASSED | Runtime: 1.856s | Command: D:\anaconda\envs\ADAS\Scripts\polygraphy surgeon sanitize engines/llamas_dla34.onnx --fold-constants --output output/34.onnx
```
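Note that although the run reports `PASSED`, the constant-folding pass itself failed: the installed `onnx_graphsurgeon` 0.3.12 is older than the required `>=0.3.21`, so `34.onnx` came out unfolded. The log itself suggests the two fixes: `pip install "onnx_graphsurgeon>=0.3.21"` or setting `POLYGRAPHY_AUTOINSTALL_DEPS=1`. The version check being performed amounts to a numeric (not lexicographic) comparison, roughly like this sketch (helper names are illustrative, not Polygraphy's own API):

```python
def parse_version(ver: str) -> tuple[int, ...]:
    """Turn a dotted version string like '0.3.12' into a comparable tuple."""
    return tuple(int(part) for part in ver.split("."))

def meets_requirement(installed: str, required: str) -> bool:
    """True if installed >= required, comparing components numerically."""
    return parse_version(installed) >= parse_version(required)

# The situation in the log above: 0.3.12 installed, >=0.3.21 required.
print(meets_requirement("0.3.12", "0.3.21"))  # → False
```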

Yutong-gannis commented 1 year ago

@CrazyMustard-404 Try converting this 34.onnx to a TensorRT engine.

CrazyMustard-404 commented 1 year ago

> @CrazyMustard-404 Try converting this 34.onnx to a TensorRT engine.

I tried; it still fails with the same error.

Yutong-gannis commented 1 year ago

@CrazyMustard-404 You can try disabling lane detection first: https://github.com/Yutong-gannis/ETSAuto/blob/8f8e367b9949cbb9475063b1992a9ba7e401f0b7/script/main.py#L78-L80

CrazyMustard-404 commented 1 year ago

> @CrazyMustard-404 You can try disabling lane detection first:

https://github.com/Yutong-gannis/ETSAuto/blob/8f8e367b9949cbb9475063b1992a9ba7e401f0b7/script/main.py#L78-L80

Thanks! The problem is solved: it was a CUDA and TensorRT version issue. The working combination in the end: CUDA 11.8, cuDNN 8.8.0.121_cuda11, TensorRT-8.5.3.1, torch==1.13.1+cu117.