FeiYull / TensorRT-Alpha

🔥🔥🔥TensorRT for YOLOv8、YOLOv8-Pose、YOLOv8-Seg、YOLOv8-Cls、YOLOv7、YOLOv6、YOLOv5、YOLONAS......🚀🚀🚀CUDA IS ALL YOU NEED.🍎🍎🍎
GNU General Public License v2.0
1.3k stars · 201 forks

How do I compile yolov5? #49

Closed M15-3080 closed 1 year ago

M15-3080 commented 1 year ago

Hello, this is the first time I am using your project. I am using YOLOv5, but when compiling with VS I get a lot of errors. I wonder if some configuration files are missing? Below are the files included in the project and the error screenshots.


FeiYull commented 1 year ago

@M15-3080 for windows10: http://t.csdn.cn/4O958 or https://www.bilibili.com/video/BV1BD4y1A7KD/?spm_id_from=333.999.0.0

for linux: http://t.csdn.cn/Lxn2M

M15-3080 commented 1 year ago


Thanks for your reply. I generated an ONNX model with YOLOv5, but it cannot be compiled into a TensorRT engine with trtexec. My command is: trtexec --onnx=best.onnx --saveEngine=best.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640

[09/06/2023-14:11:32] [I] === Model Options === [09/06/2023-14:11:32] [I] Format: ONNX [09/06/2023-14:11:32] [I] Model: best.onnx [09/06/2023-14:11:32] [I] Output: [09/06/2023-14:11:32] [I] === Build Options === [09/06/2023-14:11:32] [I] Max batch: explicit batch [09/06/2023-14:11:32] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [09/06/2023-14:11:32] [I] minTiming: 1 [09/06/2023-14:11:32] [I] avgTiming: 8 [09/06/2023-14:11:32] [I] Precision: FP32 [09/06/2023-14:11:32] [I] LayerPrecisions: [09/06/2023-14:11:32] [I] Calibration: [09/06/2023-14:11:32] [I] Refit: Disabled [09/06/2023-14:11:32] [I] Sparsity: Disabled [09/06/2023-14:11:32] [I] Safe mode: Disabled [09/06/2023-14:11:32] [I] DirectIO mode: Disabled [09/06/2023-14:11:32] [I] Restricted mode: Disabled [09/06/2023-14:11:32] [I] Build only: Enabled [09/06/2023-14:11:32] [I] Save engine: best.trt [09/06/2023-14:11:32] [I] Load engine: [09/06/2023-14:11:32] [I] Profiling verbosity: 0 [09/06/2023-14:11:32] [I] Tactic sources: Using default tactic sources [09/06/2023-14:11:32] [I] timingCacheMode: local [09/06/2023-14:11:32] [I] timingCacheFile: [09/06/2023-14:11:32] [I] Input(s)s format: fp32:CHW [09/06/2023-14:11:32] [I] Output(s)s format: fp32:CHW [09/06/2023-14:11:32] [I] Input build shape: images=1x3x640x640+4x3x640x640+8x3x640x640 [09/06/2023-14:11:32] [I] Input calibration shapes: model [09/06/2023-14:11:32] [I] === System Options === [09/06/2023-14:11:32] [I] Device: 0 [09/06/2023-14:11:32] [I] DLACore: [09/06/2023-14:11:32] [I] Plugins: [09/06/2023-14:11:32] [I] === Inference Options === [09/06/2023-14:11:32] [I] Batch: Explicit [09/06/2023-14:11:32] [I] Input inference shape: images=4x3x640x640 [09/06/2023-14:11:32] [I] Iterations: 10 [09/06/2023-14:11:32] [I] Duration: 3s (+ 200ms warm up) [09/06/2023-14:11:32] [I] Sleep time: 0ms [09/06/2023-14:11:32] [I] Idle time: 0ms [09/06/2023-14:11:32] [I] Streams: 1 [09/06/2023-14:11:32] [I] ExposeDMA: 
Disabled [09/06/2023-14:11:32] [I] Data transfers: Enabled [09/06/2023-14:11:32] [I] Spin-wait: Disabled [09/06/2023-14:11:32] [I] Multithreading: Disabled [09/06/2023-14:11:32] [I] CUDA Graph: Disabled [09/06/2023-14:11:32] [I] Separate profiling: Disabled [09/06/2023-14:11:32] [I] Time Deserialize: Disabled [09/06/2023-14:11:32] [I] Time Refit: Disabled [09/06/2023-14:11:32] [I] Inputs: [09/06/2023-14:11:32] [I] === Reporting Options === [09/06/2023-14:11:32] [I] Verbose: Disabled [09/06/2023-14:11:32] [I] Averages: 10 inferences [09/06/2023-14:11:32] [I] Percentile: 99 [09/06/2023-14:11:32] [I] Dump refittable layers:Disabled [09/06/2023-14:11:32] [I] Dump output: Disabled [09/06/2023-14:11:32] [I] Profile: Disabled [09/06/2023-14:11:32] [I] Export timing to JSON file: [09/06/2023-14:11:32] [I] Export output to JSON file: [09/06/2023-14:11:32] [I] Export profile to JSON file: [09/06/2023-14:11:32] [I] [09/06/2023-14:11:32] [I] === Device Information === [09/06/2023-14:11:32] [I] Selected Device: NVIDIA GeForce RTX 3080 Laptop GPU [09/06/2023-14:11:32] [I] Compute Capability: 8.6 [09/06/2023-14:11:32] [I] SMs: 48 [09/06/2023-14:11:32] [I] Compute Clock Rate: 1.605 GHz [09/06/2023-14:11:32] [I] Device Global Memory: 8192 MiB [09/06/2023-14:11:32] [I] Shared Memory per SM: 100 KiB [09/06/2023-14:11:32] [I] Memory Bus Width: 256 bits (ECC disabled) [09/06/2023-14:11:32] [I] Memory Clock Rate: 7.001 GHz [09/06/2023-14:11:32] [I] [09/06/2023-14:11:32] [I] TensorRT version: 8.4.2 [09/06/2023-14:11:32] [I] [TRT] [MemUsageChange] Init CUDA: CPU +590, GPU +0, now: CPU 14990, GPU 1287 (MiB) [09/06/2023-14:11:33] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 15058 MiB, GPU 1287 MiB [09/06/2023-14:11:33] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 15219 MiB, GPU 1331 MiB [09/06/2023-14:11:33] [I] Start parsing network model [09/06/2023-14:11:33] [I] [TRT] 
---------------------------------------------------------------- [09/06/2023-14:11:33] [I] [TRT] Input filename: best.onnx [09/06/2023-14:11:33] [I] [TRT] ONNX IR version: 0.0.7 [09/06/2023-14:11:33] [I] [TRT] Opset version: 12 [09/06/2023-14:11:33] [I] [TRT] Producer name: pytorch [09/06/2023-14:11:33] [I] [TRT] Producer version: 1.10 [09/06/2023-14:11:33] [I] [TRT] Domain: [09/06/2023-14:11:33] [I] [TRT] Model version: 0 [09/06/2023-14:11:33] [I] [TRT] Doc string: [09/06/2023-14:11:33] [I] [TRT] ---------------------------------------------------------------- [09/06/2023-14:11:33] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. [09/06/2023-14:11:33] [I] Finish parsing network model [09/06/2023-14:11:33] [E] Static model does not take explicit shapes since the shape of inference tensors will be determined by the model itself [09/06/2023-14:11:33] [E] Network And Config setup failed [09/06/2023-14:11:33] [E] Building engine failed [09/06/2023-14:11:33] [E] Failed to create engine from model or file. [09/06/2023-14:11:33] [E] Engine set up failed &&&& FAILED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=best.onnx --saveEngine=best.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640

FeiYull commented 1 year ago

@M15-3080

You need to export a dynamic ONNX model. Reference: https://github.com/FeiYull/TensorRT-Alpha/tree/main/yolov5

M15-3080 commented 1 year ago


I've regenerated a dynamic ONNX model and converted INT64 to INT32 using converr.py, but it still doesn't compile:

[09/06/2023-15:02:01] [I] === Model Options === [09/06/2023-15:02:01] [I] Format: ONNX [09/06/2023-15:02:01] [I] Model: best32.onnx [09/06/2023-15:02:01] [I] Output: [09/06/2023-15:02:01] [I] === Build Options === [09/06/2023-15:02:01] [I] Max batch: explicit batch [09/06/2023-15:02:01] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [09/06/2023-15:02:01] [I] minTiming: 1 [09/06/2023-15:02:01] [I] avgTiming: 8 [09/06/2023-15:02:01] [I] Precision: FP32 [09/06/2023-15:02:01] [I] LayerPrecisions: [09/06/2023-15:02:01] [I] Calibration: [09/06/2023-15:02:01] [I] Refit: Disabled [09/06/2023-15:02:01] [I] Sparsity: Disabled [09/06/2023-15:02:01] [I] Safe mode: Disabled [09/06/2023-15:02:01] [I] DirectIO mode: Disabled [09/06/2023-15:02:01] [I] Restricted mode: Disabled [09/06/2023-15:02:01] [I] Build only: Enabled [09/06/2023-15:02:01] [I] Save engine: best.trt [09/06/2023-15:02:01] [I] Load engine: [09/06/2023-15:02:01] [I] Profiling verbosity: 0 [09/06/2023-15:02:01] [I] Tactic sources: Using default tactic sources [09/06/2023-15:02:01] [I] timingCacheMode: local [09/06/2023-15:02:01] [I] timingCacheFile: [09/06/2023-15:02:01] [I] Input(s)s format: fp32:CHW [09/06/2023-15:02:01] [I] Output(s)s format: fp32:CHW [09/06/2023-15:02:01] [I] Input build shape: images=1x3x640x640+4x3x640x640+8x3x640x640 [09/06/2023-15:02:01] [I] Input calibration shapes: model [09/06/2023-15:02:01] [I] === System Options === [09/06/2023-15:02:01] [I] Device: 0 [09/06/2023-15:02:01] [I] DLACore: [09/06/2023-15:02:01] [I] Plugins: [09/06/2023-15:02:01] [I] === Inference Options === [09/06/2023-15:02:01] [I] Batch: Explicit [09/06/2023-15:02:01] [I] Input inference shape: images=4x3x640x640 [09/06/2023-15:02:01] [I] Iterations: 10 [09/06/2023-15:02:01] [I] Duration: 3s (+ 200ms warm up) [09/06/2023-15:02:01] [I] Sleep time: 0ms [09/06/2023-15:02:01] [I] Idle time: 0ms [09/06/2023-15:02:01] [I] Streams: 1 [09/06/2023-15:02:01] [I] ExposeDMA: 
Disabled [09/06/2023-15:02:01] [I] Data transfers: Enabled [09/06/2023-15:02:01] [I] Spin-wait: Disabled [09/06/2023-15:02:01] [I] Multithreading: Disabled [09/06/2023-15:02:01] [I] CUDA Graph: Disabled [09/06/2023-15:02:01] [I] Separate profiling: Disabled [09/06/2023-15:02:01] [I] Time Deserialize: Disabled [09/06/2023-15:02:01] [I] Time Refit: Disabled [09/06/2023-15:02:01] [I] Inputs: [09/06/2023-15:02:01] [I] === Reporting Options === [09/06/2023-15:02:01] [I] Verbose: Disabled [09/06/2023-15:02:01] [I] Averages: 10 inferences [09/06/2023-15:02:01] [I] Percentile: 99 [09/06/2023-15:02:01] [I] Dump refittable layers:Disabled [09/06/2023-15:02:01] [I] Dump output: Disabled [09/06/2023-15:02:01] [I] Profile: Disabled [09/06/2023-15:02:01] [I] Export timing to JSON file: [09/06/2023-15:02:01] [I] Export output to JSON file: [09/06/2023-15:02:01] [I] Export profile to JSON file: [09/06/2023-15:02:01] [I] [09/06/2023-15:02:01] [I] === Device Information === [09/06/2023-15:02:01] [I] Selected Device: NVIDIA GeForce RTX 3080 Laptop GPU [09/06/2023-15:02:01] [I] Compute Capability: 8.6 [09/06/2023-15:02:01] [I] SMs: 48 [09/06/2023-15:02:01] [I] Compute Clock Rate: 1.605 GHz [09/06/2023-15:02:01] [I] Device Global Memory: 8192 MiB [09/06/2023-15:02:01] [I] Shared Memory per SM: 100 KiB [09/06/2023-15:02:01] [I] Memory Bus Width: 256 bits (ECC disabled) [09/06/2023-15:02:01] [I] Memory Clock Rate: 7.001 GHz [09/06/2023-15:02:01] [I] [09/06/2023-15:02:01] [I] TensorRT version: 8.4.2 [09/06/2023-15:02:01] [I] [TRT] [MemUsageChange] Init CUDA: CPU +589, GPU +0, now: CPU 14793, GPU 1287 (MiB) [09/06/2023-15:02:01] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 14860 MiB, GPU 1287 MiB [09/06/2023-15:02:02] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 15026 MiB, GPU 1331 MiB [09/06/2023-15:02:02] [I] Start parsing network model [09/06/2023-15:02:02] [I] [TRT] 
---------------------------------------------------------------- [09/06/2023-15:02:02] [I] [TRT] Input filename: best32.onnx [09/06/2023-15:02:02] [I] [TRT] ONNX IR version: 0.0.9 [09/06/2023-15:02:02] [I] [TRT] Opset version: 12 [09/06/2023-15:02:02] [I] [TRT] Producer name: onnx-typecast [09/06/2023-15:02:02] [I] [TRT] Producer version: [09/06/2023-15:02:02] [I] [TRT] Domain: [09/06/2023-15:02:02] [I] [TRT] Model version: 0 [09/06/2023-15:02:02] [I] [TRT] Doc string: [09/06/2023-15:02:02] [I] [TRT] ---------------------------------------------------------------- [09/06/2023-15:02:02] [E] [TRT] ModelImporter.cpp:773: While parsing node number 290 [Range -> "466"]: [09/06/2023-15:02:02] [E] [TRT] ModelImporter.cpp:774: --- Begin node --- [09/06/2023-15:02:02] [E] [TRT] ModelImporter.cpp:775: input: "464" input: "463" input: "465" output: "466" name: "Range_290" op_type: "Range"

[09/06/2023-15:02:02] [E] [TRT] ModelImporter.cpp:776: --- End node --- [09/06/2023-15:02:02] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3353 In function importRange: [8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!" [09/06/2023-15:02:02] [E] Failed to parse onnx file [09/06/2023-15:02:02] [I] Finish parsing network model [09/06/2023-15:02:02] [E] Parsing model failed [09/06/2023-15:02:02] [E] Failed to create engine from model or file. [09/06/2023-15:02:02] [E] Engine set up failed &&&& FAILED TensorRT.trtexec [TensorRT v8402] # trtexec --onnx=best32.onnx --saveEngine=best.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640

FeiYull commented 1 year ago

@M15-3080 try compiling the ONNX file with TensorRT 8.4.2.4, not 8.4.0.2.

M15-3080 commented 1 year ago


I did use TensorRT 8.4.2.4; 8.4.0.2 is not on my computer.

FeiYull commented 1 year ago

@M15-3080 Try to validate the ONNX file ( https://drive.google.com/drive/folders/1hYhHYIUK4josm9pnLHP4k5Gp8W-A9xhW ) and compile it with the command: ../../../../TensorRT-8.4.2.4/bin/trtexec --onnx=yolov5n.onnx --saveEngine=yolov5n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640

M15-3080 commented 1 year ago


I re-tested with YOLOv8, and it still did not compile. This is my compile command: /TensorRT-8.4.2.4/bin/trtexec --onnx=/TensorRT-8.4.2.4/bin/yolov8n.onnx --saveEngine=/TensorRT-8.4.2.4/bin/yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640


[09/07/2023-11:00:34] [I] === Model Options === [09/07/2023-11:00:34] [I] Format: ONNX [09/07/2023-11:00:34] [I] Model: /TensorRT-8.4.2.4/bin/yolov8n.onnx [09/07/2023-11:00:34] [I] Output: [09/07/2023-11:00:34] [I] === Build Options === [09/07/2023-11:00:34] [I] Max batch: explicit batch [09/07/2023-11:00:34] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [09/07/2023-11:00:34] [I] minTiming: 1 [09/07/2023-11:00:34] [I] avgTiming: 8 [09/07/2023-11:00:34] [I] Precision: FP32 [09/07/2023-11:00:34] [I] LayerPrecisions: [09/07/2023-11:00:34] [I] Calibration: [09/07/2023-11:00:34] [I] Refit: Disabled [09/07/2023-11:00:34] [I] Sparsity: Disabled [09/07/2023-11:00:34] [I] Safe mode: Disabled [09/07/2023-11:00:34] [I] DirectIO mode: Disabled [09/07/2023-11:00:34] [I] Restricted mode: Disabled [09/07/2023-11:00:34] [I] Build only: Enabled [09/07/2023-11:00:34] [I] Save engine: /TensorRT-8.4.2.4/bin/yolov8n.trt [09/07/2023-11:00:34] [I] Load engine: [09/07/2023-11:00:34] [I] Profiling verbosity: 0 [09/07/2023-11:00:34] [I] Tactic sources: Using default tactic sources [09/07/2023-11:00:34] [I] timingCacheMode: local [09/07/2023-11:00:34] [I] timingCacheFile: [09/07/2023-11:00:34] [I] Input(s)s format: fp32:CHW [09/07/2023-11:00:34] [I] Output(s)s format: fp32:CHW [09/07/2023-11:00:34] [I] Input build shape: images=1x3x640x640+4x3x640x640+8x3x640x640 [09/07/2023-11:00:34] [I] Input calibration shapes: model [09/07/2023-11:00:34] [I] === System Options === [09/07/2023-11:00:34] [I] Device: 0 [09/07/2023-11:00:34] [I] DLACore: [09/07/2023-11:00:34] [I] Plugins: [09/07/2023-11:00:34] [I] === Inference Options === [09/07/2023-11:00:34] [I] Batch: Explicit [09/07/2023-11:00:34] [I] Input inference shape: images=4x3x640x640 [09/07/2023-11:00:34] [I] Iterations: 10 [09/07/2023-11:00:34] [I] Duration: 3s (+ 200ms warm up) [09/07/2023-11:00:34] [I] Sleep time: 0ms [09/07/2023-11:00:34] [I] Idle time: 0ms [09/07/2023-11:00:34] [I] 
Streams: 1 [09/07/2023-11:00:34] [I] ExposeDMA: Disabled [09/07/2023-11:00:34] [I] Data transfers: Enabled [09/07/2023-11:00:34] [I] Spin-wait: Disabled [09/07/2023-11:00:34] [I] Multithreading: Disabled [09/07/2023-11:00:34] [I] CUDA Graph: Disabled [09/07/2023-11:00:34] [I] Separate profiling: Disabled [09/07/2023-11:00:34] [I] Time Deserialize: Disabled [09/07/2023-11:00:34] [I] Time Refit: Disabled [09/07/2023-11:00:34] [I] Inputs: [09/07/2023-11:00:34] [I] === Reporting Options === [09/07/2023-11:00:34] [I] Verbose: Disabled [09/07/2023-11:00:34] [I] Averages: 10 inferences [09/07/2023-11:00:34] [I] Percentile: 99 [09/07/2023-11:00:34] [I] Dump refittable layers:Disabled [09/07/2023-11:00:34] [I] Dump output: Disabled [09/07/2023-11:00:34] [I] Profile: Disabled [09/07/2023-11:00:34] [I] Export timing to JSON file: [09/07/2023-11:00:34] [I] Export output to JSON file: [09/07/2023-11:00:34] [I] Export profile to JSON file: [09/07/2023-11:00:34] [I] [09/07/2023-11:00:34] [I] === Device Information === [09/07/2023-11:00:34] [I] Selected Device: NVIDIA GeForce RTX 3080 Laptop GPU [09/07/2023-11:00:34] [I] Compute Capability: 8.6 [09/07/2023-11:00:34] [I] SMs: 48 [09/07/2023-11:00:34] [I] Compute Clock Rate: 1.605 GHz [09/07/2023-11:00:34] [I] Device Global Memory: 8192 MiB [09/07/2023-11:00:34] [I] Shared Memory per SM: 100 KiB [09/07/2023-11:00:34] [I] Memory Bus Width: 256 bits (ECC disabled) [09/07/2023-11:00:34] [I] Memory Clock Rate: 7.001 GHz [09/07/2023-11:00:34] [I] [09/07/2023-11:00:34] [I] TensorRT version: 8.4.2 [09/07/2023-11:00:35] [I] [TRT] [MemUsageChange] Init CUDA: CPU +585, GPU +0, now: CPU 16214, GPU 1287 (MiB) [09/07/2023-11:00:35] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 16281 MiB, GPU 1287 MiB [09/07/2023-11:00:36] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 16451 MiB, GPU 1331 MiB [09/07/2023-11:00:36] [I] Start parsing network model [09/07/2023-11:00:36] [I] [TRT] 
---------------------------------------------------------------- [09/07/2023-11:00:36] [I] [TRT] Input filename: /TensorRT-8.4.2.4/bin/yolov8n.onnx [09/07/2023-11:00:36] [I] [TRT] ONNX IR version: 0.0.7 [09/07/2023-11:00:36] [I] [TRT] Opset version: 13 [09/07/2023-11:00:36] [I] [TRT] Producer name: pytorch [09/07/2023-11:00:36] [I] [TRT] Producer version: 1.10 [09/07/2023-11:00:36] [I] [TRT] Domain: [09/07/2023-11:00:36] [I] [TRT] Model version: 0 [09/07/2023-11:00:36] [I] [TRT] Doc string: [09/07/2023-11:00:36] [I] [TRT] ---------------------------------------------------------------- [09/07/2023-11:00:36] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. [09/07/2023-11:00:36] [E] [TRT] ModelImporter.cpp:773: While parsing node number 232 [Range -> "376"]: [09/07/2023-11:00:36] [E] [TRT] ModelImporter.cpp:774: --- Begin node --- [09/07/2023-11:00:36] [E] [TRT] ModelImporter.cpp:775: input: "374" input: "373" input: "375" output: "376" name: "Range_232" op_type: "Range"

[09/07/2023-11:00:36] [E] [TRT] ModelImporter.cpp:776: --- End node --- [09/07/2023-11:00:36] [E] [TRT] ModelImporter.cpp:779: ERROR: builtin_op_importers.cpp:3353 In function importRange: [8] Assertion failed: inputs.at(0).isInt32() && "For range operator with dynamic inputs, this version of TensorRT only supports INT32!" [09/07/2023-11:00:36] [E] Failed to parse onnx file [09/07/2023-11:00:36] [I] Finish parsing network model [09/07/2023-11:00:36] [E] Parsing model failed [09/07/2023-11:00:36] [E] Failed to create engine from model or file. [09/07/2023-11:00:36] [E] Engine set up failed &&&& FAILED TensorRT.trtexec [TensorRT v8402] # /TensorRT-8.4.2.4/bin/trtexec --onnx=/TensorRT-8.4.2.4/bin/yolov8n.onnx --saveEngine=/TensorRT-8.4.2.4/bin/yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640

FeiYull commented 1 year ago

@M15-3080 The following are my logs for yolov5 and yolov8.

FeiYull commented 1 year ago

PS D:\ThirdParty\TensorRT-8.4.2.4\bin> .\trtexec.exe --onnx=yolov5n.onnx --saveEngine=yolov5n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 &&&& RUNNING TensorRT.trtexec [TensorRT v8402] # D:\ThirdParty\TensorRT-8.4.2.4\bin\trtexec.exe --onnx=yolov5n.onnx --saveEngine=yolov5n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 [09/07/2023-11:06:13] [I] === Model Options === [09/07/2023-11:06:13] [I] Format: ONNX [09/07/2023-11:06:13] [I] Model: yolov5n.onnx [09/07/2023-11:06:13] [I] Output: [09/07/2023-11:06:13] [I] === Build Options === [09/07/2023-11:06:13] [I] Max batch: explicit batch [09/07/2023-11:06:13] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [09/07/2023-11:06:13] [I] minTiming: 1 [09/07/2023-11:06:13] [I] avgTiming: 8 [09/07/2023-11:06:13] [I] Precision: FP32 [09/07/2023-11:06:13] [I] LayerPrecisions: [09/07/2023-11:06:13] [I] Calibration: [09/07/2023-11:06:13] [I] Refit: Disabled [09/07/2023-11:06:13] [I] Sparsity: Disabled [09/07/2023-11:06:13] [I] Safe mode: Disabled [09/07/2023-11:06:13] [I] DirectIO mode: Disabled [09/07/2023-11:06:13] [I] Restricted mode: Disabled [09/07/2023-11:06:13] [I] Build only: Enabled [09/07/2023-11:06:13] [I] Save engine: yolov5n.trt [09/07/2023-11:06:13] [I] Load engine: [09/07/2023-11:06:13] [I] Profiling verbosity: 0 [09/07/2023-11:06:13] [I] Tactic sources: Using default tactic sources [09/07/2023-11:06:13] [I] timingCacheMode: local [09/07/2023-11:06:13] [I] timingCacheFile: [09/07/2023-11:06:13] [I] Input(s)s format: fp32:CHW [09/07/2023-11:06:13] [I] Output(s)s format: fp32:CHW [09/07/2023-11:06:13] [I] Input build shape: images=1x3x640x640+4x3x640x640+8x3x640x640 [09/07/2023-11:06:13] [I] Input calibration shapes: model [09/07/2023-11:06:13] [I] === System Options === [09/07/2023-11:06:13] [I] Device: 0 [09/07/2023-11:06:13] [I] DLACore: 
[09/07/2023-11:06:13] [I] Plugins: [09/07/2023-11:06:13] [I] === Inference Options === [09/07/2023-11:06:13] [I] Batch: Explicit [09/07/2023-11:06:13] [I] Input inference shape: images=4x3x640x640 [09/07/2023-11:06:13] [I] Iterations: 10 [09/07/2023-11:06:13] [I] Duration: 3s (+ 200ms warm up) [09/07/2023-11:06:13] [I] Sleep time: 0ms [09/07/2023-11:06:13] [I] Idle time: 0ms [09/07/2023-11:06:13] [I] Streams: 1 [09/07/2023-11:06:13] [I] ExposeDMA: Disabled [09/07/2023-11:06:13] [I] Data transfers: Enabled [09/07/2023-11:06:13] [I] Spin-wait: Disabled [09/07/2023-11:06:13] [I] Multithreading: Disabled [09/07/2023-11:06:13] [I] CUDA Graph: Disabled [09/07/2023-11:06:13] [I] Separate profiling: Disabled [09/07/2023-11:06:13] [I] Time Deserialize: Disabled [09/07/2023-11:06:13] [I] Time Refit: Disabled [09/07/2023-11:06:13] [I] Inputs: [09/07/2023-11:06:13] [I] === Reporting Options === [09/07/2023-11:06:13] [I] Verbose: Disabled [09/07/2023-11:06:13] [I] Averages: 10 inferences [09/07/2023-11:06:13] [I] Percentile: 99 [09/07/2023-11:06:13] [I] Dump refittable layers:Disabled [09/07/2023-11:06:13] [I] Dump output: Disabled [09/07/2023-11:06:13] [I] Profile: Disabled [09/07/2023-11:06:13] [I] Export timing to JSON file: [09/07/2023-11:06:13] [I] Export output to JSON file: [09/07/2023-11:06:13] [I] Export profile to JSON file: [09/07/2023-11:06:13] [I] [09/07/2023-11:06:13] [I] === Device Information === [09/07/2023-11:06:13] [I] Selected Device: NVIDIA GeForce RTX 2070 with Max-Q Design [09/07/2023-11:06:13] [I] Compute Capability: 7.5 [09/07/2023-11:06:13] [I] SMs: 36 [09/07/2023-11:06:13] [I] Compute Clock Rate: 1.125 GHz [09/07/2023-11:06:13] [I] Device Global Memory: 8192 MiB [09/07/2023-11:06:13] [I] Shared Memory per SM: 64 KiB [09/07/2023-11:06:13] [I] Memory Bus Width: 256 bits (ECC disabled) [09/07/2023-11:06:13] [I] Memory Clock Rate: 5.501 GHz [09/07/2023-11:06:13] [I] [09/07/2023-11:06:13] [I] TensorRT version: 8.4.2 [09/07/2023-11:06:14] [I] [TRT] 
[MemUsageChange] Init CUDA: CPU +403, GPU +0, now: CPU 13195, GPU 1149 (MiB) [09/07/2023-11:06:16] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +215, GPU +68, now: CPU 13603, GPU 1217 (MiB) [09/07/2023-11:06:16] [I] Start parsing network model [09/07/2023-11:06:16] [I] [TRT] ---------------------------------------------------------------- [09/07/2023-11:06:16] [I] [TRT] Input filename: yolov5n.onnx [09/07/2023-11:06:16] [I] [TRT] ONNX IR version: 0.0.7 [09/07/2023-11:06:16] [I] [TRT] Opset version: 12 [09/07/2023-11:06:16] [I] [TRT] Producer name: pytorch [09/07/2023-11:06:16] [I] [TRT] Producer version: 1.13.1 [09/07/2023-11:06:16] [I] [TRT] Domain: [09/07/2023-11:06:16] [I] [TRT] Model version: 0 [09/07/2023-11:06:16] [I] [TRT] Doc string: [09/07/2023-11:06:16] [I] [TRT] ---------------------------------------------------------------- [09/07/2023-11:06:16] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. [09/07/2023-11:06:16] [I] Finish parsing network model [09/07/2023-11:06:17] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +436, GPU +166, now: CPU 13925, GPU 1383 (MiB) [09/07/2023-11:06:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +404, GPU +170, now: CPU 14329, GPU 1553 (MiB) [09/07/2023-11:06:18] [W] [TRT] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.1 [09/07/2023-11:06:18] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [09/07/2023-11:08:05] [I] [TRT] Detected 1 inputs and 4 output network tensors. 
[09/07/2023-11:08:06] [I] [TRT] Total Host Persistent Memory: 127424 [09/07/2023-11:08:06] [I] [TRT] Total Device Persistent Memory: 1902592 [09/07/2023-11:08:06] [I] [TRT] Total Scratch Memory: 3231232 [09/07/2023-11:08:06] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 2200 MiB [09/07/2023-11:08:06] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 61.7432ms to assign 14 blocks to 172 nodes requiring 138829320 bytes. [09/07/2023-11:08:06] [I] [TRT] Total Activation Memory: 138829320 [09/07/2023-11:08:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 15521, GPU 1733 (MiB) [09/07/2023-11:08:06] [W] [TRT] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.1 [09/07/2023-11:08:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +1, GPU +9, now: CPU 1, GPU 9 (MiB) [09/07/2023-11:08:06] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1. [09/07/2023-11:08:06] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1. [09/07/2023-11:08:06] [I] Engine built in 112.825 sec. [09/07/2023-11:08:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 15321, GPU 1643 (MiB) [09/07/2023-11:08:06] [I] [TRT] Loaded engine size: 9 MiB [09/07/2023-11:08:06] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 15321, GPU 1661 (MiB) [09/07/2023-11:08:06] [W] [TRT] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.1 [09/07/2023-11:08:06] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +9, now: CPU 0, GPU 9 (MiB) [09/07/2023-11:08:06] [I] Engine deserialized in 0.122635 sec. 
[09/07/2023-11:08:06] [I] Skipped inference phase since --buildOnly is added. &&&& PASSED TensorRT.trtexec [TensorRT v8402] # D:\ThirdParty\TensorRT-8.4.2.4\bin\trtexec.exe --onnx=yolov5n.onnx --saveEngine=yolov5n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 PS D:\ThirdParty\TensorRT-8.4.2.4\bin>

FeiYull commented 1 year ago

PS D:\ThirdParty\TensorRT-8.4.2.4\bin> .\trtexec.exe --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 &&&& RUNNING TensorRT.trtexec [TensorRT v8402] # D:\ThirdParty\TensorRT-8.4.2.4\bin\trtexec.exe --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 [09/07/2023-11:07:18] [I] === Model Options === [09/07/2023-11:07:18] [I] Format: ONNX [09/07/2023-11:07:18] [I] Model: yolov8n.onnx [09/07/2023-11:07:18] [I] Output: [09/07/2023-11:07:18] [I] === Build Options === [09/07/2023-11:07:18] [I] Max batch: explicit batch [09/07/2023-11:07:18] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [09/07/2023-11:07:18] [I] minTiming: 1 [09/07/2023-11:07:18] [I] avgTiming: 8 [09/07/2023-11:07:18] [I] Precision: FP32 [09/07/2023-11:07:18] [I] LayerPrecisions: [09/07/2023-11:07:18] [I] Calibration: [09/07/2023-11:07:18] [I] Refit: Disabled [09/07/2023-11:07:18] [I] Sparsity: Disabled [09/07/2023-11:07:18] [I] Safe mode: Disabled [09/07/2023-11:07:18] [I] DirectIO mode: Disabled [09/07/2023-11:07:18] [I] Restricted mode: Disabled [09/07/2023-11:07:18] [I] Build only: Enabled [09/07/2023-11:07:18] [I] Save engine: yolov8n.trt [09/07/2023-11:07:18] [I] Load engine: [09/07/2023-11:07:18] [I] Profiling verbosity: 0 [09/07/2023-11:07:18] [I] Tactic sources: Using default tactic sources [09/07/2023-11:07:18] [I] timingCacheMode: local [09/07/2023-11:07:18] [I] timingCacheFile: [09/07/2023-11:07:18] [I] Input(s)s format: fp32:CHW [09/07/2023-11:07:18] [I] Output(s)s format: fp32:CHW [09/07/2023-11:07:18] [I] Input build shape: images=1x3x640x640+4x3x640x640+8x3x640x640 [09/07/2023-11:07:18] [I] Input calibration shapes: model [09/07/2023-11:07:18] [I] === System Options === [09/07/2023-11:07:18] [I] Device: 0 [09/07/2023-11:07:18] [I] DLACore: 
[09/07/2023-11:07:18] [I] Plugins: [09/07/2023-11:07:18] [I] === Inference Options === [09/07/2023-11:07:18] [I] Batch: Explicit [09/07/2023-11:07:18] [I] Input inference shape: images=4x3x640x640 [09/07/2023-11:07:18] [I] Iterations: 10 [09/07/2023-11:07:18] [I] Duration: 3s (+ 200ms warm up) [09/07/2023-11:07:18] [I] Sleep time: 0ms [09/07/2023-11:07:18] [I] Idle time: 0ms [09/07/2023-11:07:18] [I] Streams: 1 [09/07/2023-11:07:18] [I] ExposeDMA: Disabled [09/07/2023-11:07:18] [I] Data transfers: Enabled [09/07/2023-11:07:18] [I] Spin-wait: Disabled [09/07/2023-11:07:18] [I] Multithreading: Disabled [09/07/2023-11:07:18] [I] CUDA Graph: Disabled [09/07/2023-11:07:18] [I] Separate profiling: Disabled [09/07/2023-11:07:18] [I] Time Deserialize: Disabled [09/07/2023-11:07:18] [I] Time Refit: Disabled [09/07/2023-11:07:18] [I] Inputs: [09/07/2023-11:07:18] [I] === Reporting Options === [09/07/2023-11:07:18] [I] Verbose: Disabled [09/07/2023-11:07:18] [I] Averages: 10 inferences [09/07/2023-11:07:18] [I] Percentile: 99 [09/07/2023-11:07:18] [I] Dump refittable layers:Disabled [09/07/2023-11:07:18] [I] Dump output: Disabled [09/07/2023-11:07:18] [I] Profile: Disabled [09/07/2023-11:07:18] [I] Export timing to JSON file: [09/07/2023-11:07:18] [I] Export output to JSON file: [09/07/2023-11:07:18] [I] Export profile to JSON file: [09/07/2023-11:07:18] [I] [09/07/2023-11:07:18] [I] === Device Information === [09/07/2023-11:07:18] [I] Selected Device: NVIDIA GeForce RTX 2070 with Max-Q Design [09/07/2023-11:07:18] [I] Compute Capability: 7.5 [09/07/2023-11:07:18] [I] SMs: 36 [09/07/2023-11:07:18] [I] Compute Clock Rate: 1.125 GHz [09/07/2023-11:07:18] [I] Device Global Memory: 8192 MiB [09/07/2023-11:07:18] [I] Shared Memory per SM: 64 KiB [09/07/2023-11:07:18] [I] Memory Bus Width: 256 bits (ECC disabled) [09/07/2023-11:07:18] [I] Memory Clock Rate: 5.501 GHz [09/07/2023-11:07:18] [I] [09/07/2023-11:07:18] [I] TensorRT version: 8.4.2 [09/07/2023-11:07:18] [I] [TRT] 
[MemUsageChange] Init CUDA: CPU +363, GPU +0, now: CPU 14342, GPU 1149 (MiB) [09/07/2023-11:07:20] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +212, GPU +68, now: CPU 14729, GPU 1217 (MiB) [09/07/2023-11:07:20] [I] Start parsing network model [09/07/2023-11:07:20] [I] [TRT] ---------------------------------------------------------------- [09/07/2023-11:07:20] [I] [TRT] Input filename: yolov8n.onnx [09/07/2023-11:07:20] [I] [TRT] ONNX IR version: 0.0.8 [09/07/2023-11:07:20] [I] [TRT] Opset version: 17 [09/07/2023-11:07:20] [I] [TRT] Producer name: pytorch [09/07/2023-11:07:20] [I] [TRT] Producer version: 1.13.1 [09/07/2023-11:07:20] [I] [TRT] Domain: [09/07/2023-11:07:20] [I] [TRT] Model version: 0 [09/07/2023-11:07:20] [I] [TRT] Doc string: [09/07/2023-11:07:20] [I] [TRT] ---------------------------------------------------------------- [09/07/2023-11:07:20] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. [09/07/2023-11:07:20] [I] Finish parsing network model [09/07/2023-11:07:21] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +358, GPU +166, now: CPU 14982, GPU 1383 (MiB) [09/07/2023-11:07:22] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +351, GPU +170, now: CPU 15333, GPU 1553 (MiB) [09/07/2023-11:07:22] [W] [TRT] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.1 [09/07/2023-11:07:22] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [09/07/2023-11:08:30] [I] [TRT] Detected 1 inputs and 3 output network tensors. 
[09/07/2023-11:08:30] [I] [TRT] Total Host Persistent Memory: 126848 [09/07/2023-11:08:30] [I] [TRT] Total Device Persistent Memory: 1781248 [09/07/2023-11:08:30] [I] [TRT] Total Scratch Memory: 0 [09/07/2023-11:08:30] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 4 MiB, GPU 1807 MiB [09/07/2023-11:08:30] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 117.412ms to assign 16 blocks to 184 nodes requiring 154889744 bytes. [09/07/2023-11:08:30] [I] [TRT] Total Activation Memory: 154889744 [09/07/2023-11:08:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 13971, GPU 1741 (MiB) [09/07/2023-11:08:30] [W] [TRT] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.1 [09/07/2023-11:08:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +2, GPU +17, now: CPU 2, GPU 17 (MiB) [09/07/2023-11:08:30] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1. [09/07/2023-11:08:30] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1. [09/07/2023-11:08:30] [I] Engine built in 72.3869 sec. [09/07/2023-11:08:30] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 13788, GPU 1643 (MiB) [09/07/2023-11:08:30] [I] [TRT] Loaded engine size: 16 MiB [09/07/2023-11:08:30] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 13788, GPU 1669 (MiB) [09/07/2023-11:08:30] [W] [TRT] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.1 [09/07/2023-11:08:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +16, now: CPU 0, GPU 16 (MiB) [09/07/2023-11:08:30] [I] Engine deserialized in 0.0723956 sec. 
[09/07/2023-11:08:30] [I] Skipped inference phase since --buildOnly is added. &&&& PASSED TensorRT.trtexec [TensorRT v8402] # D:\ThirdParty\TensorRT-8.4.2.4\bin\trtexec.exe --onnx=yolov8n.onnx --saveEngine=yolov8n.trt --buildOnly --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640
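For reference, the dynamic-shape flags used in the command above can be assembled programmatically, which makes it harder to mistype one of the three shape arguments. A minimal sketch (the helper name is hypothetical; the `images` input-tensor name and 3x640x640 shape are taken from the command in this thread):

```python
def trtexec_dynamic_args(onnx_path, engine_path, input_name="images",
                         min_bs=1, opt_bs=4, max_bs=8, c=3, h=640, w=640):
    """Build the trtexec argument list for a dynamic-batch engine build.

    Hypothetical helper; it only mirrors the command shown above and
    does not invoke trtexec itself.
    """
    shape = lambda n: f"{input_name}:{n}x{c}x{h}x{w}"
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--buildOnly",
        f"--minShapes={shape(min_bs)}",
        f"--optShapes={shape(opt_bs)}",
        f"--maxShapes={shape(max_bs)}",
    ]

print(" ".join(trtexec_dynamic_args("yolov8n.onnx", "yolov8n.trt")))
```

Note that all three `--*Shapes` flags must name the same input tensor as the ONNX graph; a mismatch there is a common cause of build failures.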

M15-3080 commented 1 year ago


Thank you for your reply. Something surprising happened: when I re-exported the ONNX model with static shapes, it compiled successfully. My commands were: yolo mode=export model=yolov8n.pt format=onnx dynamic=False

trtexec --fp16 --workspace=2048 --onnx=yolov8n.onnx --saveEngine=yolov8n.trt

image [09/07/2023-13:35:06] [W] --workspace flag has been deprecated by --memPoolSize flag. [09/07/2023-13:35:06] [I] === Model Options === [09/07/2023-13:35:06] [I] Format: ONNX [09/07/2023-13:35:06] [I] Model: yolov8n.onnx [09/07/2023-13:35:06] [I] Output: [09/07/2023-13:35:06] [I] === Build Options === [09/07/2023-13:35:06] [I] Max batch: explicit batch [09/07/2023-13:35:06] [I] Memory Pools: workspace: 2048 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [09/07/2023-13:35:06] [I] minTiming: 1 [09/07/2023-13:35:06] [I] avgTiming: 8 [09/07/2023-13:35:06] [I] Precision: FP32+FP16 [09/07/2023-13:35:06] [I] LayerPrecisions: [09/07/2023-13:35:06] [I] Calibration: [09/07/2023-13:35:06] [I] Refit: Disabled [09/07/2023-13:35:06] [I] Sparsity: Disabled [09/07/2023-13:35:06] [I] Safe mode: Disabled [09/07/2023-13:35:06] [I] DirectIO mode: Disabled [09/07/2023-13:35:06] [I] Restricted mode: Disabled [09/07/2023-13:35:06] [I] Build only: Disabled [09/07/2023-13:35:06] [I] Save engine: yolov8n.trt [09/07/2023-13:35:06] [I] Load engine: [09/07/2023-13:35:06] [I] Profiling verbosity: 0 [09/07/2023-13:35:06] [I] Tactic sources: Using default tactic sources [09/07/2023-13:35:06] [I] timingCacheMode: local [09/07/2023-13:35:06] [I] timingCacheFile: [09/07/2023-13:35:06] [I] Input(s)s format: fp32:CHW [09/07/2023-13:35:06] [I] Output(s)s format: fp32:CHW [09/07/2023-13:35:06] [I] Input build shapes: model [09/07/2023-13:35:06] [I] Input calibration shapes: model [09/07/2023-13:35:06] [I] === System Options === [09/07/2023-13:35:06] [I] Device: 0 [09/07/2023-13:35:06] [I] DLACore: [09/07/2023-13:35:06] [I] Plugins: [09/07/2023-13:35:06] [I] === Inference Options === [09/07/2023-13:35:06] [I] Batch: Explicit [09/07/2023-13:35:06] [I] Input inference shapes: model [09/07/2023-13:35:06] [I] Iterations: 10 [09/07/2023-13:35:06] [I] Duration: 3s (+ 200ms warm up) [09/07/2023-13:35:06] [I] Sleep time: 0ms [09/07/2023-13:35:06] [I] Idle time: 0ms [09/07/2023-13:35:06] 
[I] Streams: 1 [09/07/2023-13:35:06] [I] ExposeDMA: Disabled [09/07/2023-13:35:06] [I] Data transfers: Enabled [09/07/2023-13:35:06] [I] Spin-wait: Disabled [09/07/2023-13:35:06] [I] Multithreading: Disabled [09/07/2023-13:35:06] [I] CUDA Graph: Disabled [09/07/2023-13:35:06] [I] Separate profiling: Disabled [09/07/2023-13:35:06] [I] Time Deserialize: Disabled [09/07/2023-13:35:06] [I] Time Refit: Disabled [09/07/2023-13:35:06] [I] Inputs: [09/07/2023-13:35:06] [I] === Reporting Options === [09/07/2023-13:35:06] [I] Verbose: Disabled [09/07/2023-13:35:06] [I] Averages: 10 inferences [09/07/2023-13:35:06] [I] Percentile: 99 [09/07/2023-13:35:06] [I] Dump refittable layers:Disabled [09/07/2023-13:35:06] [I] Dump output: Disabled [09/07/2023-13:35:06] [I] Profile: Disabled [09/07/2023-13:35:06] [I] Export timing to JSON file: [09/07/2023-13:35:06] [I] Export output to JSON file: [09/07/2023-13:35:06] [I] Export profile to JSON file: [09/07/2023-13:35:06] [I] [09/07/2023-13:35:06] [I] === Device Information === [09/07/2023-13:35:06] [I] Selected Device: NVIDIA GeForce RTX 3080 Laptop GPU [09/07/2023-13:35:06] [I] Compute Capability: 8.6 [09/07/2023-13:35:06] [I] SMs: 48 [09/07/2023-13:35:06] [I] Compute Clock Rate: 1.605 GHz [09/07/2023-13:35:06] [I] Device Global Memory: 8191 MiB [09/07/2023-13:35:06] [I] Shared Memory per SM: 100 KiB [09/07/2023-13:35:06] [I] Memory Bus Width: 256 bits (ECC disabled) [09/07/2023-13:35:06] [I] Memory Clock Rate: 7.001 GHz [09/07/2023-13:35:06] [I] [09/07/2023-13:35:06] [I] TensorRT version: 8.4.2 [09/07/2023-13:35:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +657, GPU +0, now: CPU 11865, GPU 1312 (MiB) [09/07/2023-13:35:07] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 11930 MiB, GPU 1312 MiB [09/07/2023-13:35:07] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 12118 MiB, GPU 1356 MiB [09/07/2023-13:35:07] [I] Start parsing network model [09/07/2023-13:35:07] [I] [TRT] 
---------------------------------------------------------------- [09/07/2023-13:35:07] [I] [TRT] Input filename: yolov8n.onnx [09/07/2023-13:35:07] [I] [TRT] ONNX IR version: 0.0.7 [09/07/2023-13:35:07] [I] [TRT] Opset version: 13 [09/07/2023-13:35:07] [I] [TRT] Producer name: pytorch [09/07/2023-13:35:07] [I] [TRT] Producer version: 1.10 [09/07/2023-13:35:07] [I] [TRT] Domain: [09/07/2023-13:35:07] [I] [TRT] Model version: 0 [09/07/2023-13:35:07] [I] [TRT] Doc string: [09/07/2023-13:35:07] [I] [TRT] ---------------------------------------------------------------- [09/07/2023-13:35:07] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32. [09/07/2023-13:35:07] [I] Finish parsing network model [09/07/2023-13:35:09] [W] [TRT] TensorRT was linked against cuBLAS/cuBLASLt 11.6.3 but loaded cuBLAS/cuBLASLt 11.3.0 [09/07/2023-13:35:09] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +789, GPU +268, now: CPU 12870, GPU 1624 (MiB) [09/07/2023-13:35:10] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +775, GPU +264, now: CPU 13645, GPU 1888 (MiB) [09/07/2023-13:35:10] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0 [09/07/2023-13:35:10] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [09/07/2023-13:38:08] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output. [09/07/2023-13:58:25] [I] [TRT] Detected 1 inputs and 3 output network tensors. 
[09/07/2023-13:58:25] [I] [TRT] Total Host Persistent Memory: 145712 [09/07/2023-13:58:25] [I] [TRT] Total Device Persistent Memory: 3300864 [09/07/2023-13:58:25] [I] [TRT] Total Scratch Memory: 0 [09/07/2023-13:58:25] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 268 MiB [09/07/2023-13:58:25] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 19.7609ms to assign 8 blocks to 114 nodes requiring 12748801 bytes. [09/07/2023-13:58:25] [I] [TRT] Total Activation Memory: 12748801 [09/07/2023-13:58:25] [W] [TRT] TensorRT was linked against cuBLAS/cuBLASLt 11.6.3 but loaded cuBLAS/cuBLASLt 11.3.0 [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 15502, GPU 2306 (MiB) [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 15502, GPU 2316 (MiB) [09/07/2023-13:58:26] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0 [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +6, GPU +7, now: CPU 6, GPU 7 (MiB) [09/07/2023-13:58:26] [I] Engine built in 1399.75 sec. [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 15290, GPU 2232 (MiB) [09/07/2023-13:58:26] [I] [TRT] Loaded engine size: 8 MiB [09/07/2023-13:58:26] [W] [TRT] TensorRT was linked against cuBLAS/cuBLASLt 11.6.3 but loaded cuBLAS/cuBLASLt 11.3.0 [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 15289, GPU 2252 (MiB) [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 15290, GPU 2260 (MiB) [09/07/2023-13:58:26] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0 [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +6, now: CPU 0, GPU 6 (MiB) [09/07/2023-13:58:26] [I] Engine deserialized in 0.0623982 sec. 
[09/07/2023-13:58:26] [W] [TRT] TensorRT was linked against cuBLAS/cuBLASLt 11.6.3 but loaded cuBLAS/cuBLASLt 11.3.0 [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 15288, GPU 2252 (MiB) [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +4, GPU +8, now: CPU 15292, GPU 2260 (MiB) [09/07/2023-13:58:26] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0 [09/07/2023-13:58:26] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +15, now: CPU 0, GPU 21 (MiB) [09/07/2023-13:58:26] [I] Using random values for input images [09/07/2023-13:58:26] [I] Created input binding for images with dimensions 1x3x640x640 [09/07/2023-13:58:26] [I] Using random values for output output0 [09/07/2023-13:58:26] [I] Created output binding for output0 with dimensions 1x84x8400 [09/07/2023-13:58:26] [I] Starting inference [09/07/2023-13:58:29] [I] Warmup completed 9 queries over 200 ms [09/07/2023-13:58:29] [I] Timing trace has 1066 queries over 3.00029 s [09/07/2023-13:58:29] [I] [09/07/2023-13:58:29] [I] === Trace details === [09/07/2023-13:58:29] [I] Trace averages of 10 runs: [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 8.22078 ms - Host latency: 10.3995 ms (enqueue 1.29095 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.47711 ms - Host latency: 2.75366 ms (enqueue 1.50258 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.47732 ms - Host latency: 2.71075 ms (enqueue 1.38157 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.54706 ms - Host latency: 2.78077 ms (enqueue 1.44044 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.52689 ms - Host latency: 2.78877 ms (enqueue 1.57856 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.48868 ms - Host latency: 2.72224 ms (enqueue 1.4899 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.51839 ms - Host latency: 
2.80637 ms (enqueue 1.56302 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37582 ms - Host latency: 2.60355 ms (enqueue 1.54142 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.54193 ms - Host latency: 2.84217 ms (enqueue 1.87007 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.56896 ms - Host latency: 2.84466 ms (enqueue 2.17136 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.49258 ms - Host latency: 2.7822 ms (enqueue 1.76996 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.46055 ms - Host latency: 2.72289 ms (enqueue 1.80256 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.44842 ms - Host latency: 2.66216 ms (enqueue 1.62891 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.59747 ms - Host latency: 2.87508 ms (enqueue 2.16566 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.3957 ms - Host latency: 2.64163 ms (enqueue 1.73252 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40939 ms - Host latency: 2.62823 ms (enqueue 1.64511 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.36759 ms - Host latency: 2.60141 ms (enqueue 1.39649 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40483 ms - Host latency: 2.65399 ms (enqueue 1.31581 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.42285 ms - Host latency: 2.65114 ms (enqueue 1.6582 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.3834 ms - Host latency: 2.64173 ms (enqueue 1.57159 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40515 ms - Host latency: 2.65579 ms (enqueue 1.48546 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.47672 ms - Host latency: 2.77246 ms (enqueue 1.78687 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.48738 ms - Host latency: 2.72432 ms (enqueue 2.3981 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.48376 
ms - Host latency: 2.71785 ms (enqueue 2.10678 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.36459 ms - Host latency: 2.58091 ms (enqueue 1.5019 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37089 ms - Host latency: 2.59257 ms (enqueue 2.13566 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.36258 ms - Host latency: 2.57252 ms (enqueue 1.68998 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.4283 ms - Host latency: 2.65691 ms (enqueue 1.53451 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.39006 ms - Host latency: 2.63468 ms (enqueue 1.37882 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.38752 ms - Host latency: 2.60781 ms (enqueue 1.45892 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.36665 ms - Host latency: 2.62128 ms (enqueue 1.51432 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41217 ms - Host latency: 2.67775 ms (enqueue 1.51849 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.35577 ms - Host latency: 2.60313 ms (enqueue 1.40089 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.36642 ms - Host latency: 2.59578 ms (enqueue 1.39349 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41644 ms - Host latency: 2.65525 ms (enqueue 1.69092 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.39735 ms - Host latency: 2.63389 ms (enqueue 1.55175 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37168 ms - Host latency: 2.61702 ms (enqueue 1.44327 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37952 ms - Host latency: 2.63807 ms (enqueue 1.49594 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37446 ms - Host latency: 2.6314 ms (enqueue 1.48047 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.4028 ms - Host latency: 2.62645 ms (enqueue 1.48293 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - 
GPU latency: 1.46223 ms - Host latency: 2.68894 ms (enqueue 2.40447 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.60012 ms - Host latency: 2.90316 ms (enqueue 2.48395 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.53791 ms - Host latency: 2.80779 ms (enqueue 2.31101 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.3958 ms - Host latency: 2.64855 ms (enqueue 1.55289 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40032 ms - Host latency: 2.66886 ms (enqueue 1.81134 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.34661 ms - Host latency: 2.56169 ms (enqueue 1.51101 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.38516 ms - Host latency: 2.61027 ms (enqueue 1.50084 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.3597 ms - Host latency: 2.5876 ms (enqueue 1.46913 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41598 ms - Host latency: 2.62931 ms (enqueue 1.52502 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.35638 ms - Host latency: 2.58486 ms (enqueue 1.51583 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.51996 ms - Host latency: 2.80276 ms (enqueue 1.83726 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.46665 ms - Host latency: 2.75853 ms (enqueue 2.03746 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.42338 ms - Host latency: 2.68132 ms (enqueue 1.84221 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.43368 ms - Host latency: 2.64564 ms (enqueue 1.79589 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.43062 ms - Host latency: 2.64086 ms (enqueue 1.76284 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.43738 ms - Host latency: 2.66877 ms (enqueue 1.60656 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.4066 ms - Host latency: 2.70206 ms (enqueue 1.64324 ms) [09/07/2023-13:58:29] [I] 
Average on 10 runs - GPU latency: 1.3754 ms - Host latency: 2.58008 ms (enqueue 1.42271 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.38114 ms - Host latency: 2.61343 ms (enqueue 1.59165 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40653 ms - Host latency: 2.61157 ms (enqueue 1.5155 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.36689 ms - Host latency: 2.57098 ms (enqueue 1.43811 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37378 ms - Host latency: 2.58097 ms (enqueue 1.47463 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.4016 ms - Host latency: 2.60645 ms (enqueue 1.50372 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.3746 ms - Host latency: 2.57513 ms (enqueue 1.5531 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37592 ms - Host latency: 2.57673 ms (enqueue 1.52056 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41045 ms - Host latency: 2.61455 ms (enqueue 1.54456 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41147 ms - Host latency: 2.6177 ms (enqueue 1.53057 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.38127 ms - Host latency: 2.67952 ms (enqueue 1.59114 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.44939 ms - Host latency: 2.74797 ms (enqueue 1.51631 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.51338 ms - Host latency: 2.91243 ms (enqueue 1.73325 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.42061 ms - Host latency: 2.76973 ms (enqueue 1.57627 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.39241 ms - Host latency: 2.69824 ms (enqueue 1.45635 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41023 ms - Host latency: 2.67993 ms (enqueue 1.55552 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.51953 ms - Host latency: 2.80051 ms (enqueue 1.83401 ms) 
[09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.52148 ms - Host latency: 2.72798 ms (enqueue 2.59165 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.473 ms - Host latency: 2.72002 ms (enqueue 2.7509 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.51721 ms - Host latency: 2.74443 ms (enqueue 2.74707 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.42483 ms - Host latency: 2.6572 ms (enqueue 1.5262 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41218 ms - Host latency: 2.654 ms (enqueue 1.63726 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40112 ms - Host latency: 2.61663 ms (enqueue 1.61704 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.47478 ms - Host latency: 2.70691 ms (enqueue 1.50957 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.4397 ms - Host latency: 2.71165 ms (enqueue 1.65103 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.44133 ms - Host latency: 2.69834 ms (enqueue 1.5408 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.4366 ms - Host latency: 2.74126 ms (enqueue 1.56531 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40383 ms - Host latency: 2.65852 ms (enqueue 1.4615 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.38496 ms - Host latency: 2.65869 ms (enqueue 1.68323 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.35566 ms - Host latency: 2.61277 ms (enqueue 1.51775 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37676 ms - Host latency: 2.62512 ms (enqueue 1.56213 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37173 ms - Host latency: 2.64587 ms (enqueue 1.49578 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.45376 ms - Host latency: 2.71152 ms (enqueue 1.49895 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.46328 ms - Host latency: 2.69585 ms (enqueue 
1.72239 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.43936 ms - Host latency: 2.69224 ms (enqueue 1.39521 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.63333 ms - Host latency: 2.91592 ms (enqueue 2.49834 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.38079 ms - Host latency: 2.61545 ms (enqueue 1.5439 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.43979 ms - Host latency: 2.68711 ms (enqueue 1.61138 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.50625 ms - Host latency: 2.7772 ms (enqueue 1.63401 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.55281 ms - Host latency: 2.82339 ms (enqueue 2.10696 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.36831 ms - Host latency: 2.6168 ms (enqueue 1.76421 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41804 ms - Host latency: 2.62207 ms (enqueue 1.54058 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.43496 ms - Host latency: 2.6394 ms (enqueue 1.51309 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.3697 ms - Host latency: 2.56834 ms (enqueue 1.53801 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.4241 ms - Host latency: 2.62839 ms (enqueue 1.49941 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41401 ms - Host latency: 2.61248 ms (enqueue 1.48733 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.37207 ms - Host latency: 2.57385 ms (enqueue 1.53916 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.41443 ms - Host latency: 2.63232 ms (enqueue 1.53997 ms) [09/07/2023-13:58:29] [I] Average on 10 runs - GPU latency: 1.40698 ms - Host latency: 2.60872 ms (enqueue 1.44001 ms) [09/07/2023-13:58:29] [I] [09/07/2023-13:58:29] [I] === Performance summary === [09/07/2023-13:58:29] [I] Throughput: 355.299 qps [09/07/2023-13:58:29] [I] Latency: min = 2.48206 ms, max = 38.6206 ms, mean = 
2.74773 ms, median = 2.60596 ms, percentile(99%) = 3.47552 ms [09/07/2023-13:58:29] [I] Enqueue Time: min = 1.08751 ms, max = 5.08392 ms, mean = 1.66565 ms, median = 1.53741 ms, percentile(99%) = 3.38721 ms [09/07/2023-13:58:29] [I] H2D Latency: min = 0.760498 ms, max = 3.09418 ms, mean = 0.800357 ms, median = 0.773529 ms, percentile(99%) = 1.05963 ms [09/07/2023-13:58:29] [I] GPU Compute Time: min = 1.28003 ms, max = 33.8074 ms, mean = 1.49366 ms, median = 1.38251 ms, percentile(99%) = 2.02368 ms [09/07/2023-13:58:29] [I] D2H Latency: min = 0.430664 ms, max = 1.71906 ms, mean = 0.453719 ms, median = 0.442017 ms, percentile(99%) = 0.61377 ms [09/07/2023-13:58:29] [I] Total Host Walltime: 3.00029 s [09/07/2023-13:58:29] [I] Total GPU Compute Time: 1.59224 s [09/07/2023-13:58:29] [W] Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized. [09/07/2023-13:58:29] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput. [09/07/2023-13:58:29] [W] GPU compute time is unstable, with coefficient of variance = 87.1629%. [09/07/2023-13:58:29] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability. [09/07/2023-13:58:29] [I] Explanations of the performance metrics are printed in the verbose logs. [09/07/2023-13:58:29] [I] &&&& PASSED TensorRT.trtexec [TensorRT v8402] # trtexec --fp16 --workspace=2048 --onnx=yolov8n.onnx --saveEngine=yolov8n.trt
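The trace above prints one "Average on 10 runs" line per window, which is tedious to eyeball. A small, hedged helper for digesting such logs (pure standard library; the regex only matches the "GPU latency" field of those trace lines):

```python
import re

def summarize_gpu_latency(log_text):
    """Return the mean of the 'GPU latency' values (ms) found in
    trtexec 'Average on 10 runs' trace lines, or None if none match.
    """
    vals = [float(m) for m in re.findall(r"GPU latency: ([0-9.]+) ms", log_text)]
    return sum(vals) / len(vals) if vals else None

sample = ("[I] Average on 10 runs - GPU latency: 1.25 ms - Host latency: 2.60 ms\n"
          "[I] Average on 10 runs - GPU latency: 1.75 ms - Host latency: 2.90 ms\n")
print(summarize_gpu_latency(sample))  # → 1.5
```

The log's own warning about an 87% coefficient of variance suggests the per-window averages vary a lot here, so a single mean should be read with that caveat in mind.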

M15-3080 commented 1 year ago

Thank you very much for your patient response. I have a basic question: I have now generated the trt model, but when I passed the command-line parameters into VS2019, the result shows: image

Below are the project files: image image

FeiYull commented 1 year ago

@M15-3080 Try setting an absolute path. Also note that TensorRT-Alpha supports dynamic batch size by default, but your model was exported with static shapes.
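The mismatch pointed out here (a static ONNX export fed to a pipeline that expects dynamic batch size) is fixed at export time. A sketch of the two-step command sequence, written as Python lists so each flag is explicit; the flag values are taken from this thread, and the snippet only assembles the commands rather than running them:

```python
# Hypothetical illustration of the dynamic-batch workflow discussed above:
# 1. export ONNX with dynamic=True instead of dynamic=False,
# 2. build the engine with explicit min/opt/max shapes.
export_cmd = ["yolo", "mode=export", "model=yolov8n.pt",
              "format=onnx", "dynamic=True"]
build_cmd = ["trtexec", "--onnx=yolov8n.onnx", "--saveEngine=yolov8n.trt",
             "--buildOnly",
             "--minShapes=images:1x3x640x640",
             "--optShapes=images:4x3x640x640",
             "--maxShapes=images:8x3x640x640"]
print(" ".join(export_cmd))
print(" ".join(build_cmd))
```

With a dynamic export, the `--minShapes/--optShapes/--maxShapes` flags are meaningful; with a static export they are ignored or rejected, which matches the behavior reported earlier in this thread.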