THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

TRT error when converting ONNX model #276


xingyewuyu commented 2 weeks ago

I encountered the following error while using trtexec to convert the exported dynamic ONNX model:

```
(yolov10) D:\AI\yolo\yolov10\yolov10_train\train_for_suliao>F:\soft\NVIDIA\tensorrt\TensorRT-8.6.1.6.Windows10.x86_64.cuda-11.8\TensorRT-8.6.1.6\bin\trtexec --onnx=runs/detect/train/weights/best.onnx --saveEngine=runs/detect/train/weights/best.engine --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # F:\soft\NVIDIA\tensorrt\TensorRT-8.6.1.6.Windows10.x86_64.cuda-11.8\TensorRT-8.6.1.6\bin\trtexec --onnx=runs/detect/train/weights/best.onnx --saveEngine=runs/detect/train/weights/best.engine --fp16
[06/18/2024-16:40:07] [I] === Model Options ===
[06/18/2024-16:40:07] [I] Format: ONNX
[06/18/2024-16:40:07] [I] Model: runs/detect/train/weights/best.onnx
[06/18/2024-16:40:07] [I] Output:
[06/18/2024-16:40:07] [I] === Build Options ===
[06/18/2024-16:40:07] [I] Max batch: explicit batch
[06/18/2024-16:40:07] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[06/18/2024-16:40:07] [I] minTiming: 1
[06/18/2024-16:40:07] [I] avgTiming: 8
[06/18/2024-16:40:07] [I] Precision: FP32+FP16
[06/18/2024-16:40:07] [I] LayerPrecisions:
[06/18/2024-16:40:07] [I] Layer Device Types:
[06/18/2024-16:40:07] [I] Calibration:
[06/18/2024-16:40:07] [I] Refit: Disabled
[06/18/2024-16:40:07] [I] Version Compatible: Disabled
[06/18/2024-16:40:07] [I] TensorRT runtime: full
[06/18/2024-16:40:07] [I] Lean DLL Path:
[06/18/2024-16:40:07] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[06/18/2024-16:40:07] [I] Exclude Lean Runtime: Disabled
[06/18/2024-16:40:07] [I] Sparsity: Disabled
[06/18/2024-16:40:07] [I] Safe mode: Disabled
[06/18/2024-16:40:07] [I] Build DLA standalone loadable: Disabled
[06/18/2024-16:40:07] [I] Allow GPU fallback for DLA: Disabled
[06/18/2024-16:40:07] [I] DirectIO mode: Disabled
[06/18/2024-16:40:07] [I] Restricted mode: Disabled
[06/18/2024-16:40:07] [I] Skip inference: Disabled
[06/18/2024-16:40:07] [I] Save engine: runs/detect/train/weights/best.engine
[06/18/2024-16:40:07] [I] Load engine:
[06/18/2024-16:40:07] [I] Profiling verbosity: 0
[06/18/2024-16:40:07] [I] Tactic sources: Using default tactic sources
[06/18/2024-16:40:07] [I] timingCacheMode: local
[06/18/2024-16:40:07] [I] timingCacheFile:
[06/18/2024-16:40:07] [I] Heuristic: Disabled
[06/18/2024-16:40:07] [I] Preview Features: Use default preview flags.
[06/18/2024-16:40:07] [I] MaxAuxStreams: -1
[06/18/2024-16:40:07] [I] BuilderOptimizationLevel: -1
[06/18/2024-16:40:07] [I] Input(s)s format: fp32:CHW
[06/18/2024-16:40:07] [I] Output(s)s format: fp32:CHW
[06/18/2024-16:40:07] [I] Input build shapes: model
[06/18/2024-16:40:07] [I] Input calibration shapes: model
[06/18/2024-16:40:07] [I] === System Options ===
[06/18/2024-16:40:07] [I] Device: 0
[06/18/2024-16:40:07] [I] DLACore:
[06/18/2024-16:40:07] [I] Plugins:
[06/18/2024-16:40:07] [I] setPluginsToSerialize:
[06/18/2024-16:40:07] [I] dynamicPlugins:
[06/18/2024-16:40:07] [I] ignoreParsedPluginLibs: 0
[06/18/2024-16:40:07] [I]
[06/18/2024-16:40:07] [I] === Inference Options ===
[06/18/2024-16:40:07] [I] Batch: Explicit
[06/18/2024-16:40:07] [I] Input inference shapes: model
[06/18/2024-16:40:07] [I] Iterations: 10
[06/18/2024-16:40:07] [I] Duration: 3s (+ 200ms warm up)
[06/18/2024-16:40:07] [I] Sleep time: 0ms
[06/18/2024-16:40:07] [I] Idle time: 0ms
[06/18/2024-16:40:07] [I] Inference Streams: 1
[06/18/2024-16:40:07] [I] ExposeDMA: Disabled
[06/18/2024-16:40:07] [I] Data transfers: Enabled
[06/18/2024-16:40:07] [I] Spin-wait: Disabled
[06/18/2024-16:40:07] [I] Multithreading: Disabled
[06/18/2024-16:40:07] [I] CUDA Graph: Disabled
[06/18/2024-16:40:07] [I] Separate profiling: Disabled
[06/18/2024-16:40:07] [I] Time Deserialize: Disabled
[06/18/2024-16:40:07] [I] Time Refit: Disabled
[06/18/2024-16:40:07] [I] NVTX verbosity: 0
[06/18/2024-16:40:07] [I] Persistent Cache Ratio: 0
[06/18/2024-16:40:07] [I] Inputs:
[06/18/2024-16:40:07] [I] === Reporting Options ===
[06/18/2024-16:40:07] [I] Verbose: Disabled
[06/18/2024-16:40:07] [I] Averages: 10 inferences
[06/18/2024-16:40:07] [I] Percentiles: 90,95,99
[06/18/2024-16:40:07] [I] Dump refittable layers: Disabled
[06/18/2024-16:40:07] [I] Dump output: Disabled
[06/18/2024-16:40:07] [I] Profile: Disabled
[06/18/2024-16:40:07] [I] Export timing to JSON file:
[06/18/2024-16:40:07] [I] Export output to JSON file:
[06/18/2024-16:40:07] [I] Export profile to JSON file:
[06/18/2024-16:40:07] [I]
[06/18/2024-16:40:07] [I] === Device Information ===
[06/18/2024-16:40:07] [I] Selected Device: Quadro M2200
[06/18/2024-16:40:07] [I] Compute Capability: 5.2
[06/18/2024-16:40:07] [I] SMs: 8
[06/18/2024-16:40:07] [I] Device Global Memory: 4096 MiB
[06/18/2024-16:40:07] [I] Shared Memory per SM: 96 KiB
[06/18/2024-16:40:07] [I] Memory Bus Width: 128 bits (ECC disabled)
[06/18/2024-16:40:07] [I] Application Compute Clock Rate: 1.036 GHz
[06/18/2024-16:40:07] [I] Application Memory Clock Rate: 2.754 GHz
[06/18/2024-16:40:07] [I]
[06/18/2024-16:40:07] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[06/18/2024-16:40:07] [I]
[06/18/2024-16:40:07] [I] TensorRT version: 8.6.1
[06/18/2024-16:40:07] [I] Loading standard plugins
[06/18/2024-16:40:12] [I] [TRT] [MemUsageChange] Init CUDA: CPU +52, GPU +0, now: CPU 12239, GPU 686 (MiB)
[06/18/2024-16:40:31] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +111, GPU +0, now: CPU 13568, GPU 686 (MiB)
[06/18/2024-16:40:31] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[06/18/2024-16:40:31] [I] Start parsing network model.
[06/18/2024-16:40:31] [I] [TRT] ----------------------------------------------------------------
[06/18/2024-16:40:31] [I] [TRT] Input filename: runs/detect/train/weights/best.onnx
[06/18/2024-16:40:31] [I] [TRT] ONNX IR version: 0.0.7
[06/18/2024-16:40:31] [I] [TRT] Opset version: 13
[06/18/2024-16:40:31] [I] [TRT] Producer name: pytorch
[06/18/2024-16:40:31] [I] [TRT] Producer version: 1.12.1
[06/18/2024-16:40:31] [I] [TRT] Domain:
[06/18/2024-16:40:31] [I] [TRT] Model version: 0
[06/18/2024-16:40:31] [I] [TRT] Doc string:
[06/18/2024-16:40:31] [I] [TRT] ----------------------------------------------------------------
[06/18/2024-16:40:31] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[06/18/2024-16:40:32] [I] Finished parsing network model. Parse time: 0.399209
[06/18/2024-16:40:32] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[06/18/2024-16:40:32] [I] [TRT] Graph optimization time: 0.154788 seconds.
[06/18/2024-16:40:32] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[06/18/2024-16:40:32] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/18/2024-16:40:32] [E] Error[10]: Could not find any implementation for node Conv_19.
[06/18/2024-16:40:32] [E] Error[10]: [optimizer.cpp::nvinfer1::builder::cgraph::LeafCNode::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node Conv_19.)
[06/18/2024-16:40:32] [E] Engine could not be created from network
[06/18/2024-16:40:32] [E] Building engine failed
[06/18/2024-16:40:32] [E] Failed to create engine from model or file.
[06/18/2024-16:40:32] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # F:\soft\NVIDIA\tensorrt\TensorRT-8.6.1.6.Windows10.x86_64.cuda-11.8\TensorRT-8.6.1.6\bin\trtexec --onnx=runs/detect/train/weights/best.onnx --saveEngine=runs/detect/train/weights/best.engine --fp16
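In a dump this long, the diagnosis is carried entirely by the `[W]` and `[E]` entries. If the trtexec output is saved to a file (the name `trt.log` here is just an example, not from the thread), they can be pulled out with a one-line filter:

```shell
# Keep only warning ([W]) and error ([E]) entries of a saved trtexec log.
# Every entry starts with a timestamp like [06/18/2024-16:40:32] followed by
# a severity tag, so matching the severity after the timestamp is enough.
grep -E '^\[[0-9/:-]+\] \[(W|E)\]' trt.log
```

For this run that reduces the log to the two warnings (lazy loading, INT64 cast) and the `[E]` lines around `Could not find any implementation for node Conv_19`.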

Can anyone help? Thanks.
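Not an answer from the thread, just generic triage: Error Code 10 ("Could not find any implementation") means the builder found no kernel satisfying the current precision and workspace constraints for that layer. The log shows a Quadro M2200 (Maxwell, compute capability 5.2, 4 GiB), which has no fast FP16 path, so one sketch of a retry is to drop `--fp16`, give the builder a larger workspace pool, and enable verbose output to see which tactics are rejected for `Conv_19`:

```shell
# Hedged retry sketch, same model paths as in the log above:
# - no --fp16: SM 5.2 hardware lacks FP16 tactics for many layers
# - --memPoolSize=workspace:2048 raises the builder workspace to 2048 MiB
# - --verbose logs the tactic search, showing why Conv_19 has no candidates
trtexec --onnx=runs/detect/train/weights/best.onnx --saveEngine=runs/detect/train/weights/best.engine --memPoolSize=workspace:2048 --verbose
```

If the FP32 build succeeds, the failure points at the FP16 precision constraint rather than at the model itself.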

xingyewuyu commented 2 weeks ago

First, I hit the same error as https://github.com/THU-MIG/yolov10/issues/261. I then changed the environment variable to point at the 8.6.1 version of TensorRT, after which the above error ("Could not find any implementation for node Conv_19") appeared.

PrinceP commented 8 hours ago

Can you try out this implementation? https://github.com/PrinceP/tensorrt-cpp-for-onnx/tree/main?tab=readme-ov-file#yolov10