NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

ONNX Runtime convert to TensorRT: fp32 & fp16 modes succeed but int8 mode fails #2311

Closed zml24 closed 2 years ago

zml24 commented 2 years ago

Description

For fp16 (trt_log_level: trt.Logger.INFO)

```
2022-09-08 17:36:02,762 - mmdeploy - INFO - Start pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt in subprocess
2022-09-08 17:36:03,426 - mmdeploy - INFO - Successfully loaded tensorrt plugins from xxx/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[09/08/2022-17:36:08] [TRT] [I] [MemUsageChange] Init CUDA: CPU +319, GPU +0, now: CPU 400, GPU 1354 (MiB)
[09/08/2022-17:36:09] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 399 MiB, GPU 1354 MiB
[09/08/2022-17:36:09] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 534 MiB, GPU 1388 MiB
[09/08/2022-17:36:10] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/08/2022-17:36:10] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:36:10] [TRT] [I] No importer registered for op: MMCVMultiLevelRoiAlign. Attempting to import as plugin.
[09/08/2022-17:36:10] [TRT] [I] Searching for plugin: MMCVMultiLevelRoiAlign, plugin_version: 1, plugin_namespace:
[09/08/2022-17:36:10] [TRT] [I] Successfully created plugin: MMCVMultiLevelRoiAlign
[09/08/2022-17:36:10] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:36:10] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/08/2022-17:36:10] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:36:10] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:36:12] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:36:12] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +489, GPU +206, now: CPU 1335, GPU 1594 (MiB)
[09/08/2022-17:36:13] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +117, GPU +52, now: CPU 1452, GPU 1646 (MiB)
[09/08/2022-17:36:13] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[09/08/2022-17:37:22] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[09/08/2022-17:38:22] [TRT] [I] Detected 3 inputs and 2 output network tensors.
[09/08/2022-17:38:22] [TRT] [I] Total Host Persistent Memory: 148672
[09/08/2022-17:38:22] [TRT] [I] Total Device Persistent Memory: 61259264
[09/08/2022-17:38:22] [TRT] [I] Total Scratch Memory: 1024000
[09/08/2022-17:38:22] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 100 MiB, GPU 1237 MiB
[09/08/2022-17:38:22] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 14.9649ms to assign 10 blocks to 97 nodes requiring 264632321 bytes.
[09/08/2022-17:38:22] [TRT] [I] Total Activation Memory: 264632321
[09/08/2022-17:38:22] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:38:22] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2342, GPU 2108 (MiB)
[09/08/2022-17:38:22] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2342, GPU 2116 (MiB)
[09/08/2022-17:38:23] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +78, GPU +128, now: CPU 78, GPU 128 (MiB)
2022-09-08 17:38:23,647 - mmdeploy - INFO - Finish pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt
2022-09-08 17:38:25,262 - mmdeploy - WARNING - "visualize_model" has been skipped may be because it's running on a headless device.
2022-09-08 17:38:25,262 - mmdeploy - INFO - All process success.
```

For int8 (trt_log_level: trt.Logger.INFO)

```
2022-09-08 17:41:42,477 - mmdeploy - INFO - Start pipeline mmdeploy.apis.calibration.create_calib_input_data in subprocess
load checkpoint from local path: work_dirs/fast_rcnn_r50_fpn_fp16_1x_align/v1.3.pth
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
100%█████████████████████████████████████████ 248/248 [01:15<00:00, 3.27it/s]
2022-09-08 17:43:23,743 - mmdeploy - INFO - Finish pipeline mmdeploy.apis.calibration.create_calib_input_data
2022-09-08 17:43:44,359 - mmdeploy - INFO - Start pipeline mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt in subprocess
2022-09-08 17:43:45,070 - mmdeploy - INFO - Successfully loaded tensorrt plugins from xxx/mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so
[09/08/2022-17:43:49] [TRT] [I] [MemUsageChange] Init CUDA: CPU +319, GPU +0, now: CPU 400, GPU 1354 (MiB)
[09/08/2022-17:43:50] [TRT] [I] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 400 MiB, GPU 1354 MiB
[09/08/2022-17:43:51] [TRT] [I] [MemUsageSnapshot] End constructing builder kernel library: CPU 534 MiB, GPU 1388 MiB
[09/08/2022-17:43:52] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/08/2022-17:43:52] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:43:52] [TRT] [I] No importer registered for op: MMCVMultiLevelRoiAlign. Attempting to import as plugin.
[09/08/2022-17:43:52] [TRT] [I] Searching for plugin: MMCVMultiLevelRoiAlign, plugin_version: 1, plugin_namespace:
[09/08/2022-17:43:52] [TRT] [I] Successfully created plugin: MMCVMultiLevelRoiAlign
[09/08/2022-17:43:52] [TRT] [W] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[09/08/2022-17:43:52] [TRT] [W] Tensor DataType is determined at build time for tensors not marked as input or output.
[09/08/2022-17:43:52] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:43:52] [TRT] [W] Output type must be INT32 for shape outputs
[09/08/2022-17:43:54] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:43:54] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +486, GPU +206, now: CPU 1336, GPU 1594 (MiB)
[09/08/2022-17:43:55] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +117, GPU +52, now: CPU 1453, GPU 1646 (MiB)
[09/08/2022-17:43:55] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[09/08/2022-17:43:55] [TRT] [W] Calibration Profile is not defined. Running calibration with Profile 0
[09/08/2022-17:43:55] [TRT] [W] Calibration Profile is not defined. Running calibration with Profile 0
[09/08/2022-17:44:06] [TRT] [I] Detected 2 inputs and 2 output network tensors.
[09/08/2022-17:44:06] [TRT] [I] Total Host Persistent Memory: 114848
[09/08/2022-17:44:06] [TRT] [I] Total Device Persistent Memory: 0
[09/08/2022-17:44:06] [TRT] [I] Total Scratch Memory: 1600
[09/08/2022-17:44:06] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 384 MiB
[09/08/2022-17:44:06] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 38.0418ms to assign 8 blocks to 172 nodes requiring 263425024 bytes.
[09/08/2022-17:44:06] [TRT] [I] Total Activation Memory: 263425024
[09/08/2022-17:44:06] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2254, GPU 2234 (MiB)
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2254, GPU 2242 (MiB)
[09/08/2022-17:44:06] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.5 but loaded cuBLAS/cuBLAS LT 11.5.1
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2254, GPU 2218 (MiB)
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2254, GPU 2226 (MiB)
[09/08/2022-17:44:06] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +251, now: CPU 0, GPU 507 (MiB)
[09/08/2022-17:44:06] [TRT] [I] Starting Calibration.
2022-09-08 17:44:07,139 - mmdeploy - ERROR - `mmdeploy.backend.tensorrt.onnx2tensorrt.onnx2tensorrt` with Call id: 2 failed. exit.
```
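Note that the int8 build dies immediately after `Starting Calibration.`, which is the point where TensorRT begins pulling batches from the user-supplied int8 calibrator (here, the one mmdeploy builds from the `create_calib_input_data` output). The real interface is `tensorrt.IInt8EntropyCalibrator2`, which needs device buffers and a GPU; as a GPU-free illustration of just the batching contract such a calibrator implements, here is a minimal NumPy-only sketch (class and names are hypothetical, not mmdeploy's):

```python
import numpy as np

class CalibBatchFeeder:
    """Sketch of the batch-feeding contract behind a TensorRT int8
    calibrator's get_batch(): return one batch per call, then None
    once the calibration set is exhausted (which ends calibration)."""

    def __init__(self, samples, batch_size):
        self.samples = samples          # preprocessed inputs, e.g. CHW float32 arrays
        self.batch_size = batch_size
        self.index = 0

    def get_batch(self):
        # TensorRT stops calibrating when this returns None.
        if self.index + self.batch_size > len(self.samples):
            return None
        batch = np.stack(self.samples[self.index:self.index + self.batch_size])
        self.index += self.batch_size
        return batch

# e.g. mimic the 248 calibration images from the log above, batch size 1
feeder = CalibBatchFeeder([np.zeros((3, 32, 32), dtype=np.float32)] * 248, 1)
first = feeder.get_batch()   # shape (1, 3, 32, 32)
```

A crash at exactly this stage usually means the calibrator raised inside `get_batch` (e.g. a shape mismatch between the calibration data and the network input), which is consistent with the maintainer's suggestion below that this is an mmdeploy-side problem rather than a TensorRT one.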

Environment

TensorRT Version: 8.2.3.0
NVIDIA GPU: T4
NVIDIA Driver Version: 450.102.04
CUDA Version: 11.0
CUDNN Version: 8302 (torch.backends.cudnn.version())
Operating System: CentOS
Python Version (if applicable): 3.8.13
Tensorflow Version (if applicable): -
PyTorch Version (if applicable): 1.12.1
Baremetal or Container (if so, version): -
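The `CUDNN Version: 8302` above is the raw integer returned by `torch.backends.cudnn.version()`. For readability, it decodes to cuDNN 8.3.2 under the `major*1000 + minor*100 + patch` scheme that cuDNN 8.x uses; a small sketch of that decoding:

```python
def decode_cudnn_version(v: int) -> str:
    """Decode the integer from torch.backends.cudnn.version(),
    assuming the cuDNN 8.x encoding major*1000 + minor*100 + patch."""
    major, rest = divmod(v, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"

print(decode_cudnn_version(8302))  # → 8.3.2
```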

zerollzeng commented 2 years ago

This looks like a problem with mmdeploy; I would suggest asking in the mmdeploy repo.