NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

"Could not find any implementation" Error of TensorRT 8.5.2 when converting QDQ DwConv+BN+Swish model #3385

Open maminus opened 10 months ago

maminus commented 10 months ago

Description

I tried to convert a QDQ DepthwiseConv > BatchNorm > Swish (Sigmoid, Mul) model with trtexec, but the build fails with the error below.

[E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul).)

The optimizer fuses the ops above into a single INT8 operator, but it seems that INT8 depthwise convolution followed by a generic activation pattern is not implemented.
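To see which ops ended up in the failing fused layer, the node name in the error message can be split into its parts. This is a minimal Python sketch, not part of trtexec. The helper names `split_top_level` and `fused_layer_parts` are hypothetical, and the assumption that fused names join their components with ` + ` is inferred from this log only.

```python
import re

# Error line copied from the trtexec output in this report.
ERROR_LINE = (
    "[E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: "
    "Internal Error (Could not find any implementation for node "
    "w + node_of_dqconv.dwconv + "
    "PWN(node_of_swish.sigmoid, node_of_swish.mul).)"
)


def split_top_level(name, sep=" + "):
    """Split on sep only outside parentheses, so "PWN(a, b)" stays whole."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(name):
        ch = name[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and name.startswith(sep, i):
            parts.append(name[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(name[start:])
    return parts


def fused_layer_parts(error_line):
    """Return the components of the fused node named in the error line."""
    # Assumption: the message ends with "<fused node name>.)" as in this log.
    match = re.search(r"implementation for node (.+)\.\)\s*$", error_line)
    if match is None:
        return []
    return split_top_level(match.group(1))


parts = fused_layer_parts(ERROR_LINE)
for part in parts:
    print(part)
```

For this error the parts are the weight constant `w`, the depthwise convolution, and the pointwise node holding Sigmoid and Mul. BatchNorm does not appear because the verbose log shows `node_of_bn` being removed during `QConvScaleFusion`.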

I tried several cases; only INT8 depthwise convolution with a generic activation fails.

| Model | Result |
| --- | --- |
| DQ > Conv > BN > Swish > Q | OK |
| DQ > DwConv > BN > Relu > Q | OK |
| DQ > DwConv > BN > Swish > Q | Error (Could not find any implementation) |
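For reference, the arithmetic that a DQ > DwConv > BN > activation > Q chain represents can be sketched in plain Python for one output pixel of one channel. This is only an illustration of the ONNX operator semantics (QuantizeLinear, DequantizeLinear, BatchNormalization in inference mode, Sigmoid, Mul). It does not reproduce TensorRT's kernels and does not explain why the builder fails. All numeric values and the function name `qdq_dwconv_pixel` are hypothetical.

```python
import math


def quantize(x, scale, zero_point=0):
    """ONNX QuantizeLinear for int8: round half to even, then saturate."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point=0):
    """ONNX DequantizeLinear."""
    return (q - zero_point) * scale


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """BatchNormalization in inference mode for a single channel."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def relu(x):
    return max(0.0, x)


def swish(x):
    # Sigmoid followed by Mul, matching the two ONNX nodes in the model.
    return x * (1.0 / (1.0 + math.exp(-x)))


def qdq_dwconv_pixel(q_window, q_kernel, x_scale, w_scale, bn, activation, y_scale):
    """One output pixel of one channel: DQ > DwConv > BN > activation > Q."""
    # Depthwise conv: each channel uses only its own 3x3 kernel (9 taps).
    acc = sum(
        dequantize(a, x_scale) * dequantize(w, w_scale)
        for a, w in zip(q_window, q_kernel)
    )
    return quantize(activation(batch_norm(acc, **bn)), y_scale)


# Hypothetical int8 input window, int8 kernel, scales, and BN parameters.
window = [-20, -10, 0, 10, 20, 30, -30, 5, -5]
kernel = [1, -2, 3, -4, 5, -6, 7, -8, 9]
bn = dict(gamma=1.0, beta=0.0, mean=0.0, var=1.0)

relu_out = qdq_dwconv_pixel(window, kernel, 0.1, 0.05, bn, relu, 0.1)
swish_out = qdq_dwconv_pixel(window, kernel, 0.1, 0.05, bn, swish, 0.1)
print("Relu :", relu_out)
print("Swish:", swish_out)
```

The only difference between the working Relu case and the failing Swish case is the `activation` argument, which matches the observation that the failure depends on the activation pattern fused after the depthwise convolution.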

Environment

TensorRT Version: 8.5.2

NVIDIA GPU: Jetson AGX Orin

NVIDIA Driver Version: Unknown

CUDA Version: 11.4

CUDNN Version: 8.6.0

Operating System: Linux for Tegra 34.1.4 (kernel 5.10, JetPack 5.1.2)

Python Version (if applicable): 3.8.10

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): 2.0

Baremetal or Container (if so, version): Container dustynv/l4t-pytorch:r35.4.1

Relevant Files

trtexec full log

```
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx --minShapes=input:1x144x128x128 --optShapes=input:4x144x128x128 --maxShapes=input:32x144x128x128 --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json
[10/14/2023-15:37:01] [I] === Model Options ===
[10/14/2023-15:37:01] [I] Format: ONNX
[10/14/2023-15:37:01] [I] Model: qdq_dw_bn_swish.onnx
[10/14/2023-15:37:01] [I] Output:
[10/14/2023-15:37:01] [I] === Build Options ===
[10/14/2023-15:37:01] [I] Max batch: explicit batch
[10/14/2023-15:37:01] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/14/2023-15:37:01] [I] minTiming: 1
[10/14/2023-15:37:01] [I] avgTiming: 8
[10/14/2023-15:37:01] [I] Precision: FP32+INT8
[10/14/2023-15:37:01] [I] LayerPrecisions:
[10/14/2023-15:37:01] [I] Calibration: Dynamic
[10/14/2023-15:37:01] [I] Refit: Disabled
[10/14/2023-15:37:01] [I] Sparsity: Disabled
[10/14/2023-15:37:01] [I] Safe mode: Disabled
[10/14/2023-15:37:01] [I] DirectIO mode: Disabled
[10/14/2023-15:37:01] [I] Restricted mode: Disabled
[10/14/2023-15:37:01] [I] Build only: Enabled
[10/14/2023-15:37:01] [I] Save engine: qdq_dw_bn_swish.plan
[10/14/2023-15:37:01] [I] Load engine:
[10/14/2023-15:37:01] [I] Profiling verbosity: 2
[10/14/2023-15:37:01] [I] Tactic sources: Using default tactic sources
[10/14/2023-15:37:01] [I] timingCacheMode: local
[10/14/2023-15:37:01] [I] timingCacheFile:
[10/14/2023-15:37:01] [I] Heuristic: Disabled
[10/14/2023-15:37:01] [I] Preview Features: Use default preview flags.
[10/14/2023-15:37:01] [I] Input(s)s format: fp32:CHW [10/14/2023-15:37:01] [I] Output(s)s format: fp32:CHW [10/14/2023-15:37:01] [I] Input build shape: input=1x144x128x128+4x144x128x128+32x144x128x128 [10/14/2023-15:37:01] [I] Input calibration shapes: model [10/14/2023-15:37:01] [I] === System Options === [10/14/2023-15:37:01] [I] Device: 0 [10/14/2023-15:37:01] [I] DLACore: [10/14/2023-15:37:01] [I] Plugins: [10/14/2023-15:37:01] [I] === Inference Options === [10/14/2023-15:37:01] [I] Batch: Explicit [10/14/2023-15:37:01] [I] Input inference shape: input=4x144x128x128 [10/14/2023-15:37:01] [I] Iterations: 10 [10/14/2023-15:37:01] [I] Duration: 3s (+ 200ms warm up) [10/14/2023-15:37:01] [I] Sleep time: 0ms [10/14/2023-15:37:01] [I] Idle time: 0ms [10/14/2023-15:37:01] [I] Streams: 1 [10/14/2023-15:37:01] [I] ExposeDMA: Disabled [10/14/2023-15:37:01] [I] Data transfers: Enabled [10/14/2023-15:37:01] [I] Spin-wait: Disabled [10/14/2023-15:37:01] [I] Multithreading: Disabled [10/14/2023-15:37:01] [I] CUDA Graph: Disabled [10/14/2023-15:37:01] [I] Separate profiling: Disabled [10/14/2023-15:37:01] [I] Time Deserialize: Disabled [10/14/2023-15:37:01] [I] Time Refit: Disabled [10/14/2023-15:37:01] [I] NVTX verbosity: 2 [10/14/2023-15:37:01] [I] Persistent Cache Ratio: 0 [10/14/2023-15:37:01] [I] Inputs: [10/14/2023-15:37:01] [I] === Reporting Options === [10/14/2023-15:37:01] [I] Verbose: Enabled [10/14/2023-15:37:01] [I] Averages: 10 inferences [10/14/2023-15:37:01] [I] Percentiles: 90,95,99 [10/14/2023-15:37:01] [I] Dump refittable layers:Disabled [10/14/2023-15:37:01] [I] Dump output: Disabled [10/14/2023-15:37:01] [I] Profile: Disabled [10/14/2023-15:37:01] [I] Export timing to JSON file: [10/14/2023-15:37:01] [I] Export output to JSON file: [10/14/2023-15:37:01] [I] Export profile to JSON file: [10/14/2023-15:37:01] [I] [10/14/2023-15:37:01] [I] === Device Information === [10/14/2023-15:37:01] [I] Selected Device: Orin [10/14/2023-15:37:01] [I] Compute Capability: 
8.7 [10/14/2023-15:37:01] [I] SMs: 16 [10/14/2023-15:37:01] [I] Compute Clock Rate: 1.3 GHz [10/14/2023-15:37:01] [I] Device Global Memory: 30592 MiB [10/14/2023-15:37:01] [I] Shared Memory per SM: 164 KiB [10/14/2023-15:37:01] [I] Memory Bus Width: 256 bits (ECC disabled) [10/14/2023-15:37:01] [I] Memory Clock Rate: 1.3 GHz [10/14/2023-15:37:01] [I] [10/14/2023-15:37:01] [I] TensorRT version: 8.5.2 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::BatchTilePlugin_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::CoordConvAC version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::CropAndResizeDynamic version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::CropAndResize version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::DecodeBbox3DPlugin version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::GenerateDetection_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin 
creator - ::GroupNorm version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 2 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::LayerNorm version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::MultilevelProposeROI_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::NMSDynamic_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::PillarScatterPlugin version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ProposalDynamic version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Proposal version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Region_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ROIAlign_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ScatterND version 1 
[10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::SeqLen2Spatial version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::SplitGeLU version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Split version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::VoxelGeneratorPlugin version 1 [10/14/2023-15:37:02] [I] [TRT] [MemUsageChange] Init CUDA: CPU +220, GPU +0, now: CPU 249, GPU 19806 (MiB) [10/14/2023-15:37:03] [V] [TRT] Trying to load shared library libnvinfer_builder_resource.so.8.5.2 [10/14/2023-15:37:03] [V] [TRT] Loaded shared library libnvinfer_builder_resource.so.8.5.2 [10/14/2023-15:37:08] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +511, now: CPU 574, GPU 20281 (MiB) [10/14/2023-15:37:08] [I] Start parsing network model [10/14/2023-15:37:08] [I] [TRT] ---------------------------------------------------------------- [10/14/2023-15:37:08] [I] [TRT] Input filename: qdq_dw_bn_swish.onnx [10/14/2023-15:37:08] [I] [TRT] ONNX IR version: 0.0.7 [10/14/2023-15:37:08] [I] [TRT] Opset version: 13 [10/14/2023-15:37:08] [I] [TRT] Producer name: [10/14/2023-15:37:08] [I] [TRT] Producer version: [10/14/2023-15:37:08] [I] [TRT] Domain: [10/14/2023-15:37:08] [I] [TRT] Model version: 0 [10/14/2023-15:37:08] [I] [TRT] Doc string: [10/14/2023-15:37:08] [I] [TRT] ---------------------------------------------------------------- [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::BatchTilePlugin_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::CoordConvAC version 1 
[10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::CropAndResizeDynamic version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::DecodeBbox3DPlugin version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_Explicit_TF_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_Implicit_TF_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_ONNX_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GenerateDetection_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GridAnchorRect_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GroupNorm version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 2 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::LayerNorm version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::MultilevelCropAndResize_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::MultilevelProposeROI_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - 
::MultiscaleDeformableAttnPlugin_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::NMSDynamic_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::PillarScatterPlugin version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ProposalDynamic version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Proposal version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ROIAlign_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ScatterND version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::SeqLen2Spatial version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::SplitGeLU version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Split version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::VoxelGeneratorPlugin version 1 [10/14/2023-15:37:08] [V] [TRT] Adding network input: input with dtype: 
float32, dimensions: (-1, 144, 128, 128) [10/14/2023-15:37:08] [V] [TRT] Registering tensor: input for ONNX tensor: input [10/14/2023-15:37:08] [V] [TRT] Importing initializer: x_scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: x_zero_point [10/14/2023-15:37:08] [V] [TRT] Importing initializer: w [10/14/2023-15:37:08] [V] [TRT] Importing initializer: w_scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: w_zero_point [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.B [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.mean [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.var [10/14/2023-15:37:08] [V] [TRT] Importing initializer: y_scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: y_zero_point [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_input.q [QuantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: input [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_input.q [QuantizeLinear] inputs: [input -> (-1, 144, 128, 128)[FLOAT]], [x_scale -> ()[FLOAT]], [x_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: x_scale for ONNX node: x_scale [10/14/2023-15:37:08] [V] [TRT] Registering layer: x_zero_point for ONNX node: x_zero_point [10/14/2023-15:37:08] [V] [TRT] Registering tensor: input.q for ONNX tensor: input.q [10/14/2023-15:37:08] [V] [TRT] node_of_input.q [QuantizeLinear] outputs: [input.q -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_dqconv.dq_feature [DequantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: input.q [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_feature [DequantizeLinear] inputs: [input.q -> 
(-1, 144, 128, 128)[FLOAT]], [x_scale -> ()[FLOAT]], [x_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering tensor: dqconv.dq_feature for ONNX tensor: dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_feature [DequantizeLinear] outputs: [dqconv.dq_feature -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_dqconv.dq_weight [DequantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: w [10/14/2023-15:37:08] [V] [TRT] Searching for input: w_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: w_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_weight [DequantizeLinear] inputs: [w -> (144, 1, 3, 3)[INT8]], [w_scale -> (144)[FLOAT]], [w_zero_point -> (144)[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: w for ONNX node: w [10/14/2023-15:37:08] [V] [TRT] Registering layer: w_scale for ONNX node: w_scale [10/14/2023-15:37:08] [V] [TRT] Registering layer: w_zero_point for ONNX node: w_zero_point [10/14/2023-15:37:08] [V] [TRT] Registering tensor: dqconv.dq_weight for ONNX tensor: dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_weight [DequantizeLinear] outputs: [dqconv.dq_weight -> (144, 1, 3, 3)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_dqconv.dwconv [Conv] [10/14/2023-15:37:08] [V] [TRT] Searching for input: dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] Searching for input: dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dwconv [Conv] inputs: [dqconv.dq_feature -> (-1, 144, 128, 128)[FLOAT]], [dqconv.dq_weight -> (144, 1, 3, 3)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Kernel weights are not set yet. Kernel weights must be set using setInput(1, kernel_tensor) API call. 
[10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_dqconv.dwconv for ONNX node: node_of_dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] Registering tensor: dqconv.dwconv for ONNX tensor: dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dwconv [Conv] outputs: [dqconv.dwconv -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_bn [BatchNormalization] [10/14/2023-15:37:08] [V] [TRT] Searching for input: dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.B [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.mean [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.var [10/14/2023-15:37:08] [V] [TRT] node_of_bn [BatchNormalization] inputs: [dqconv.dwconv -> (-1, 144, 128, 128)[FLOAT]], [bn.scale -> (144)[FLOAT]], [bn.B -> (144)[FLOAT]], [bn.mean -> (144)[FLOAT]], [bn.var -> (144)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_bn for ONNX node: node_of_bn [10/14/2023-15:37:08] [V] [TRT] Registering tensor: bn for ONNX tensor: bn [10/14/2023-15:37:08] [V] [TRT] node_of_bn [BatchNormalization] outputs: [bn -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_swish.sigmoid [Sigmoid] [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn [10/14/2023-15:37:08] [V] [TRT] node_of_swish.sigmoid [Sigmoid] inputs: [bn -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_swish.sigmoid for ONNX node: node_of_swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] Registering tensor: swish.sigmoid for ONNX tensor: swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] node_of_swish.sigmoid [Sigmoid] outputs: [swish.sigmoid -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_swish.mul [Mul] [10/14/2023-15:37:08] [V] [TRT] Searching for input: swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn [10/14/2023-15:37:08] [V] [TRT] 
node_of_swish.mul [Mul] inputs: [swish.sigmoid -> (-1, 144, 128, 128)[FLOAT]], [bn -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_swish.mul for ONNX node: node_of_swish.mul [10/14/2023-15:37:08] [V] [TRT] Registering tensor: swish.mul for ONNX tensor: swish.mul [10/14/2023-15:37:08] [V] [TRT] node_of_swish.mul [Mul] outputs: [swish.mul -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_qdqconv.q [QuantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: swish.mul [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_qdqconv.q [QuantizeLinear] inputs: [swish.mul -> (-1, 144, 128, 128)[FLOAT]], [y_scale -> ()[FLOAT]], [y_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: y_scale for ONNX node: y_scale [10/14/2023-15:37:08] [V] [TRT] Registering layer: y_zero_point for ONNX node: y_zero_point [10/14/2023-15:37:08] [V] [TRT] Registering tensor: qdqconv.q for ONNX tensor: qdqconv.q [10/14/2023-15:37:08] [V] [TRT] node_of_qdqconv.q [QuantizeLinear] outputs: [qdqconv.q -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_output [DequantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: qdqconv.q [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_output [DequantizeLinear] inputs: [qdqconv.q -> (-1, 144, 128, 128)[FLOAT]], [y_scale -> ()[FLOAT]], [y_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering tensor: output_4 for ONNX tensor: output [10/14/2023-15:37:08] [V] [TRT] node_of_output [DequantizeLinear] outputs: [output -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Marking output_4 as output: output [10/14/2023-15:37:08] [I] Finish parsing network model 
[10/14/2023-15:37:08] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best [10/14/2023-15:37:08] [W] [TRT] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU [10/14/2023-15:37:08] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes. [10/14/2023-15:37:08] [V] [TRT] Original: 16 layers [10/14/2023-15:37:08] [V] [TRT] After dead-layer removal: 16 layers [10/14/2023-15:37:08] [V] [TRT] Running: ConstantInt8Validator on w [10/14/2023-15:37:08] [V] [TRT] Applying generic optimizations to the graph for inference. [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer - constant folding of Q/DQ initializers [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_input.q [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] Removing w_zero_point [10/14/2023-15:37:08] [V] [TRT] Removing w_scale [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_qdqconv.q [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_output [10/14/2023-15:37:08] [V] [TRT] Removing y_zero_point [10/14/2023-15:37:08] [V] [TRT] Removing y_scale [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] Removing x_zero_point [10/14/2023-15:37:08] [V] [TRT] Removing x_scale [10/14/2023-15:37:08] [V] [TRT] After Myelin optimization: 10 layers [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer - constant folding of Q/DQ initializers [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer forward pass - DQ motions and fusions [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer backward pass [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer quantization pass - Generate quantized ops 
[10/14/2023-15:37:08] [V] [TRT] Running: PointWiseFusion on node_of_swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] PointWiseFusion: Fusing node_of_swish.sigmoid with node_of_swish.mul [10/14/2023-15:37:08] [V] [TRT] Running: QConvScaleFusion on node_of_dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] Removing node_of_bn [10/14/2023-15:37:08] [V] [TRT] Running: GenericConvActFusion on node_of_dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] GenericConvActFusion: Fusing node_of_dqconv.dwconv with PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] Running: QuantizeDoubleInputNodes on node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] QuantizeDoubleInputNodes: fusing node_of_qdqconv.q into node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] QuantizeDoubleInputNodes: fusing (node_of_dqconv.dq_feature and node_of_dqconv.dq_weight) into node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] Removing node_of_qdqconv.q [10/14/2023-15:37:08] [V] [TRT] Removing node_of_dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] Removing node_of_dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] Running: ConstWeightsFusion on w [10/14/2023-15:37:08] [V] [TRT] ConstWeightsFusion: Fusing w with node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] After dupe layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After final dead-layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After tensor merging: 3 layers [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer quantization epilogue pass [10/14/2023-15:37:08] [V] [TRT] QDQ optimization pass [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer constant fold dangling QDQ pass [10/14/2023-15:37:08] [V] [TRT] Running: QDQToCopy on node_of_input.q [10/14/2023-15:37:08] [V] [TRT] Swap the layer type of node_of_input.q from 
QUANTIZE to kQDQ [10/14/2023-15:37:08] [V] [TRT] Running: QDQToCopy on node_of_output [10/14/2023-15:37:08] [V] [TRT] Swap the layer type of node_of_output from DEQUANTIZE to kQDQ [10/14/2023-15:37:08] [V] [TRT] After dupe layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After final dead-layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After tensor merging: 3 layers [10/14/2023-15:37:08] [V] [TRT] After vertical fusions: 3 layers [10/14/2023-15:37:08] [V] [TRT] After dupe layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After final dead-layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After tensor merging: 3 layers [10/14/2023-15:37:08] [V] [TRT] After slice removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After concat removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] Trying to split Reshape and strided tensor [10/14/2023-15:37:08] [V] [TRT] Graph construction and optimization completed in 0.0159258 seconds. [10/14/2023-15:37:08] [I] [TRT] ---------- Layers Running on DLA ---------- [10/14/2023-15:37:08] [I] [TRT] ---------- Layers Running on GPU ---------- [10/14/2023-15:37:08] [I] [TRT] [GpuLayer] COPY: node_of_input.q [10/14/2023-15:37:08] [I] [TRT] [GpuLayer] CONVOLUTION: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [I] [TRT] [GpuLayer] COPY: node_of_output [10/14/2023-15:37:08] [V] [TRT] Trying to load shared library libcublas.so.11 [10/14/2023-15:37:08] [V] [TRT] Loaded shared library libcublas.so.11 [10/14/2023-15:37:12] [V] [TRT] Using cublas as plugin tactic source [10/14/2023-15:37:12] [V] [TRT] Trying to load shared library libcublasLt.so.11 [10/14/2023-15:37:12] [V] [TRT] Loaded shared library libcublasLt.so.11 [10/14/2023-15:37:12] [V] [TRT] Using cublasLt as core library tactic source [10/14/2023-15:37:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +1002, now: CPU 1108, GPU 21303 (MiB) [10/14/2023-15:37:12] [V] [TRT] Trying to load shared library 
libcudnn.so.8 [10/14/2023-15:37:12] [V] [TRT] Loaded shared library libcudnn.so.8 [10/14/2023-15:37:12] [V] [TRT] Using cuDNN as plugin tactic source [10/14/2023-15:37:12] [V] [TRT] Using cuDNN as core library tactic source [10/14/2023-15:37:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +66, now: CPU 1191, GPU 21369 (MiB) [10/14/2023-15:37:12] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored. [10/14/2023-15:37:12] [V] [TRT] Constructing optimization profile number 0 [1/1]. [10/14/2023-15:37:12] [V] [TRT] Reserving memory for host IO tensors. Host: 0 bytes [10/14/2023-15:37:12] [V] [TRT] =============== Computing reformatting costs: [10/14/2023-15:37:12] [V] [TRT] *************** Autotuning Reformat: Float(2359296,16384,128,1) -> Int8(2359296,16384,128,1) *************** [10/14/2023-15:37:12] [V] [TRT] --------------- Timing Runner: node_of_input.q (Reformat) [10/14/2023-15:37:12] [V] [TRT] Tactic: 0x00000000000003e8 Time: 0.65739 [10/14/2023-15:37:12] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.674213 [10/14/2023-15:37:13] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.663886 [10/14/2023-15:37:13] [V] [TRT] Fastest Tactic: 0x00000000000003e8 Time: 0.65739 [10/14/2023-15:37:13] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003e8 [10/14/2023-15:37:13] [V] [TRT] *************** Autotuning Reformat: Float(2359296,16384,128,1) -> Int8(589824,16384:4,128,1) *************** [10/14/2023-15:37:13] [V] [TRT] --------------- Timing Runner: node_of_input.q (Reformat) [10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.14129 [10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.685362 [10/14/2023-15:37:13] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.63509 [10/14/2023-15:37:13] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.63509 [10/14/2023-15:37:13] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x0000000000000000 [10/14/2023-15:37:13] [V] [TRT] 
*************** Autotuning Reformat: Float(2359296,16384,128,1) -> Int8(81920,16384:32,128,1) *************** [10/14/2023-15:37:13] [V] [TRT] --------------- Timing Runner: node_of_input.q (Reformat) [10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003e8 Time: 3.19855 [10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.685934 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 3.21021 [10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.685934 [10/14/2023-15:37:14] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003ea [10/14/2023-15:37:14] [V] [TRT] =============== Computing reformatting costs: [10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(2359296,16384,128,1) -> Int8(589824,16384:4,128,1) *************** [10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat) [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.84465 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.712402 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.83695 [10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.712402 [10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(2359296,16384,128,1) -> Int8(81920,16384:32,128,1) *************** [10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat) [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 2.80111 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.680265 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 2.81543 [10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.680265 [10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(589824,16384:4,128,1) -> Int8(2359296,16384,128,1) *************** [10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer 
Reformat(input.q -> ) (Reformat) [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.80152 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.78608 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.778322 [10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.778322 [10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(589824,16384:4,128,1) -> Int8(81920,16384:32,128,1) *************** [10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat) [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 2.18206 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.787214 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.678245 [10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.678245 [10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(81920,16384:32,128,1) -> Int8(2359296,16384,128,1) *************** [10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat) [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.8455 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 2.55205 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.83573 [10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 1.83573 [10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(81920,16384:32,128,1) -> Int8(589824,16384:4,128,1) *************** [10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat) [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 0.636507 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.771067 [10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.124219 [10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.124219 
[10/14/2023-15:37:14] [V] [TRT] =============== Computing reformatting costs: [10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(2359296,16384,128,1) -> Float(2359296,16384,128,1) *************** [10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: node_of_output (Reformat) [10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003e8 Time: 0.411502 [10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.690999 [10/14/2023-15:37:15] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.716251 [10/14/2023-15:37:15] [V] [TRT] Fastest Tactic: 0x00000000000003e8 Time: 0.411502 [10/14/2023-15:37:15] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003e8 [10/14/2023-15:37:15] [V] [TRT] *************** Autotuning Reformat: Int8(589824,16384:4,128,1) -> Float(2359296,16384,128,1) *************** [10/14/2023-15:37:15] [V] [TRT] --------------- Timing Runner: node_of_output (Reformat) [10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.36625 [10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.813559 [10/14/2023-15:37:15] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.725362 [10/14/2023-15:37:15] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.725362 [10/14/2023-15:37:15] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x0000000000000000 [10/14/2023-15:37:15] [V] [TRT] *************** Autotuning Reformat: Int8(81920,16384:32,128,1) -> Float(2359296,16384,128,1) *************** [10/14/2023-15:37:15] [V] [TRT] --------------- Timing Runner: node_of_output (Reformat) [10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.39012 [10/14/2023-15:37:16] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.784887 [10/14/2023-15:37:16] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.37678 [10/14/2023-15:37:16] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.784887 [10/14/2023-15:37:16] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003ea 
[10/14/2023-15:37:16] [V] [TRT] =============== Computing costs for [10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(2359296,16384,128,1) -> Int8(2359296,16384,128,1) *************** [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution) [10/14/2023-15:37:16] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution) [10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(589824,16384:4,128,1) -> Int8(589824,16384:4,128,1) *************** [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CudaDepthwiseConvolution) [10/14/2023-15:37:16] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution) [10/14/2023-15:37:16] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution) [10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(589824,16384:4,128,1) -> Int8(81920,16384:32,128,1) *************** [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution) [10/14/2023-15:37:16] [V] 
[TRT] CaskConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution) [10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(81920,16384:32,128,1) -> Int8(81920,16384:32,128,1) *************** [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CudaGroupConvolution) [10/14/2023-15:37:16] [V] [TRT] CudaGroupConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CudaDepthwiseConvolution) [10/14/2023-15:37:16] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution) [10/14/2023-15:37:16] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution) [10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping [10/14/2023-15:37:16] [V] [TRT] Deleting timing cache: 6 entries, served 0 hits since creation. [10/14/2023-15:37:16] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul).) [10/14/2023-15:37:16] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. 
) [10/14/2023-15:37:16] [E] Engine could not be created from network [10/14/2023-15:37:16] [E] Building engine failed [10/14/2023-15:37:16] [E] Failed to create engine from model or file. [10/14/2023-15:37:16] [E] Engine set up failed &&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx --minShapes=input:1x144x128x128 --optShapes=input:4x144x128x128 --maxShapes=input:32x144x128x128 --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json ```

Steps To Reproduce

Commands or scripts:

I wrote a script that generates a minimal reproducible model. To reproduce:

  1. generate the ONNX file
  2. try to convert it (the error occurs here)
# generate ONNX file
$ python3 gen_qdq_dw_bn_swish.py

# try to convert
$ /usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx "--minShapes=input:1x144x128x128" "--optShapes=input:4x144x128x128" "--maxShapes=input:32x144x128x128" --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json
generation code (gen_qdq_dw_bn_swish.py)

```python
# gen_qdq_dw_bn_swish.py
import onnx
import numpy as np
import onnx.numpy_helper
import onnxruntime as ort


def get_qdq_dw_bn_swish_model():
    inputs = [onnx.helper.make_tensor_value_info('input', onnx.TensorProto.FLOAT, ['batch_size', 144, 128, 128])]
    outputs = [onnx.helper.make_tensor_value_info('output', onnx.TensorProto.FLOAT, ['batch_size', 144, 128, 128])]
    nodes = [
        # input Q
        onnx.helper.make_node('QuantizeLinear', ['input', 'x_scale', 'x_zero_point'], ['input.q']),

        # DQ DW-Conv
        onnx.helper.make_node('DequantizeLinear', ['input.q', 'x_scale', 'x_zero_point'], ['dqconv.dq_feature']),
        onnx.helper.make_node('DequantizeLinear', ['w', 'w_scale', 'w_zero_point'], ['dqconv.dq_weight'], axis=0),
        onnx.helper.make_node('Conv', ['dqconv.dq_feature', 'dqconv.dq_weight'], ['dqconv.dwconv'], pads=[1, 1, 1, 1], kernel_shape=[3, 3], group=144),

        # BN
        onnx.helper.make_node('BatchNormalization', ['dqconv.dwconv', 'bn.scale', 'bn.B', 'bn.mean', 'bn.var'], ['bn']),

        # Swish
        onnx.helper.make_node('Sigmoid', ['bn'], ['swish.sigmoid']),
        onnx.helper.make_node('Mul', ['swish.sigmoid', 'bn'], ['swish.mul']),

        # Q
        onnx.helper.make_node('QuantizeLinear', ['swish.mul', 'y_scale', 'y_zero_point'], ['qdqconv.q']),

        # output DQ
        onnx.helper.make_node('DequantizeLinear', ['qdqconv.q', 'y_scale', 'y_zero_point'], ['output']),
    ]
    inits = [
        # DQ
        onnx.numpy_helper.from_array(np.array(0.47191643714904785, dtype=np.float32), 'x_scale'),
        onnx.numpy_helper.from_array(np.array(0, dtype=np.int8), 'x_zero_point'),

        # W
        onnx.numpy_helper.from_array(np.ones([144, 1, 3, 3], dtype=np.int8) * int(-0.14677785336971283 / 0.005895098205655813), 'w'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 0.005895098205655813, 'w_scale'),
        onnx.numpy_helper.from_array(np.zeros([144], dtype=np.int8), 'w_zero_point'),

        # BN
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 1.507541537284851, 'bn.scale'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 0.10940893739461899, 'bn.B'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * -0.5512617826461792, 'bn.mean'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 1.0475233793258667, 'bn.var'),

        # Q
        onnx.numpy_helper.from_array(np.array(0.032607611268758774, dtype=np.float32), 'y_scale'),
        onnx.numpy_helper.from_array(np.array(0, dtype=np.int8), 'y_zero_point'),
    ]
    model = onnx.helper.make_model(onnx.helper.make_graph(nodes, 'pw', inputs, outputs, inits), opset_imports=[onnx.helper.make_opsetid('', 13)], ir_version=7)
    return model


if __name__ == '__main__':
    print('***** DW-Conv > BN > Swish *****')
    model = get_qdq_dw_bn_swish_model()

    # Sanity check: the model must run on ONNX Runtime before it is saved.
    opts = ort.SessionOptions()
    sess = ort.InferenceSession(model.SerializeToString(), opts, ['CPUExecutionProvider'])
    res = sess.run(None, {'input': np.ones([32, 144, 128, 128], dtype=np.float32)})
    assert res[0].shape == (32, 144, 128, 128)

    onnx.save(model, 'qdq_dw_bn_swish.onnx')
    print('*** to reproduce error, run following command ***')
    print('/usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx "--minShapes=input:1x144x128x128" "--optShapes=input:4x144x128x128" "--maxShapes=input:32x144x128x128" --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json')
```

Have you tried the latest release?: Yes, I tried with the latest JetPack environment.

Can this model run on other frameworks? Yes. I can run this model on ONNXRuntime.

zerollzeng commented 10 months ago

@ttyio Do you know? thanks!

ttyio commented 10 months ago

we have support for depthwiseConv + swish fusion. @zerollzeng could you repro with latest TRT? thanks!

zerollzeng commented 10 months ago

It still fails; I filed internal bug 4356383 for this.

ttyio commented 10 months ago

@maminus, we checked and there is a bug here. We will fix it in the next release.

It is only triggered when there are no successor nodes after the Q/DQ + conv + swish + Q/DQ. This toy network is extracted from your full network, right? Do you see the same issue with the full network?

Before we fix it, if you want to run this network slice, another WAR is to insert Identity before the end of your network. Thanks!

maminus commented 10 months ago

@ttyio The toy network is a part of the EfficientDet model and is a part just before the SE block in the DepthwiseSeparableConv.

The original EfficientDet model can be converted without error, but the Conv and Swish were not fused and the inference speed was slow, so I was looking for a way to edit the model to fuse the Conv and Swish.

So a workaround is possible, but I hope these ops will be fused in the future to improve inference speed. Thanks!
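One way to check whether Conv and Swish ended up in a single layer is to search the file written by `--exportLayerInfo` for the fused layer name. The sketch below assumes the JSON has a top-level `"Layers"` list whose entries are either plain name strings or objects with a `"Name"` field; that layout is an assumption about trtexec's output rather than something shown in this thread, and `find_layers` and `load_and_find` are hypothetical helper names.

```python
import json


def find_layers(layer_info: dict, needle: str) -> list:
    """Return the names of all layers whose name contains `needle`."""
    names = []
    for layer in layer_info.get("Layers", []):
        # Detailed profiling verbosity is assumed to give dict entries,
        # lower verbosity plain strings; handle both.
        name = layer.get("Name", "") if isinstance(layer, dict) else str(layer)
        if needle in name:
            names.append(name)
    return names


def load_and_find(path: str, needle: str) -> list:
    """Load a layer-info JSON file and search it with find_layers."""
    with open(path) as f:
        return find_layers(json.load(f), needle)
```

For example, `load_and_find('qdq_dw_bn_swish_layer.json', 'swish')` would list every layer mentioning the Swish nodes. A single name that also contains `dwconv`, like the `w + node_of_dqconv.dwconv + PWN(...)` name in the log above, indicates the ops were fused.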

maminus commented 10 months ago

Before we fix it, if you want to run this network slice, another WAR is to insert Identity before the end of your network.

I misread this comment. So you mean that if I insert Identity, the ops will be fused without error.

I will try it on my full model. Thanks!

alsozatch commented 7 months ago

I still get a very similar error on TensorRT 8.6.2. Did you ever get it resolved with that fix? Also, I'm not sure what the proposed fix is: just insert a torch.nn.Identity layer at the end of the network? Thanks