"Could not find any implementation" Error of TensorRT 8.5.2 when converting QDQ DwConv+BN+Swish model

Description

I tried to convert a QDQ DepthwiseConv > BatchNorm > Swish (Sigmoid, Mul) model using trtexec, but it fails with the error below.

[E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul).)

The optimizer fuses the ops above into a single INT8 operator, but it seems that INT8 depthwise convolution with a generic activation pattern is not implemented.
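
For reference, the fusion chain that leads to the failing node can be pulled out of the verbose log with a small script. This is only a sketch: the sample lines are copied from the log in this report with timestamps dropped and one entry per line, and `summarize`, `FUSION_RE`, and `ERROR_RE` are names made up for illustration, not part of TensorRT.

```python
import re

# Lines copied from the verbose log in this report (timestamps dropped).
LOG = """\
[V] [TRT] PointWiseFusion: Fusing node_of_swish.sigmoid with node_of_swish.mul
[V] [TRT] GenericConvActFusion: Fusing node_of_dqconv.dwconv with PWN(node_of_swish.sigmoid, node_of_swish.mul)
[V] [TRT] ConstWeightsFusion: Fusing w with node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul)
[E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul).)
"""

# "<PassName>: Fusing <lhs> with <rhs>" entries written by the graph optimizer.
FUSION_RE = re.compile(r"\[TRT\] (\w+): Fusing (.+?) with (.+)$")
# The node name reported by the "Could not find any implementation" error.
ERROR_RE = re.compile(r"Could not find any implementation for node (.+)\.\)$")


def summarize(log_text):
    """Return ([(pass, lhs, rhs), ...], failing_node_name_or_None)."""
    fusions, failed = [], None
    for line in log_text.splitlines():
        m = FUSION_RE.search(line)
        if m:
            fusions.append(m.groups())
            continue
        m = ERROR_RE.search(line)
        if m:
            failed = m.group(1)
    return fusions, failed


if __name__ == "__main__":
    fusions, failed = summarize(LOG)
    for pass_name, lhs, rhs in fusions:
        print(f"{pass_name}: {lhs} + {rhs}")
    print("failing node:", failed)
```

The node named in the error is the product of the last fusion step, which is consistent with the failure appearing only once the depthwise conv and the pointwise Swish node have been merged.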

I tried several cases; only INT8 depthwise convolution with a generic activation fails.

| Model | Result |
| --- | --- |
| DQ > Conv > BN > Swish > Q | OK |
| DQ > DwConv > BN > Relu > Q | OK |
| DQ > DwConv > BN > Swish > Q | Error (Could not find any implementation) |
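
A model with the same structure as the failing case can be reconstructed roughly as follows. This is a sketch, not the exact file used in this report: node names, tensor names, shapes, opset 13, and IR version 7 are read off the parser output in the log, while the weight values, scales, BN parameters, and the `pads` setting are assumed placeholders. `NODES`, `CHANNELS`, `SIZE`, and `build_model` are illustrative names.

```python
CHANNELS, SIZE = 144, 128

# (op_type, node name, inputs, output, attributes), following the parser log.
NODES = [
    ("QuantizeLinear", "node_of_input.q",
     ["input", "x_scale", "x_zero_point"], "input.q", {}),
    ("DequantizeLinear", "node_of_dqconv.dq_feature",
     ["input.q", "x_scale", "x_zero_point"], "dqconv.dq_feature", {}),
    # Per-channel weight scales of shape (144,) imply axis=0 for w of shape (144, 1, 3, 3).
    ("DequantizeLinear", "node_of_dqconv.dq_weight",
     ["w", "w_scale", "w_zero_point"], "dqconv.dq_weight", {"axis": 0}),
    # group == channels makes the conv depthwise; pads are assumed (output keeps 128x128).
    ("Conv", "node_of_dqconv.dwconv",
     ["dqconv.dq_feature", "dqconv.dq_weight"], "dqconv.dwconv",
     {"group": CHANNELS, "kernel_shape": [3, 3], "pads": [1, 1, 1, 1]}),
    ("BatchNormalization", "node_of_bn",
     ["dqconv.dwconv", "bn.scale", "bn.B", "bn.mean", "bn.var"], "bn", {}),
    ("Sigmoid", "node_of_swish.sigmoid", ["bn"], "swish.sigmoid", {}),
    ("Mul", "node_of_swish.mul", ["swish.sigmoid", "bn"], "swish.mul", {}),
    ("QuantizeLinear", "node_of_qdqconv.q",
     ["swish.mul", "y_scale", "y_zero_point"], "qdqconv.q", {}),
    ("DequantizeLinear", "node_of_output",
     ["qdqconv.q", "y_scale", "y_zero_point"], "output", {}),
]


def build_model():
    # Imported here so the spec above can be inspected without onnx installed.
    import numpy as np
    import onnx
    from onnx import TensorProto, helper, numpy_helper

    rng = np.random.default_rng(0)
    f32, i8 = np.float32, np.int8
    # Placeholder values (assumptions); zero points are 0 for symmetric INT8.
    init = {
        "x_scale": np.array(0.05, f32), "x_zero_point": np.array(0, i8),
        "w": rng.integers(-127, 128, (CHANNELS, 1, 3, 3)).astype(i8),
        "w_scale": np.full(CHANNELS, 0.01, f32),
        "w_zero_point": np.zeros(CHANNELS, i8),
        "bn.scale": np.ones(CHANNELS, f32), "bn.B": np.zeros(CHANNELS, f32),
        "bn.mean": np.zeros(CHANNELS, f32), "bn.var": np.ones(CHANNELS, f32),
        "y_scale": np.array(0.05, f32), "y_zero_point": np.array(0, i8),
    }
    shape = ["N", CHANNELS, SIZE, SIZE]  # dynamic batch dimension
    graph = helper.make_graph(
        [helper.make_node(op, ins, [out], name=name, **attrs)
         for op, name, ins, out, attrs in NODES],
        "qdq_dw_bn_swish",
        [helper.make_tensor_value_info("input", TensorProto.FLOAT, shape)],
        [helper.make_tensor_value_info("output", TensorProto.FLOAT, shape)],
        initializer=[numpy_helper.from_array(v, name=k) for k, v in init.items()],
    )
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
    model.ir_version = 7
    onnx.checker.check_model(model)
    return model


if __name__ == "__main__":
    import onnx
    onnx.save(build_model(), "qdq_dw_bn_swish.onnx")
```

Replacing the `Sigmoid` and `Mul` entries with a single `Relu` node, or setting `group` to 1 with a matching weight shape, should give the two passing variants from the table, assuming the same Q/DQ placement.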

Environment

TensorRT Version: 8.5.2

NVIDIA GPU: Jetson AGX Orin

NVIDIA Driver Version: Unknown

CUDA Version: 11.4

CUDNN Version: 8.6.0

Operating System: Linux for Tegra 34.1.4 (kernel 5.10, JetPack 5.1.2)

Python Version (if applicable): 3.8.10

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): 2.0

Baremetal or Container (if so, version): Container dustynv/l4t-pytorch:r35.4.1

Relevant Files

trtexec full log

```
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx --minShapes=input:1x144x128x128 --optShapes=input:4x144x128x128 --maxShapes=input:32x144x128x128 --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json
[10/14/2023-15:37:01] [I] === Model Options ===
[10/14/2023-15:37:01] [I] Format: ONNX
[10/14/2023-15:37:01] [I] Model: qdq_dw_bn_swish.onnx
[10/14/2023-15:37:01] [I] Output:
[10/14/2023-15:37:01] [I] === Build Options ===
[10/14/2023-15:37:01] [I] Max batch: explicit batch
[10/14/2023-15:37:01] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[10/14/2023-15:37:01] [I] minTiming: 1
[10/14/2023-15:37:01] [I] avgTiming: 8
[10/14/2023-15:37:01] [I] Precision: FP32+INT8
[10/14/2023-15:37:01] [I] LayerPrecisions:
[10/14/2023-15:37:01] [I] Calibration: Dynamic
[10/14/2023-15:37:01] [I] Refit: Disabled
[10/14/2023-15:37:01] [I] Sparsity: Disabled
[10/14/2023-15:37:01] [I] Safe mode: Disabled
[10/14/2023-15:37:01] [I] DirectIO mode: Disabled
[10/14/2023-15:37:01] [I] Restricted mode: Disabled
[10/14/2023-15:37:01] [I] Build only: Enabled
[10/14/2023-15:37:01] [I] Save engine: qdq_dw_bn_swish.plan
[10/14/2023-15:37:01] [I] Load engine:
[10/14/2023-15:37:01] [I] Profiling verbosity: 2
[10/14/2023-15:37:01] [I] Tactic sources: Using default tactic sources
[10/14/2023-15:37:01] [I] timingCacheMode: local
[10/14/2023-15:37:01] [I] timingCacheFile:
[10/14/2023-15:37:01] [I] Heuristic: Disabled
[10/14/2023-15:37:01] [I] Preview Features: Use default preview flags.
[10/14/2023-15:37:01] [I] Input(s)s format: fp32:CHW [10/14/2023-15:37:01] [I] Output(s)s format: fp32:CHW [10/14/2023-15:37:01] [I] Input build shape: input=1x144x128x128+4x144x128x128+32x144x128x128 [10/14/2023-15:37:01] [I] Input calibration shapes: model [10/14/2023-15:37:01] [I] === System Options === [10/14/2023-15:37:01] [I] Device: 0 [10/14/2023-15:37:01] [I] DLACore: [10/14/2023-15:37:01] [I] Plugins: [10/14/2023-15:37:01] [I] === Inference Options === [10/14/2023-15:37:01] [I] Batch: Explicit [10/14/2023-15:37:01] [I] Input inference shape: input=4x144x128x128 [10/14/2023-15:37:01] [I] Iterations: 10 [10/14/2023-15:37:01] [I] Duration: 3s (+ 200ms warm up) [10/14/2023-15:37:01] [I] Sleep time: 0ms [10/14/2023-15:37:01] [I] Idle time: 0ms [10/14/2023-15:37:01] [I] Streams: 1 [10/14/2023-15:37:01] [I] ExposeDMA: Disabled [10/14/2023-15:37:01] [I] Data transfers: Enabled [10/14/2023-15:37:01] [I] Spin-wait: Disabled [10/14/2023-15:37:01] [I] Multithreading: Disabled [10/14/2023-15:37:01] [I] CUDA Graph: Disabled [10/14/2023-15:37:01] [I] Separate profiling: Disabled [10/14/2023-15:37:01] [I] Time Deserialize: Disabled [10/14/2023-15:37:01] [I] Time Refit: Disabled [10/14/2023-15:37:01] [I] NVTX verbosity: 2 [10/14/2023-15:37:01] [I] Persistent Cache Ratio: 0 [10/14/2023-15:37:01] [I] Inputs: [10/14/2023-15:37:01] [I] === Reporting Options === [10/14/2023-15:37:01] [I] Verbose: Enabled [10/14/2023-15:37:01] [I] Averages: 10 inferences [10/14/2023-15:37:01] [I] Percentiles: 90,95,99 [10/14/2023-15:37:01] [I] Dump refittable layers:Disabled [10/14/2023-15:37:01] [I] Dump output: Disabled [10/14/2023-15:37:01] [I] Profile: Disabled [10/14/2023-15:37:01] [I] Export timing to JSON file: [10/14/2023-15:37:01] [I] Export output to JSON file: [10/14/2023-15:37:01] [I] Export profile to JSON file: [10/14/2023-15:37:01] [I] [10/14/2023-15:37:01] [I] === Device Information === [10/14/2023-15:37:01] [I] Selected Device: Orin [10/14/2023-15:37:01] [I] Compute Capability: 
8.7 [10/14/2023-15:37:01] [I] SMs: 16 [10/14/2023-15:37:01] [I] Compute Clock Rate: 1.3 GHz [10/14/2023-15:37:01] [I] Device Global Memory: 30592 MiB [10/14/2023-15:37:01] [I] Shared Memory per SM: 164 KiB [10/14/2023-15:37:01] [I] Memory Bus Width: 256 bits (ECC disabled) [10/14/2023-15:37:01] [I] Memory Clock Rate: 1.3 GHz [10/14/2023-15:37:01] [I] [10/14/2023-15:37:01] [I] TensorRT version: 8.5.2 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::BatchTilePlugin_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::CoordConvAC version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::CropAndResizeDynamic version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::CropAndResize version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::DecodeBbox3DPlugin version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::GenerateDetection_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin 
creator - ::GroupNorm version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 2 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::LayerNorm version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::MultilevelProposeROI_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::NMSDynamic_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::PillarScatterPlugin version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ProposalDynamic version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Proposal version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Region_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ROIAlign_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::ScatterND version 1 
[10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::SeqLen2Spatial version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::SplitGeLU version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::Split version 1 [10/14/2023-15:37:01] [V] [TRT] Registered plugin creator - ::VoxelGeneratorPlugin version 1 [10/14/2023-15:37:02] [I] [TRT] [MemUsageChange] Init CUDA: CPU +220, GPU +0, now: CPU 249, GPU 19806 (MiB) [10/14/2023-15:37:03] [V] [TRT] Trying to load shared library libnvinfer_builder_resource.so.8.5.2 [10/14/2023-15:37:03] [V] [TRT] Loaded shared library libnvinfer_builder_resource.so.8.5.2 [10/14/2023-15:37:08] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +302, GPU +511, now: CPU 574, GPU 20281 (MiB) [10/14/2023-15:37:08] [I] Start parsing network model [10/14/2023-15:37:08] [I] [TRT] ---------------------------------------------------------------- [10/14/2023-15:37:08] [I] [TRT] Input filename: qdq_dw_bn_swish.onnx [10/14/2023-15:37:08] [I] [TRT] ONNX IR version: 0.0.7 [10/14/2023-15:37:08] [I] [TRT] Opset version: 13 [10/14/2023-15:37:08] [I] [TRT] Producer name: [10/14/2023-15:37:08] [I] [TRT] Producer version: [10/14/2023-15:37:08] [I] [TRT] Domain: [10/14/2023-15:37:08] [I] [TRT] Model version: 0 [10/14/2023-15:37:08] [I] [TRT] Doc string: [10/14/2023-15:37:08] [I] [TRT] ---------------------------------------------------------------- [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::BatchTilePlugin_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::CoordConvAC version 1 
[10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::CropAndResizeDynamic version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::DecodeBbox3DPlugin version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_Explicit_TF_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_Implicit_TF_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_ONNX_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::EfficientNMS_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GenerateDetection_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GridAnchorRect_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::GroupNorm version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 2 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::LayerNorm version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::MultilevelCropAndResize_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::MultilevelProposeROI_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - 
::MultiscaleDeformableAttnPlugin_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::NMSDynamic_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::PillarScatterPlugin version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ProposalDynamic version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Proposal version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ROIAlign_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::ScatterND version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::SeqLen2Spatial version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::SplitGeLU version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::Split version 1 [10/14/2023-15:37:08] [V] [TRT] Plugin creator already registered - ::VoxelGeneratorPlugin version 1 [10/14/2023-15:37:08] [V] [TRT] Adding network input: input with dtype: 
float32, dimensions: (-1, 144, 128, 128) [10/14/2023-15:37:08] [V] [TRT] Registering tensor: input for ONNX tensor: input [10/14/2023-15:37:08] [V] [TRT] Importing initializer: x_scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: x_zero_point [10/14/2023-15:37:08] [V] [TRT] Importing initializer: w [10/14/2023-15:37:08] [V] [TRT] Importing initializer: w_scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: w_zero_point [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.B [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.mean [10/14/2023-15:37:08] [V] [TRT] Importing initializer: bn.var [10/14/2023-15:37:08] [V] [TRT] Importing initializer: y_scale [10/14/2023-15:37:08] [V] [TRT] Importing initializer: y_zero_point [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_input.q [QuantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: input [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_input.q [QuantizeLinear] inputs: [input -> (-1, 144, 128, 128)[FLOAT]], [x_scale -> ()[FLOAT]], [x_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: x_scale for ONNX node: x_scale [10/14/2023-15:37:08] [V] [TRT] Registering layer: x_zero_point for ONNX node: x_zero_point [10/14/2023-15:37:08] [V] [TRT] Registering tensor: input.q for ONNX tensor: input.q [10/14/2023-15:37:08] [V] [TRT] node_of_input.q [QuantizeLinear] outputs: [input.q -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_dqconv.dq_feature [DequantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: input.q [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: x_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_feature [DequantizeLinear] inputs: [input.q -> 
(-1, 144, 128, 128)[FLOAT]], [x_scale -> ()[FLOAT]], [x_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering tensor: dqconv.dq_feature for ONNX tensor: dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_feature [DequantizeLinear] outputs: [dqconv.dq_feature -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_dqconv.dq_weight [DequantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: w [10/14/2023-15:37:08] [V] [TRT] Searching for input: w_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: w_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_weight [DequantizeLinear] inputs: [w -> (144, 1, 3, 3)[INT8]], [w_scale -> (144)[FLOAT]], [w_zero_point -> (144)[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: w for ONNX node: w [10/14/2023-15:37:08] [V] [TRT] Registering layer: w_scale for ONNX node: w_scale [10/14/2023-15:37:08] [V] [TRT] Registering layer: w_zero_point for ONNX node: w_zero_point [10/14/2023-15:37:08] [V] [TRT] Registering tensor: dqconv.dq_weight for ONNX tensor: dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dq_weight [DequantizeLinear] outputs: [dqconv.dq_weight -> (144, 1, 3, 3)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_dqconv.dwconv [Conv] [10/14/2023-15:37:08] [V] [TRT] Searching for input: dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] Searching for input: dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dwconv [Conv] inputs: [dqconv.dq_feature -> (-1, 144, 128, 128)[FLOAT]], [dqconv.dq_weight -> (144, 1, 3, 3)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Kernel weights are not set yet. Kernel weights must be set using setInput(1, kernel_tensor) API call. 
[10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_dqconv.dwconv for ONNX node: node_of_dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] Registering tensor: dqconv.dwconv for ONNX tensor: dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] node_of_dqconv.dwconv [Conv] outputs: [dqconv.dwconv -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_bn [BatchNormalization] [10/14/2023-15:37:08] [V] [TRT] Searching for input: dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.B [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.mean [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn.var [10/14/2023-15:37:08] [V] [TRT] node_of_bn [BatchNormalization] inputs: [dqconv.dwconv -> (-1, 144, 128, 128)[FLOAT]], [bn.scale -> (144)[FLOAT]], [bn.B -> (144)[FLOAT]], [bn.mean -> (144)[FLOAT]], [bn.var -> (144)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_bn for ONNX node: node_of_bn [10/14/2023-15:37:08] [V] [TRT] Registering tensor: bn for ONNX tensor: bn [10/14/2023-15:37:08] [V] [TRT] node_of_bn [BatchNormalization] outputs: [bn -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_swish.sigmoid [Sigmoid] [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn [10/14/2023-15:37:08] [V] [TRT] node_of_swish.sigmoid [Sigmoid] inputs: [bn -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_swish.sigmoid for ONNX node: node_of_swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] Registering tensor: swish.sigmoid for ONNX tensor: swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] node_of_swish.sigmoid [Sigmoid] outputs: [swish.sigmoid -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_swish.mul [Mul] [10/14/2023-15:37:08] [V] [TRT] Searching for input: swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] Searching for input: bn [10/14/2023-15:37:08] [V] [TRT] 
node_of_swish.mul [Mul] inputs: [swish.sigmoid -> (-1, 144, 128, 128)[FLOAT]], [bn -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: node_of_swish.mul for ONNX node: node_of_swish.mul [10/14/2023-15:37:08] [V] [TRT] Registering tensor: swish.mul for ONNX tensor: swish.mul [10/14/2023-15:37:08] [V] [TRT] node_of_swish.mul [Mul] outputs: [swish.mul -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_qdqconv.q [QuantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: swish.mul [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_qdqconv.q [QuantizeLinear] inputs: [swish.mul -> (-1, 144, 128, 128)[FLOAT]], [y_scale -> ()[FLOAT]], [y_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering layer: y_scale for ONNX node: y_scale [10/14/2023-15:37:08] [V] [TRT] Registering layer: y_zero_point for ONNX node: y_zero_point [10/14/2023-15:37:08] [V] [TRT] Registering tensor: qdqconv.q for ONNX tensor: qdqconv.q [10/14/2023-15:37:08] [V] [TRT] node_of_qdqconv.q [QuantizeLinear] outputs: [qdqconv.q -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Parsing node: node_of_output [DequantizeLinear] [10/14/2023-15:37:08] [V] [TRT] Searching for input: qdqconv.q [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_scale [10/14/2023-15:37:08] [V] [TRT] Searching for input: y_zero_point [10/14/2023-15:37:08] [V] [TRT] node_of_output [DequantizeLinear] inputs: [qdqconv.q -> (-1, 144, 128, 128)[FLOAT]], [y_scale -> ()[FLOAT]], [y_zero_point -> ()[INT8]], [10/14/2023-15:37:08] [V] [TRT] Registering tensor: output_4 for ONNX tensor: output [10/14/2023-15:37:08] [V] [TRT] node_of_output [DequantizeLinear] outputs: [output -> (-1, 144, 128, 128)[FLOAT]], [10/14/2023-15:37:08] [V] [TRT] Marking output_4 as output: output [10/14/2023-15:37:08] [I] Finish parsing network model 
[10/14/2023-15:37:08] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best [10/14/2023-15:37:08] [W] [TRT] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU [10/14/2023-15:37:08] [W] [TRT] Calibrator won't be used in explicit precision mode. Use quantization aware training to generate network with Quantize/Dequantize nodes. [10/14/2023-15:37:08] [V] [TRT] Original: 16 layers [10/14/2023-15:37:08] [V] [TRT] After dead-layer removal: 16 layers [10/14/2023-15:37:08] [V] [TRT] Running: ConstantInt8Validator on w [10/14/2023-15:37:08] [V] [TRT] Applying generic optimizations to the graph for inference. [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer - constant folding of Q/DQ initializers [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_input.q [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] Removing w_zero_point [10/14/2023-15:37:08] [V] [TRT] Removing w_scale [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_qdqconv.q [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_output [10/14/2023-15:37:08] [V] [TRT] Removing y_zero_point [10/14/2023-15:37:08] [V] [TRT] Removing y_scale [10/14/2023-15:37:08] [V] [TRT] Running: ConstQDQInitializersFusion on node_of_dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] Removing x_zero_point [10/14/2023-15:37:08] [V] [TRT] Removing x_scale [10/14/2023-15:37:08] [V] [TRT] After Myelin optimization: 10 layers [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer - constant folding of Q/DQ initializers [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer forward pass - DQ motions and fusions [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer backward pass [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer quantization pass - Generate quantized ops 
[10/14/2023-15:37:08] [V] [TRT] Running: PointWiseFusion on node_of_swish.sigmoid [10/14/2023-15:37:08] [V] [TRT] PointWiseFusion: Fusing node_of_swish.sigmoid with node_of_swish.mul [10/14/2023-15:37:08] [V] [TRT] Running: QConvScaleFusion on node_of_dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] Removing node_of_bn [10/14/2023-15:37:08] [V] [TRT] Running: GenericConvActFusion on node_of_dqconv.dwconv [10/14/2023-15:37:08] [V] [TRT] GenericConvActFusion: Fusing node_of_dqconv.dwconv with PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] Running: QuantizeDoubleInputNodes on node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] QuantizeDoubleInputNodes: fusing node_of_qdqconv.q into node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] QuantizeDoubleInputNodes: fusing (node_of_dqconv.dq_feature and node_of_dqconv.dq_weight) into node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] Removing node_of_qdqconv.q [10/14/2023-15:37:08] [V] [TRT] Removing node_of_dqconv.dq_feature [10/14/2023-15:37:08] [V] [TRT] Removing node_of_dqconv.dq_weight [10/14/2023-15:37:08] [V] [TRT] Running: ConstWeightsFusion on w [10/14/2023-15:37:08] [V] [TRT] ConstWeightsFusion: Fusing w with node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [V] [TRT] After dupe layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After final dead-layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After tensor merging: 3 layers [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer quantization epilogue pass [10/14/2023-15:37:08] [V] [TRT] QDQ optimization pass [10/14/2023-15:37:08] [V] [TRT] QDQ graph optimizer constant fold dangling QDQ pass [10/14/2023-15:37:08] [V] [TRT] Running: QDQToCopy on node_of_input.q [10/14/2023-15:37:08] [V] [TRT] Swap the layer type of node_of_input.q from 
QUANTIZE to kQDQ [10/14/2023-15:37:08] [V] [TRT] Running: QDQToCopy on node_of_output [10/14/2023-15:37:08] [V] [TRT] Swap the layer type of node_of_output from DEQUANTIZE to kQDQ [10/14/2023-15:37:08] [V] [TRT] After dupe layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After final dead-layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After tensor merging: 3 layers [10/14/2023-15:37:08] [V] [TRT] After vertical fusions: 3 layers [10/14/2023-15:37:08] [V] [TRT] After dupe layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After final dead-layer removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After tensor merging: 3 layers [10/14/2023-15:37:08] [V] [TRT] After slice removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] After concat removal: 3 layers [10/14/2023-15:37:08] [V] [TRT] Trying to split Reshape and strided tensor [10/14/2023-15:37:08] [V] [TRT] Graph construction and optimization completed in 0.0159258 seconds. [10/14/2023-15:37:08] [I] [TRT] ---------- Layers Running on DLA ---------- [10/14/2023-15:37:08] [I] [TRT] ---------- Layers Running on GPU ---------- [10/14/2023-15:37:08] [I] [TRT] [GpuLayer] COPY: node_of_input.q [10/14/2023-15:37:08] [I] [TRT] [GpuLayer] CONVOLUTION: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) [10/14/2023-15:37:08] [I] [TRT] [GpuLayer] COPY: node_of_output [10/14/2023-15:37:08] [V] [TRT] Trying to load shared library libcublas.so.11 [10/14/2023-15:37:08] [V] [TRT] Loaded shared library libcublas.so.11 [10/14/2023-15:37:12] [V] [TRT] Using cublas as plugin tactic source [10/14/2023-15:37:12] [V] [TRT] Trying to load shared library libcublasLt.so.11 [10/14/2023-15:37:12] [V] [TRT] Loaded shared library libcublasLt.so.11 [10/14/2023-15:37:12] [V] [TRT] Using cublasLt as core library tactic source [10/14/2023-15:37:12] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +534, GPU +1002, now: CPU 1108, GPU 21303 (MiB) [10/14/2023-15:37:12] [V] [TRT] Trying to load shared library 
libcudnn.so.8
[10/14/2023-15:37:12] [V] [TRT] Loaded shared library libcudnn.so.8
[10/14/2023-15:37:12] [V] [TRT] Using cuDNN as plugin tactic source
[10/14/2023-15:37:12] [V] [TRT] Using cuDNN as core library tactic source
[10/14/2023-15:37:12] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +83, GPU +66, now: CPU 1191, GPU 21369 (MiB)
[10/14/2023-15:37:12] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/14/2023-15:37:12] [V] [TRT] Constructing optimization profile number 0 [1/1].
[10/14/2023-15:37:12] [V] [TRT] Reserving memory for host IO tensors. Host: 0 bytes
[10/14/2023-15:37:12] [V] [TRT] =============== Computing reformatting costs:
[10/14/2023-15:37:12] [V] [TRT] *************** Autotuning Reformat: Float(2359296,16384,128,1) -> Int8(2359296,16384,128,1) ***************
[10/14/2023-15:37:12] [V] [TRT] --------------- Timing Runner: node_of_input.q (Reformat)
[10/14/2023-15:37:12] [V] [TRT] Tactic: 0x00000000000003e8 Time: 0.65739
[10/14/2023-15:37:12] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.674213
[10/14/2023-15:37:13] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.663886
[10/14/2023-15:37:13] [V] [TRT] Fastest Tactic: 0x00000000000003e8 Time: 0.65739
[10/14/2023-15:37:13] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003e8
[10/14/2023-15:37:13] [V] [TRT] *************** Autotuning Reformat: Float(2359296,16384,128,1) -> Int8(589824,16384:4,128,1) ***************
[10/14/2023-15:37:13] [V] [TRT] --------------- Timing Runner: node_of_input.q (Reformat)
[10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.14129
[10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.685362
[10/14/2023-15:37:13] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.63509
[10/14/2023-15:37:13] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.63509
[10/14/2023-15:37:13] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x0000000000000000
[10/14/2023-15:37:13] [V] [TRT] *************** Autotuning Reformat: Float(2359296,16384,128,1) -> Int8(81920,16384:32,128,1) ***************
[10/14/2023-15:37:13] [V] [TRT] --------------- Timing Runner: node_of_input.q (Reformat)
[10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003e8 Time: 3.19855
[10/14/2023-15:37:13] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.685934
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 3.21021
[10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.685934
[10/14/2023-15:37:14] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003ea
[10/14/2023-15:37:14] [V] [TRT] =============== Computing reformatting costs:
[10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(2359296,16384,128,1) -> Int8(589824,16384:4,128,1) ***************
[10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat)
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.84465
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.712402
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.83695
[10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.712402
[10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(2359296,16384,128,1) -> Int8(81920,16384:32,128,1) ***************
[10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat)
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 2.80111
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.680265
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 2.81543
[10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.680265
[10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(589824,16384:4,128,1) -> Int8(2359296,16384,128,1) ***************
[10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat)
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.80152
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.78608
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.778322
[10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.778322
[10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(589824,16384:4,128,1) -> Int8(81920,16384:32,128,1) ***************
[10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat)
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 2.18206
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.787214
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.678245
[10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.678245
[10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(81920,16384:32,128,1) -> Int8(2359296,16384,128,1) ***************
[10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat)
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.8455
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 2.55205
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.83573
[10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 1.83573
[10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(81920,16384:32,128,1) -> Int8(589824,16384:4,128,1) ***************
[10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: Optimizer Reformat(input.q -> ) (Reformat)
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003e8 Time: 0.636507
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.771067
[10/14/2023-15:37:14] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.124219
[10/14/2023-15:37:14] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.124219
[10/14/2023-15:37:14] [V] [TRT] =============== Computing reformatting costs:
[10/14/2023-15:37:14] [V] [TRT] *************** Autotuning Reformat: Int8(2359296,16384,128,1) -> Float(2359296,16384,128,1) ***************
[10/14/2023-15:37:14] [V] [TRT] --------------- Timing Runner: node_of_output (Reformat)
[10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003e8 Time: 0.411502
[10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.690999
[10/14/2023-15:37:15] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.716251
[10/14/2023-15:37:15] [V] [TRT] Fastest Tactic: 0x00000000000003e8 Time: 0.411502
[10/14/2023-15:37:15] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003e8
[10/14/2023-15:37:15] [V] [TRT] *************** Autotuning Reformat: Int8(589824,16384:4,128,1) -> Float(2359296,16384,128,1) ***************
[10/14/2023-15:37:15] [V] [TRT] --------------- Timing Runner: node_of_output (Reformat)
[10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.36625
[10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.813559
[10/14/2023-15:37:15] [V] [TRT] Tactic: 0x0000000000000000 Time: 0.725362
[10/14/2023-15:37:15] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 0.725362
[10/14/2023-15:37:15] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x0000000000000000
[10/14/2023-15:37:15] [V] [TRT] *************** Autotuning Reformat: Int8(81920,16384:32,128,1) -> Float(2359296,16384,128,1) ***************
[10/14/2023-15:37:15] [V] [TRT] --------------- Timing Runner: node_of_output (Reformat)
[10/14/2023-15:37:15] [V] [TRT] Tactic: 0x00000000000003e8 Time: 1.39012
[10/14/2023-15:37:16] [V] [TRT] Tactic: 0x00000000000003ea Time: 0.784887
[10/14/2023-15:37:16] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.37678
[10/14/2023-15:37:16] [V] [TRT] Fastest Tactic: 0x00000000000003ea Time: 0.784887
[10/14/2023-15:37:16] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Reformat Tactic: 0x00000000000003ea
[10/14/2023-15:37:16] [V] [TRT] =============== Computing costs for
[10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(2359296,16384,128,1) -> Int8(2359296,16384,128,1) ***************
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(589824,16384:4,128,1) -> Int8(589824,16384:4,128,1) ***************
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CudaDepthwiseConvolution)
[10/14/2023-15:37:16] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(589824,16384:4,128,1) -> Int8(81920,16384:32,128,1) ***************
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] *************** Autotuning format combination: Int8(81920,16384:32,128,1) -> Int8(81920,16384:32,128,1) ***************
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CudaGroupConvolution)
[10/14/2023-15:37:16] [V] [TRT] CudaGroupConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CudaDepthwiseConvolution)
[10/14/2023-15:37:16] [V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] --------------- Timing Runner: w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul) (CaskFlattenConvolution)
[10/14/2023-15:37:16] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping
[10/14/2023-15:37:16] [V] [TRT] Deleting timing cache: 6 entries, served 0 hits since creation.
[10/14/2023-15:37:16] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node w + node_of_dqconv.dwconv + PWN(node_of_swish.sigmoid, node_of_swish.mul).)
[10/14/2023-15:37:16] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[10/14/2023-15:37:16] [E] Engine could not be created from network
[10/14/2023-15:37:16] [E] Building engine failed
[10/14/2023-15:37:16] [E] Failed to create engine from model or file.
[10/14/2023-15:37:16] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx --minShapes=input:1x144x128x128 --optShapes=input:4x144x128x128 --maxShapes=input:32x144x128x128 --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json
```
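For anyone comparing logs across the OK and failing cases, here is a minimal sketch that summarizes which runners reported "no valid tactics" for each format combination. The function name `runners_without_tactics` and the sample text are only illustrations; the regular expression simply follows the line shapes that appear in the verbose log above and may need adjusting for other TensorRT versions.

```python
import re

# Matches either a format-combination banner or a "no valid tactics" report.
# The patterns mirror the verbose trtexec lines quoted in this issue.
EVENT_RE = re.compile(
    r"Autotuning format combination: (?P<fmt>.+?) \*{3,}"
    r"|(?P<runner>\w+) has no valid tactics for this config, skipping"
)


def runners_without_tactics(log_text):
    """Map each format combination to the runners that were skipped for it."""
    skipped = {}
    current = None
    # finditer scans the whole text, so it also works if the log lines
    # were joined into one long line when the log was pasted.
    for match in EVENT_RE.finditer(log_text):
        if match.group("fmt") is not None:
            current = match.group("fmt").strip()
            skipped.setdefault(current, [])
        elif current is not None:
            skipped[current].append(match.group("runner"))
    return skipped


# Two entries copied from the log above (timestamps dropped for brevity).
SAMPLE_LOG = (
    "[V] [TRT] *************** Autotuning format combination: "
    "Int8(589824,16384:4,128,1) -> Int8(589824,16384:4,128,1) ***************\n"
    "[V] [TRT] CudaDepthwiseConvolution has no valid tactics for this config, skipping\n"
    "[V] [TRT] CaskConvolution has no valid tactics for this config, skipping\n"
)

if __name__ == "__main__":
    for fmt, runners in runners_without_tactics(SAMPLE_LOG).items():
        print(fmt, "->", ", ".join(runners))
```

Running it on the full log should list every format combination with all of its runners skipped, which is the situation that leads to Error Code 10.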

Steps To Reproduce

Commands or scripts:

I wrote a script that generates a minimal reproducible model. To reproduce:

1. Generate the ONNX file.
2. Try to convert it with trtexec (the error occurs at this step).

# generate ONNX file
$ python3 gen_qdq_dw_bn_swish.py

# try to convert
$ /usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx "--minShapes=input:1x144x128x128" "--optShapes=input:4x144x128x128" "--maxShapes=input:32x144x128x128" --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json
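If it helps to try other batch sizes or file names, the command above can be assembled programmatically. This is only a sketch: the helper name `build_trtexec_command` and its parameters are made up for illustration, and it uses only the flags that already appear in the command above.

```python
import shlex


def build_trtexec_command(onnx_path, input_name="input", chw=(144, 128, 128),
                          batches=(1, 4, 32),
                          trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Return the trtexec argument list for a min/opt/max batch profile."""
    stem = onnx_path.rsplit(".", 1)[0]

    def shape(batch):
        # trtexec expects shapes as name:NxCxHxW
        return f"{input_name}:" + "x".join(str(d) for d in (batch, *chw))

    min_b, opt_b, max_b = batches
    return [
        trtexec,
        f"--onnx={onnx_path}",
        f"--minShapes={shape(min_b)}",
        f"--optShapes={shape(opt_b)}",
        f"--maxShapes={shape(max_b)}",
        "--verbose",
        "--buildOnly",
        f"--saveEngine={stem}.plan",
        "--int8",
        "--profilingVerbosity=detailed",
        f"--exportLayerInfo={stem}_layer.json",
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_trtexec_command("qdq_dw_bn_swish.onnx")))
```

With the default arguments this prints the same command line as the one used in this report.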

generation code (gen_qdq_dw_bn_swish.py)

```python
# gen_qdq_dw_bn_swish.py
import onnx
import numpy as np
import onnx.numpy_helper
import onnxruntime as ort


def get_qdq_dw_bn_swish_model():
    inputs = [onnx.helper.make_tensor_value_info('input', onnx.TensorProto.FLOAT, ['batch_size', 144, 128, 128])]
    outputs = [onnx.helper.make_tensor_value_info('output', onnx.TensorProto.FLOAT, ['batch_size', 144, 128, 128])]
    nodes = [
        # input Q
        onnx.helper.make_node('QuantizeLinear', ['input', 'x_scale', 'x_zero_point'], ['input.q']),
        # DQ DW-Conv
        onnx.helper.make_node('DequantizeLinear', ['input.q', 'x_scale', 'x_zero_point'], ['dqconv.dq_feature']),
        onnx.helper.make_node('DequantizeLinear', ['w', 'w_scale', 'w_zero_point'], ['dqconv.dq_weight'], axis=0),
        onnx.helper.make_node('Conv', ['dqconv.dq_feature', 'dqconv.dq_weight'], ['dqconv.dwconv'],
                              pads=[1, 1, 1, 1], kernel_shape=[3, 3], group=144),
        # BN
        onnx.helper.make_node('BatchNormalization', ['dqconv.dwconv', 'bn.scale', 'bn.B', 'bn.mean', 'bn.var'], ['bn']),
        # Swish
        onnx.helper.make_node('Sigmoid', ['bn'], ['swish.sigmoid']),
        onnx.helper.make_node('Mul', ['swish.sigmoid', 'bn'], ['swish.mul']),
        # Q
        onnx.helper.make_node('QuantizeLinear', ['swish.mul', 'y_scale', 'y_zero_point'], ['qdqconv.q']),
        # output DQ
        onnx.helper.make_node('DequantizeLinear', ['qdqconv.q', 'y_scale', 'y_zero_point'], ['output']),
    ]
    inits = [
        # DQ
        onnx.numpy_helper.from_array(np.array(0.47191643714904785, dtype=np.float32), 'x_scale'),
        onnx.numpy_helper.from_array(np.array(0, dtype=np.int8), 'x_zero_point'),
        # W
        onnx.numpy_helper.from_array(np.ones([144, 1, 3, 3], dtype=np.int8) * int(-0.14677785336971283 / 0.005895098205655813), 'w'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 0.005895098205655813, 'w_scale'),
        onnx.numpy_helper.from_array(np.zeros([144], dtype=np.int8), 'w_zero_point'),
        # BN
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 1.507541537284851, 'bn.scale'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 0.10940893739461899, 'bn.B'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * -0.5512617826461792, 'bn.mean'),
        onnx.numpy_helper.from_array(np.ones([144], dtype=np.float32) * 1.0475233793258667, 'bn.var'),
        # Q
        onnx.numpy_helper.from_array(np.array(0.032607611268758774, dtype=np.float32), 'y_scale'),
        onnx.numpy_helper.from_array(np.array(0, dtype=np.int8), 'y_zero_point'),
    ]
    model = onnx.helper.make_model(onnx.helper.make_graph(nodes, 'pw', inputs, outputs, inits),
                                   opset_imports=[onnx.helper.make_opsetid('', 13)], ir_version=7)
    return model


if __name__ == '__main__':
    print('***** DW-Conv > BN > Swish *****')
    model = get_qdq_dw_bn_swish_model()
    # Sanity check: the model must run on ONNX Runtime before it is saved.
    opts = ort.SessionOptions()
    sess = ort.InferenceSession(model.SerializeToString(), opts, ['CPUExecutionProvider'])
    res = sess.run(None, {'input': np.ones([32, 144, 128, 128], dtype=np.float32)})
    assert res[0].shape == (32, 144, 128, 128)
    onnx.save(model, 'qdq_dw_bn_swish.onnx')
    print('*** to reproduce error, run following command ***')
    print('/usr/src/tensorrt/bin/trtexec --onnx=qdq_dw_bn_swish.onnx "--minShapes=input:1x144x128x128" "--optShapes=input:4x144x128x128" "--maxShapes=input:32x144x128x128" --verbose --buildOnly --saveEngine=qdq_dw_bn_swish.plan --int8 --profilingVerbosity=detailed --exportLayerInfo=qdq_dw_bn_swish_layer.json')
```
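A note on why the failing node name lists only the weights, the convolution and the Swish pointwise ops: BatchNormalization does not appear in it, which is consistent with BN having been folded into the convolution before tactic selection (this is my reading of the log, not something the log states). The sketch below shows the folding arithmetic with the constants from the script. It assumes the ONNX default `epsilon` of 1e-5, since the script does not set one, and the function names are illustrative only.

```python
import math

# Constants copied from the generation script; every channel uses the same values.
BN_SCALE = 1.507541537284851
BN_BIAS = 0.10940893739461899
BN_MEAN = -0.5512617826461792
BN_VAR = 1.0475233793258667
EPSILON = 1e-5  # assumption: ONNX BatchNormalization default, not set in the script

# int() truncates toward zero, so the int8 weight value is -24 (round() would give -25).
W_INT8 = int(-0.14677785336971283 / 0.005895098205655813)


def batch_norm(x):
    """Inference-mode BatchNormalization for one channel."""
    return BN_SCALE * (x - BN_MEAN) / math.sqrt(BN_VAR + EPSILON) + BN_BIAS


def fold_batch_norm():
    """Return (multiplier, offset) such that batch_norm(x) == multiplier * x + offset."""
    multiplier = BN_SCALE / math.sqrt(BN_VAR + EPSILON)
    offset = BN_BIAS - BN_MEAN * multiplier
    return multiplier, offset


def swish(x):
    """Swish as built in the model: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    multiplier, offset = fold_batch_norm()
    conv_out = 1.0  # arbitrary convolution output used only for the comparison
    print("direct BN :", batch_norm(conv_out))
    print("folded BN :", multiplier * conv_out + offset)
    print("after Swish:", swish(batch_norm(conv_out)))
```

After folding, what remains is a depthwise convolution with a per-channel multiplier and bias followed by Sigmoid and Mul, which matches the `w + dwconv + PWN(sigmoid, mul)` pattern in the error message.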

Have you tried the latest release?: Yes. I tried it in the latest JetPack environment.

Can this model run on other frameworks? Yes. I can run this model on ONNXRuntime.
