NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Parallel Convolution won't merge in tensor merge step while using QDQ method #1602

Closed imyhxy closed 2 years ago

imyhxy commented 2 years ago

Description

Hi there. I have been using pytorch_quantization to quantize my detection model, and I found that native TensorRT INT8 PTQ and the QDQ method process the network differently.

When two convolution layers with the same kernel size operate on the same input tensor, native TensorRT INT8 PTQ merges them into a single convolution layer, as in the Conv_45 + Relu_46 || Conv_52 + Relu_53 row of the following profile:

[11/09/2021-09:27:00] [I]                                                                         Conv_41 + Relu_42       51.24           0.0518      2.3
[11/09/2021-09:27:00] [I]                                                                         Conv_43 + Relu_44       31.66           0.0320      1.4
[11/09/2021-09:27:00] [I]                                                    Conv_45 + Relu_46 || Conv_52 + Relu_53       16.48           0.0167      0.7
[11/09/2021-09:27:00] [I]                                                                         Conv_47 + Relu_48       21.68           0.0219      1.0
[11/09/2021-09:27:00] [I]                                                                         Conv_49 + Relu_50       26.52           0.0268      1.2
[11/09/2021-09:27:00] [I]                                                                               PWN(Add_51)       18.97           0.0192      0.8
[11/09/2021-09:27:00] [I]                                                                                  180 copy       13.16           0.0133      0.6
[11/09/2021-09:27:00] [I]                                                                         Conv_55 + Relu_56       16.97           0.0171      0.8
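For readers unfamiliar with this merge, the sketch below illustrates why it is valid in principle: two convolutions with the same kernel size that read the same input can be computed as one convolution whose filters are the two filter sets concatenated along the output-channel axis, with the result split afterwards. This is a pure-Python illustration, not TensorRT's implementation; the shapes, the helper names (`conv2d`, `relu`, `rand_tensor`, `flatten`), and the omission of bias, padding, and stride are all assumptions made for brevity.

```python
import random

def conv2d(x, w):
    """Naive stride-1, no-padding 2D convolution.
    x: [C_in][H][W], w: [C_out][C_in][K][K] -> [C_out][H-K+1][W-K+1]
    """
    c_in, h, width = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out = []
    for filt in w:  # one output channel per filter
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(width - k + 1):
                acc = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            acc += x[c][i + di][j + dj] * filt[c][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

def relu(t):
    return [[[max(v, 0.0) for v in row] for row in plane] for plane in t]

def rand_tensor(rng, *shape):
    """Nested lists of uniform random floats with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(rng, *shape[1:]) for _ in range(shape[0])]

def flatten(t):
    if isinstance(t, list):
        return [v for sub in t for v in flatten(sub)]
    return [t]

rng = random.Random(0)
x = rand_tensor(rng, 3, 6, 6)        # shared input: 3 channels, 6x6
w_a = rand_tensor(rng, 4, 3, 3, 3)   # branch A: 4 filters, 3x3 kernel
w_b = rand_tensor(rng, 2, 3, 3, 3)   # branch B: 2 filters, 3x3 kernel

# Unmerged: each branch runs its own convolution on the same input.
y_a = relu(conv2d(x, w_a))
y_b = relu(conv2d(x, w_b))

# Merged: concatenate the filters along the output-channel axis,
# run a single convolution, then split the output channels.
y_merged = relu(conv2d(x, w_a + w_b))
y_a_merged, y_b_merged = y_merged[:4], y_merged[4:]

max_diff = max(
    abs(p - q)
    for p, q in zip(flatten(y_a + y_b), flatten(y_a_merged + y_b_merged))
)
print("max abs difference:", max_diff)
```

The merged version launches one kernel instead of two, which is where the speedup in the profile comes from. The merge requires both branches to consume exactly the same input tensor, which is why the quantization of that input matters in the QDQ case.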

In contrast, the QDQ method does not trigger this tensor merge, even though I use the same TensorQuantizer node for both branches. The output is as follows:

[11/09/2021-10:36:18] [I]                                              model.0.conv.conv.weight + QuantizeLinear_48_quantize_scale_node + Conv_50 + Relu_51       48.47           0.0496      2.1                                              
[11/09/2021-10:36:18] [I]                                                   model.1.conv.weight + QuantizeLinear_59_quantize_scale_node + Conv_61 + Relu_62       35.20           0.0360      1.5                                              
[11/09/2021-10:36:18] [I]                                               model.2.cv1.conv.weight + QuantizeLinear_70_quantize_scale_node + Conv_72 + Relu_73       14.63           0.0150      0.6                                              
[11/09/2021-10:36:18] [I]                                            model.2.cv2.conv.weight + QuantizeLinear_116_quantize_scale_node + Conv_118 + Relu_119       13.74           0.0141      0.6                                              
[11/09/2021-10:36:18] [I]                                           model.2.m.0.cv1.conv.weight + QuantizeLinear_87_quantize_scale_node + Conv_89 + Relu_90       12.88           0.0132      0.6                                              
[11/09/2021-10:36:18] [I]                                         model.2.m.0.cv2.conv.weight + QuantizeLinear_98_quantize_scale_node + Conv_100 + Relu_101       23.87           0.0244      1.0                                              
[11/09/2021-10:36:18] [I]                                                                                                                      PWN(Add_108)       20.82           0.0213      0.9                                              
[11/09/2021-10:36:18] [I]                                            model.2.cv3.conv.weight + QuantizeLinear_128_quantize_scale_node + Conv_130 + Relu_131       18.03           0.0184      0.8
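To make the "same quantizer for both branches" condition concrete, here is a minimal sketch of the arithmetic a Q/DQ (QuantizeLinear followed by DequantizeLinear) pair performs under symmetric per-tensor INT8 quantization. The function names and the example scale are hypothetical, and the code says nothing about how TensorRT decides whether to fuse; it only shows that two branches quantized with one shared scale see bit-identical inputs, which is the precondition for the merge seen in the PTQ profile.

```python
def quantize(x, scale):
    """Symmetric INT8 quantization: round(x / scale), clamped to [-128, 127].
    Python's round() rounds halves to even, as ONNX QuantizeLinear does.
    """
    q = round(x / scale)
    return max(-128, min(127, q))

def dequantize(q, scale):
    return q * scale

def fake_quant(values, scale):
    """Apply Q then DQ to each value, as a Q/DQ node pair would."""
    return [dequantize(quantize(v, scale), scale) for v in values]

# Hypothetical activation values and scale, chosen for illustration only.
activations = [0.5, -1.3, 40.0, 0.1]
shared_scale = 0.25

# Both parallel branches read the output of the same Q/DQ pair.
branch_a_input = fake_quant(activations, shared_scale)
branch_b_input = fake_quant(activations, shared_scale)

print(branch_a_input)                    # [0.5, -1.25, 31.75, 0.0]
print(branch_a_input == branch_b_input)  # True
```

Note that 40.0 saturates at the INT8 maximum (127 * 0.25 = 31.75). If the two branches used different scales, their inputs would differ after Q/DQ and a single merged convolution could no longer serve both.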

Environment

TensorRT Version: 8.2.0.6
NVIDIA GPU: T4
NVIDIA Driver Version: 460.32.03
CUDA Version: 11.5
CUDNN Version: 8.3
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.9.0
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

imyhxy commented 2 years ago

@ttyio Could you give some advice?

ttyio commented 2 years ago

@imyhxy, could you share the ONNX file with us so we can debug? Thanks.

imyhxy commented 2 years ago

@ttyio 🌞
The original ONNX model and build log: orig.zip
The QDQ model and build log: qdq.zip

ttyio commented 2 years ago

Hello @imyhxy, yes, you are right: horizontal fusion is disabled on QAT networks due to an issue. We have tracked this internally as a feature request. Sorry for the inconvenience.

imyhxy commented 2 years ago

@ttyio Thanks, I'll wait for the TensorRT upgrade. 🌞

nvpohanh commented 2 years ago

@ttyio What was the bug id? Has this been fixed?

nvpohanh commented 2 years ago

Closing due to >14 days without activity. Please feel free to reopen if the issue still exists. Thanks