NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0
10.87k stars 2.14k forks source link

How to quantize branch nodes containing MaxPool and Tile #2627

Closed Levi-zhan closed 1 year ago

Levi-zhan commented 1 year ago

Description

Part of my network structure is as follows,

[screenshot of the network structure]

Log when generating the engine:

Layer(CaskConvolution): pfn_layers.0.linear.weight + pfe_QuantizeLinear_7 + pfe_MatMul_10, Tactic: 0xc9c0872569525a26, 21[Int8(1,10,8000,20)] -> 35[Float(1,16,8000,20)]
Layer(CudnnPooling): pfe_ReduceMax_15, Tactic: 0xffffffffffffffff, 35[Float(1,16,8000,20)] -> 36[Float(1,16,8000,1)]
Layer(Slice): pfe_Tile_28, Tactic: 0x0000000000000000, 36[Float(1,16,8000,1)] -> 59[Float(1,16,8000,20)]
Layer(Reformat): pfe_QuantizeLinear_32_clone_1, Tactic: 0x0000000000000000, 59[Float(1,16,8000,20)] -> 63[Int8(1,16,8000,20)]
Layer(Reformat): pfe_QuantizeLinear_32_clone_0, Tactic: 0x0000000000000000, 35[Float(1,16,8000,20)] -> pfe_Concat_29_35_clone_0[Int8(1,16,8000,20)]
Layer(Reformat): 35 copy, Tactic: 0x00000000000003e8, pfe_Concat_29_35_clone_0[Int8(1,16,8000,20)] -> 63[Int8(1,16,8000,20)]

When the model is converted to a TensorRT engine, the inputs of MaxPool and Tile become float, which introduces extra Reformat layers. Is there any way to make the inputs and outputs of MaxPool and Tile int8 as well, so the Reformat layers are removed? Thank you.

Environment

TensorRT Version: 8.4.1
NVIDIA GPU: Xavier
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

Levi-zhan commented 1 year ago

@ttyio

zerollzeng commented 1 year ago

Missing Q/DQ before the maxpool and tile? @ttyio
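The suggestion above (adding the missing Q/DQ pairs so TensorRT can keep the pooling/tile branch in int8) can be sketched in the PyTorch source model. The sketch below is a minimal, hypothetical stand-in for the pillar-feature branch from the log: `QDQ` and `PFEHead` are illustrative names, not from the issue, and `torch.fake_quantize_per_tensor_affine` is used as a stand-in for a calibrated quantizer; when such a module is exported to ONNX via a quantization toolkit, each `QDQ` lowers to a `QuantizeLinear`/`DequantizeLinear` pair that TensorRT can fuse, keeping the ReduceMax/Tile inputs in int8 instead of falling back to float and inserting Reformat layers.

```python
import torch
import torch.nn as nn


class QDQ(nn.Module):
    """Simulated QuantizeLinear -> DequantizeLinear pair (per-tensor int8).

    In a real flow the scale comes from calibration; 0.1 is a placeholder.
    """

    def __init__(self, scale: float = 0.1):
        super().__init__()
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantize to int8 range and immediately dequantize (fake quant).
        return torch.fake_quantize_per_tensor_affine(
            x, self.scale, 0, -128, 127
        )


class PFEHead(nn.Module):
    """Hypothetical sketch of the branch from the log:
    conv output -> ReduceMax -> Tile -> Concat, with Q/DQ inserted
    before the pooling and tile ops so both branches stay int8."""

    def __init__(self):
        super().__init__()
        self.qdq_pool = QDQ()  # Q/DQ in front of the ReduceMax input
        self.qdq_tile = QDQ()  # Q/DQ in front of the Tile input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.qdq_pool(x)
        # Corresponds to pfe_ReduceMax_15 (max over the last axis).
        pooled = torch.amax(x, dim=-1, keepdim=True)
        # Corresponds to pfe_Tile_28: broadcast back to the input width.
        tiled = self.qdq_tile(pooled).expand_as(x)
        # Corresponds to pfe_Concat_29.
        return torch.cat([x, tiled], dim=1)
```

With both branch inputs carrying explicit Q/DQ pairs, TensorRT's Q/DQ propagation can run the pooling, tile, and concat in int8, eliminating the float round trips that showed up as Reformat layers in the log.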

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks, thanks!