When the network is built into a TensorRT engine, the inputs of MaxPool and Tile become float, resulting in extra Reformat layers. Is there any way to make the inputs and outputs of MaxPool and Tile INT8 as well, and remove the Reformat layers? Thank you.
Environment
TensorRT Version: 8.4.1
NVIDIA GPU: Xavier
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Description
Part of my network structure is as follows.
Log output when generating the engine:
```
Layer(CaskConvolution): pfn_layers.0.linear.weight + pfe_QuantizeLinear_7 + pfe_MatMul_10, Tactic: 0xc9c0872569525a26, 21[Int8(1,10,8000,20)] -> 35[Float(1,16,8000,20)]
Layer(CudnnPooling): pfe_ReduceMax_15, Tactic: 0xffffffffffffffff, 35[Float(1,16,8000,20)] -> 36[Float(1,16,8000,1)]
Layer(Slice): pfe_Tile_28, Tactic: 0x0000000000000000, 36[Float(1,16,8000,1)] -> 59[Float(1,16,8000,20)]
Layer(Reformat): pfe_QuantizeLinear_32_clone_1, Tactic: 0x0000000000000000, 59[Float(1,16,8000,20)] -> 63[Int8(1,16,8000,20)]
Layer(Reformat): pfe_QuantizeLinear_32_clone_0, Tactic: 0x0000000000000000, 35[Float(1,16,8000,20)] -> pfe_Concat_29_35_clone_0[Int8(1,16,8000,20)]
Layer(Reformat): 35 copy, Tactic: 0x00000000000003e8, pfe_Concat_29_35_clone_0[Int8(1,16,8000,20)] -> 63[Int8(1,16,8000,20)]
```
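One approach worth trying (a sketch, not a confirmed fix for this network) is to constrain the affected layers to INT8 explicitly at build time with the TensorRT Python API's per-layer precision controls, combined with the `OBEY_PRECISION_CONSTRAINTS` builder flag available in TensorRT 8.4. The layer names below are taken from the log above and may differ once the ONNX graph is imported; the parsing step is elided.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# ... parse the ONNX model into `network` here (e.g. via trt.OnnxParser) ...

config.set_flag(trt.BuilderFlag.INT8)
# Make TensorRT honor the per-layer precision hints below instead of
# treating them as suggestions it may override for performance.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Request INT8 execution and INT8 outputs for the pooling and tile
# layers, so float Reformat layers around them should not be needed.
# Layer names here are assumptions based on the build log.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name in ("pfe_ReduceMax_15", "pfe_Tile_28"):
        layer.precision = trt.int8
        layer.set_output_type(0, trt.int8)

engine = builder.build_serialized_network(network, config)
```

If you build with trtexec instead, the 8.4 version exposes similar controls via `--precisionConstraints=obey` together with `--layerPrecisions` and `--layerOutputTypes`. Note that this only removes the Reformats if TensorRT actually has INT8 implementations of those layers on Xavier; otherwise the constraint may fail or force a different tactic.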
Relevant Files
Steps To Reproduce