NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

A question about int8 explicit quantization for plugins #1616

Closed: xxueniao closed this issue 2 years ago

xxueniao commented 2 years ago

Description

I wrote a custom plugin that supports int8 input. At the same time, I turned on explicit quantization, which means I can no longer perform PTQ, and the output of the layer before this plugin cannot be int8 either, because that layer (e.g. a Resize layer) cannot be fused with Q/DQ nodes. So how can I make the input of this plugin int8 in explicit quantization mode? Do I need to add a Q layer in front of the plugin?
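For context, explicit quantization in TensorRT means the parsed ONNX graph already carries QuantizeLinear/DequantizeLinear nodes and the builder takes its int8 scales from them, with no calibrator attached. A minimal build sketch, assuming a Q/DQ ONNX file (the name `model_qdq.onnx` is a placeholder):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# The ONNX file already contains QuantizeLinear/DequantizeLinear nodes.
with open("model_qdq.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
# Explicit quantization: int8 scales come from the Q/DQ nodes in the
# graph, so no int8 calibrator is set and PTQ calibration is skipped.
config.set_flag(trt.BuilderFlag.INT8)

engine_bytes = builder.build_serialized_network(network, config)
```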

Environment

TensorRT Version: 8.0.0.4
NVIDIA GPU: 3080
NVIDIA Driver Version: 460.32.03
CUDA Version: 11.2
CUDNN Version: 8.1
Operating System: Ubuntu 16.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): 2.4.3
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

ttyio commented 2 years ago

Hello @xxueniao, could you try inserting a Q/DQ pair both before and after the plugin? After converting to ONNX, use ONNX GraphSurgeon to modify the graph like this:

[image: ONNX graph with QuantizeLinear/DequantizeLinear pairs inserted around the plugin node]

Thanks!
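For illustration, a minimal ONNX GraphSurgeon sketch of this suggestion, assuming the plugin was exported as op type `MyPlugin` (a placeholder), placeholder file names, and placeholder scale values (real scales would come from calibration or QAT):

```python
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))  # placeholder file name

def insert_qdq(graph, tensor, scale):
    """Route `tensor` through a new QuantizeLinear/DequantizeLinear pair
    and return the dequantized output tensor."""
    s = gs.Constant(f"{tensor.name}_scale", np.array(scale, dtype=np.float32))
    zp = gs.Constant(f"{tensor.name}_zp", np.array(0, dtype=np.int8))
    q_out = gs.Variable(f"{tensor.name}_q", dtype=np.int8)
    dq_out = gs.Variable(f"{tensor.name}_dq", dtype=np.float32)
    graph.nodes.append(gs.Node("QuantizeLinear", inputs=[tensor, s, zp], outputs=[q_out]))
    graph.nodes.append(gs.Node("DequantizeLinear", inputs=[q_out, s, zp], outputs=[dq_out]))
    return dq_out

# "MyPlugin" is a placeholder for the custom op type in the ONNX graph.
plugin = next(n for n in graph.nodes if n.op == "MyPlugin")

# Q/DQ pair in front of the plugin (0.05 is a placeholder scale).
plugin.inputs[0] = insert_qdq(graph, plugin.inputs[0], scale=0.05)

# Q/DQ pair behind the plugin: rewire every downstream consumer of the
# plugin output to read the dequantized tensor instead.
out = plugin.outputs[0]
dq_out = insert_qdq(graph, out, scale=0.05)
for consumer in list(out.outputs):
    if consumer.op != "QuantizeLinear":  # skip the Q node just inserted
        consumer.inputs = [dq_out if t is out else t for t in consumer.inputs]

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_qdq.onnx")
```

With the Q/DQ pairs in place, TensorRT reads the int8 scales from these nodes, and if the plugin declares int8 support in `supportsFormatCombination`, the builder can propagate an int8 tensor into it without PTQ calibration.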

ttyio commented 2 years ago

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!