NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

PTQ is faster than QAT #2204

Closed pangr closed 2 years ago

pangr commented 2 years ago

Description

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: 1080ti
NVIDIA Driver Version: 450
CUDA Version: 11.0
CUDNN Version: 8.1.0
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

When I use PTQ, the fused layer 'PWN(PWN(Sigmoid, Mul), Add)' runs as 'Int8' -> 'Int8': (image)

But when I use QAT, the same layer runs as 'Int8' -> 'Float32': (image)

The Int8 ONNX model is: (image)

zerollzeng commented 2 years ago

@ttyio Do you have any recommendations on the Q/DQ placement here? I think the user can adjust the placement to get better performance.

ttyio commented 2 years ago

@pangr, what is the op after the Add? Also, have you tried inserting Q/DQ after the Add? Thanks.
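
For readers unfamiliar with what a Q/DQ pair after the Add would compute, here is a minimal pure-Python sketch of symmetric INT8 quantize/dequantize (zero point 0). It is an illustration only, not TensorRT or ONNX code: the function names and the scale value are hypothetical, and in a real QAT model the scale would come from training or calibration. Presumably, placing such a pair on the Add output is what would let the fused layer emit Int8 instead of Float32, but that depends on the rest of the graph.

```python
def quantize(x, scale):
    """Map a float to INT8: divide by scale, round, saturate to [-128, 127]."""
    q = round(x / scale)  # Python's round() rounds halves to even
    return max(-128, min(127, q))


def dequantize(q, scale):
    """Map an INT8 value back to float."""
    return q * scale


def qdq(values, scale):
    """Apply a Q/DQ pair elementwise, as it would sit on the Add output."""
    return [dequantize(quantize(v, scale), scale) for v in values]


# Hypothetical Add outputs and a hypothetical scale, chosen for illustration.
add_output = [1.0, 100.0, -0.26]
scale = 0.5
print(qdq(add_output, scale))
```

Values outside the representable range (here, 100.0 with a scale of 0.5) saturate at the INT8 limit, which is why the chosen scale matters for accuracy.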

ttyio commented 2 years ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!