PTQ is faster than QAT

pangr commented 2 years ago

Description

Environment

TensorRT Version: 8.4.1.5
NVIDIA GPU: 1080ti
NVIDIA Driver Version: 450
CUDA Version: 11.0
CUDNN Version: 8.1.0
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

When I use PTQ, the fused layer 'PWN(PWN(Sigmoid, Mul), Add)' runs as 'Int8' -> 'Int8'. But when I use QAT, the same layer runs as 'Int8' -> 'Float32'.

Int8 onnx is:

zerollzeng commented 2 years ago

@ttyio Do you have any recommendations on the QDQ placement here? I think the user can fine-tune it to get better performance.

ttyio commented 2 years ago

@pangr, what's the op after the Add? Also, have you tried inserting Q/DQ after the Add? Thanks.
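
For readers unfamiliar with what a Q/DQ pair after the Add would compute, here is a minimal numeric sketch in plain Python. It is not TensorRT or pytorch-quantization code. The function names, the sample `add_output` values, and the `amax` range are all hypothetical and chosen only for illustration. It assumes symmetric per-tensor INT8 quantization with a zero-point of 0, which may not match the poster's setup.

```python
def quantize(x, scale):
    """Q: map a float to a signed 8-bit integer (symmetric, zero-point 0)."""
    q = round(x / scale)           # round to the nearest integer step
    return max(-128, min(127, q))  # saturate to the INT8 range


def dequantize(q, scale):
    """DQ: map the 8-bit integer back to a float."""
    return q * scale


def qdq(values, scale):
    """Apply a Q/DQ pair to each element, as a node placed after Add would."""
    return [dequantize(quantize(v, scale), scale) for v in values]


# Hypothetical output of the Add node and a hypothetical calibrated range.
add_output = [0.0, 0.5, -1.0, 3.0]
amax = 2.0          # assumed dynamic range learned during QAT
scale = amax / 127  # one INT8 step in float units

result = qdq(add_output, scale)
for before, after in zip(add_output, result):
    print(f"{before:+.4f} -> {after:+.4f}")
```

In-range values come back with an error of at most half a step, while values outside the assumed range (3.0 here) saturate at `127 * scale`. Whether adding such a pair actually lets the fused layer emit Int8 in the QAT engine depends on the op that follows the Add, which is what the question above is asking.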

ttyio commented 2 years ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!

NVIDIA / TensorRT

PTQ is faster than QAT #2204
