NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Is QAT (quantization-aware training) model obtained by pytorch-quantization compatible with DLA (Deep Learning Accelerator) ? #2222

Closed WeixiangXu closed 2 years ago

WeixiangXu commented 2 years ago

I am trying to obtain a QAT model with the official pytorch-quantization toolkit.

However, I notice that (1) the quantization function used in DLA is 'roundWithTiesAwayFromZero' while it is 'roundWithTiesToEven' in pytorch-quantization. (2) DLA only supports 'IInt8EntropyCalibrator2' in PTQ.
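For concreteness, the two tie-breaking rules mentioned in (1) can be sketched in a few lines of Python (the function names are illustrative, not taken from pytorch-quantization or DLA source code). The two modes agree everywhere except on exact .5 ties, e.g. 2.5 rounds to 2 under ties-to-even but to 3 under ties-away-from-zero:

```python
import math

# Illustrative sketch of the two rounding modes discussed above;
# not the actual pytorch-quantization or DLA implementation.

def round_ties_to_even(x: float) -> int:
    # Python's built-in round() already uses banker's rounding (ties to even).
    return round(x)

def round_ties_away_from_zero(x: float) -> int:
    # Shift by 0.5 away from zero, then round toward zero.
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

# The modes only disagree on exact .5 ties:
print(round_ties_to_even(2.5), round_ties_away_from_zero(2.5))    # 2 3
print(round_ties_to_even(-1.5), round_ties_away_from_zero(-1.5))  # -2 -2
```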

My question is: is a QAT model obtained by pytorch-quantization compatible with DLA?

Thanks!

zerollzeng commented 2 years ago

DLA doesn't support QAT yet, but we are doing some work on this.

zerollzeng commented 2 years ago

@pranavm-nvidia for viz

WeixiangXu commented 2 years ago

Is the reason DLA does not support QAT that it does not support explicit quantization?

If I manually convert the explicit quantization into implicit quantization by merging the Q and DQ nodes in the onnx graph, can it be supported by DLA?

Thanks.
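The manual Q/DQ merge described above can be sketched on a toy graph representation (plain dicts standing in for ONNX nodes; this is not the actual `onnx` API, and it assumes the common QAT-export pattern where each QuantizeLinear feeds exactly one DequantizeLinear). Each Q→DQ pair is removed, its consumer is rewired to the original tensor, and the pair's scale is recorded as an int8 dynamic range (scale × 127) that could later seed implicit quantization:

```python
# Hypothetical, simplified Q/DQ folding over a toy node-dict graph.
# Assumptions: per-tensor scales, and one DequantizeLinear per QuantizeLinear.

def fold_qdq(nodes):
    """Remove Q->DQ pairs, rewire consumers, return (kept_nodes, ranges)."""
    ranges = {}    # tensor name -> implied int8 dynamic range (scale * 127)
    by_input = {}  # tensor name -> nodes that consume it
    for n in nodes:
        for i in n["inputs"]:
            by_input.setdefault(i, []).append(n)

    kept, rewire = [], {}
    for n in nodes:
        if n["op"] == "QuantizeLinear":
            dq = by_input[n["outputs"][0]][0]  # the paired DequantizeLinear
            # The DQ output now aliases the original (pre-Q) tensor.
            rewire[dq["outputs"][0]] = n["inputs"][0]
            ranges[n["inputs"][0]] = n["scale"] * 127.0
        elif n["op"] != "DequantizeLinear":
            kept.append(n)

    for n in kept:
        n["inputs"] = [rewire.get(i, i) for i in n["inputs"]]
    return kept, ranges

# Toy graph: x -> Q -> DQ -> Conv
nodes = [
    {"op": "QuantizeLinear", "inputs": ["x"], "outputs": ["x_q"], "scale": 0.02},
    {"op": "DequantizeLinear", "inputs": ["x_q"], "outputs": ["x_dq"], "scale": 0.02},
    {"op": "Conv", "inputs": ["x_dq", "w"], "outputs": ["y"]},
]
kept, ranges = fold_qdq(nodes)
print(kept)    # the Conv now reads "x" directly
print(ranges)  # {"x": 2.54}
```

A real conversion would operate on the actual ONNX protobuf and feed the collected ranges to TensorRT as calibration-style dynamic ranges; this sketch only shows the graph-rewrite idea.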

pranavm-nvidia commented 2 years ago

Yes, if you convert it to implicit quantization, it would work with DLA.

WeixiangXu commented 2 years ago

Thanks for your reply!

For DLA deployment, besides implicit quantization, are there any other changes needed when using pytorch-quantization? (e.g. changing the quantization rounding function from roundWithTiesToEven to roundWithTiesAwayFromZero?) @pranavm-nvidia

pranavm-nvidia commented 2 years ago

There might be some accuracy degradation due to the rounding differences, but I imagine it would be small or negligible.