Closed WeixiangXu closed 2 years ago
Currently, DLA doesn't support QAT yet, but there is some work we are doing on this.
@pranavm-nvidia for viz
Is the reason why DLA does not support QAT that DLA does not support explicit quantization?
If I manually convert the explicit quantization into implicit quantization by merging the Q and DQ nodes in the ONNX graph, can it be supported by DLA?
Thanks.
Yes, if you convert it to implicit quantization, it would work with DLA.
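To make the conversion concrete, here is a minimal sketch of the scale bookkeeping involved. It assumes you have already harvested the per-tensor scales from the QuantizeLinear/DequantizeLinear node pairs of a hypothetical graph (the tensor names and scale values below are made up); after removing the Q/DQ nodes, each scale is turned into the symmetric dynamic range that TensorRT's implicit-quantization path expects via `ITensor.set_dynamic_range`:

```python
# Hypothetical sketch: moving from explicit Q/DQ quantization to implicit
# quantization. Instead of keeping QuantizeLinear/DequantizeLinear nodes in
# the ONNX graph, record each tensor's scale and later feed it to TensorRT
# as a per-tensor dynamic range.

def scale_to_dynamic_range(scale: float, num_bits: int = 8) -> float:
    """Convert a Q/DQ scale to the symmetric dynamic range used by
    implicit int8 quantization: |max| = scale * (2**(num_bits-1) - 1)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    return scale * qmax

# Example scales harvested from the Q/DQ nodes of a hypothetical graph.
qdq_scales = {
    "conv1_input": 0.0157,
    "conv1_weight": 0.0042,
}

dynamic_ranges = {name: scale_to_dynamic_range(s) for name, s in qdq_scales.items()}

# With a real TensorRT network definition, one would then call, per tensor:
#   tensor.set_dynamic_range(-r, r)
```

This keeps the QAT-learned scales while presenting TensorRT with a plain (non-Q/DQ) graph, which is what the implicit path and hence DLA can consume.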
Thanks for your reply!
For DLA deployment, besides implicit quantization, are there any other changes needed when using pytorch-quantization? (e.g. changing the rounding mode from roundWithTiesToEven to roundWithTiesAwayFromZero?) @pranavm-nvidia
There might be some accuracy degradation due to the rounding differences, but I imagine it would be small or negligible.
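To see why the degradation is expected to be small: the two rounding modes only disagree at exact ties, i.e. when `x / scale` lands precisely on a `.5` boundary. A small self-contained sketch (plain Python standing in for the hardware behavior, which is an assumption on my part):

```python
import math

def round_ties_to_even(v: float) -> float:
    # Python's built-in round() implements round-half-to-even,
    # matching pytorch-quantization's roundWithTiesToEven.
    return round(v)

def round_ties_away_from_zero(v: float) -> float:
    # Round half away from zero, the mode attributed to DLA above.
    return math.floor(v + 0.5) if v >= 0 else math.ceil(v - 0.5)

def quantize(x: float, scale: float, rounding) -> int:
    """Quantize x to int8 with the given rounding function, clamping
    to the signed 8-bit range."""
    q = rounding(x / scale)
    return max(-128, min(127, int(q)))

# The modes agree everywhere except exact ties:
print(quantize(2.5, 1.0, round_ties_to_even))         # 2
print(quantize(2.5, 1.0, round_ties_away_from_zero))  # 3
print(quantize(2.4, 1.0, round_ties_to_even))         # 2
print(quantize(2.4, 1.0, round_ties_away_from_zero))  # 2
```

Since exact ties are rare for real-valued activations, the resulting int8 values differ for only a tiny fraction of tensor elements, which is consistent with the "small or negligible" expectation above.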
I am trying to obtain a QAT model with the official pytorch-quantization toolkit.
However, I notice that (1) the rounding mode used by DLA is 'roundWithTiesAwayFromZero', while pytorch-quantization uses 'roundWithTiesToEven'; (2) DLA only supports 'IInt8EntropyCalibrator2' for PTQ.
My question is: is a QAT model obtained with pytorch-quantization compatible with DLA?
Thanks!
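For reference, a hedged configuration sketch of what building an implicit-int8 engine targeting DLA typically looks like with the TensorRT Python API (the calibrator class `MyEntropyCalibrator` and the network/logger setup are assumed to exist elsewhere; this is a fragment, not a runnable program):

```python
import tensorrt as trt

# Assumes `builder` is a trt.Builder and `MyEntropyCalibrator` subclasses
# trt.IInt8EntropyCalibrator2 (the only calibrator DLA supports, per above).
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = MyEntropyCalibrator()

# Target DLA, falling back to GPU for layers DLA cannot run.
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
```

With a QAT model converted to implicit quantization, the calibrator can be replaced by setting per-tensor dynamic ranges derived from the learned scales instead.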