amirgholami / ZeroQ

[CVPR'20] ZeroQ: A Novel Zero Shot Quantization Framework

increased inference latency for quantized model #13

Closed ZongqiangZhang closed 1 year ago

ZongqiangZhang commented 4 years ago

I have just reproduced the classification results on ResNet50 + ImageNet. The accuracy is excellent!

But there is a significant increase in inference latency for the quantized model. Test results on ResNet50 + ImageNet + Tesla T4:

Has anybody hit the same issue?
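For anyone trying to reproduce the comparison, here is a minimal sketch of how the latency can be measured on GPU (the model variable, input shape, and iteration counts are placeholders, not the exact benchmark from this report; CUDA kernels launch asynchronously, so synchronization is needed before reading the clock):

```python
import time
import torch

def measure_latency(model, device="cuda", iters=100):
    # Rough per-inference GPU latency; torch.cuda.synchronize() is
    # required because CUDA kernels launch asynchronously.
    model.eval().to(device)
    x = torch.randn(1, 3, 224, 224, device=device)  # placeholder input shape
    with torch.no_grad():
        for _ in range(10):  # warm-up to exclude one-time setup costs
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```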

zlifd commented 1 year ago

hi, i got the same issue as you. may i know how you solved the problem? thanks

ZongqiangZhang commented 1 year ago

@zlifd not resolved yet

zlifd commented 1 year ago

> @zlifd not resolved yet

Oh, sad to hear that. But I guess the problem is that fake quantization is used in the Vitis AI Quantizer: the tensors are not really integers but are represented by floating-point numbers. Maybe you can take a look at the link below: https://docs.xilinx.com/r/en-US/ug1414-vitis-ai/Configure-the-Quantize-Strategy Thx
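To illustrate the point: in simulated ("fake") quantization the values are rounded to integer levels but stored and computed in floating point, so inference pays for the extra rounding ops without gaining integer kernels. A minimal sketch of the idea (this function is illustrative only, not the Vitis AI or ZeroQ implementation):

```python
import torch

def fake_quantize(x, num_bits=8):
    # Simulated quantization: round to integer levels, then immediately
    # dequantize. The output is still a float tensor, so downstream ops
    # run as ordinary floating-point kernels.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = qmin - torch.round(x.min() / scale)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale  # dequantized: dtype is still float32

x = torch.randn(4, 4)
print(fake_quantize(x).dtype)  # torch.float32, not an integer dtype
```

Since every quantized layer adds a quantize/dequantize round-trip on top of the original float computation, a latency increase is expected unless the model is deployed with a backend that executes true integer kernels.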

ZongqiangZhang commented 1 year ago

@zlifd thx for your comments