Open · trustin77 opened this issue 5 years ago
@trustin77 Hi,
I have not seen step-by-step instructions on how to do this. I used the following documentation (minimal sketches of the main steps are included after this list):
How Float-32 is converted to INT8 in TensorRT: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
How to use CUDNN_DATA_INT8x4 in cuDNN: https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnConvolutionForward
How to convert CUDNN_TENSOR_NCHW & INT8 to CUDNN_TENSOR_NCHW_VECT_C & INT8x4: https://devtalk.nvidia.com/default/topic/1028139/cudnn/how-to-reduce-time-spent-in-transforming-tensors-using-cudnnv6-0-for-api-cudnntransformtensor-/post/5264978/#5264978
About optimal input_calibration: https://github.com/AlexeyAB/yolo2_light/issues/24#issuecomment-435361415
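
Roughly, the Float-32 -> INT8 conversion from the TensorRT slides works like this: calibration picks a per-tensor saturation threshold T (by minimizing KL divergence over an activation histogram), then every value is scaled by 127/T, rounded, and clamped. A minimal C sketch, assuming the threshold is already known; the names here are illustrative, not actual darknet/yolo2_light functions:

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric INT8 quantization as in the TensorRT 8-bit inference slides:
 * calibration picks a saturation threshold T per tensor, then
 * scale = 127 / T, and values beyond +-T are clamped.
 * These helpers are illustrative, not actual darknet/yolo2_light code. */
static int8_t quantize_value(float x, float scale)
{
    float q = roundf(x * scale);      /* map the float onto the INT8 grid */
    if (q >  127.f) q =  127.f;       /* saturate values above +T */
    if (q < -127.f) q = -127.f;       /* saturate values below -T */
    return (int8_t)q;
}

static void quantize_tensor(const float *src, int8_t *dst, size_t n, float threshold_T)
{
    float scale = 127.f / threshold_T;   /* per-tensor scale from calibration */
    for (size_t i = 0; i < n; ++i)
        dst[i] = quantize_value(src[i], scale);
}
```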
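For the cuDNN part, this is a rough sketch of an INT8x4 cudnnConvolutionForward call, assuming cuDNN >= 6 on a GPU with compute capability >= 6.1, a 3x3 kernel with pad 1 and stride 1, a channel count divisible by 4, and float output; the function and its parameters are illustrative, not the real darknet code:

```c
#include <stddef.h>
#include <cudnn.h>

/* Sketch of an INT8x4 convolution with cudnnConvolutionForward.
 * All buffers are device memory already laid out as described below. */
void conv_forward_int8x4(cudnnHandle_t handle,
                         const void *x_int8x4,   /* input in NCHW_VECT_C, INT8x4 */
                         const void *w_int8x4,   /* filters in NCHW_VECT_C, INT8x4 */
                         void *y,                /* float output */
                         void *workspace, size_t workspace_bytes,
                         int n, int c, int h, int w_dim, int k)
{
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;

    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateTensorDescriptor(&yDesc);
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    /* Input and filters must use NCHW_VECT_C layout with INT8x4 data,
     * and c must be a multiple of 4. */
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW_VECT_C,
                               CUDNN_DATA_INT8x4, n, c, h, w_dim);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_INT8x4,
                               CUDNN_TENSOR_NCHW_VECT_C, k, c, 3, 3);

    /* Accumulation is done in INT32. */
    cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_INT32);

    /* Output kept in float here (INT8x4 output is also possible).
     * With a 3x3 kernel, pad 1, stride 1 the spatial size is unchanged. */
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                               CUDNN_DATA_FLOAT, n, k, h, w_dim);

    /* Only IMPLICIT_PRECOMP_GEMM supports the INT8x4 path. */
    const float alpha = 1.f, beta = 0.f;
    cudnnConvolutionForward(handle, &alpha, xDesc, x_int8x4, wDesc, w_int8x4,
                            convDesc, CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
                            workspace, workspace_bytes, &beta, yDesc, y);

    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyConvolutionDescriptor(convDesc);
}
```

In a real call the workspace size would first be queried with cudnnGetConvolutionForwardWorkspaceSize, and every cudnnStatus_t return value should be checked.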
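For the layout conversion, a sketch of repacking an NCHW INT8 tensor into NCHW_VECT_C INT8x4 with cudnnTransformTensor, as discussed in the devtalk thread above; the buffers are device memory and the function name is illustrative:

```c
#include <cudnn.h>

/* Repack an INT8 NCHW tensor into NCHW_VECT_C (INT8x4) so it can be fed
 * to the INT8x4 convolution. c must be a multiple of 4. */
void nchw_int8_to_nchw_vect_c(cudnnHandle_t handle,
                              const void *src_nchw_int8, void *dst_vect_c_int8x4,
                              int n, int c, int h, int w)
{
    cudnnTensorDescriptor_t srcDesc, dstDesc;
    cudnnCreateTensorDescriptor(&srcDesc);
    cudnnCreateTensorDescriptor(&dstDesc);

    /* Source: plain NCHW layout, one int8 per element. */
    cudnnSetTensor4dDescriptor(srcDesc, CUDNN_TENSOR_NCHW,
                               CUDNN_DATA_INT8, n, c, h, w);
    /* Destination: NCHW_VECT_C layout, channels packed in groups of 4. */
    cudnnSetTensor4dDescriptor(dstDesc, CUDNN_TENSOR_NCHW_VECT_C,
                               CUDNN_DATA_INT8x4, n, c, h, w);

    const float alpha = 1.f, beta = 0.f;
    cudnnTransformTensor(handle, &alpha, srcDesc, src_nchw_int8,
                         &beta, dstDesc, dst_vect_c_int8x4);

    cudnnDestroyTensorDescriptor(srcDesc);
    cudnnDestroyTensorDescriptor(dstDesc);
}
```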
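About input_calibration: roughly, each value in input_calibration= is a per-layer scale R_input applied to that layer's input before the INT8 convolution (quantized exactly as in the first sketch above); the INT32 accumulator then ends up in units of R_input * R_weights, which is divided back out afterwards. A simplified sketch of that rescaling idea, not the exact yolo2_light code; all names are illustrative:

```c
#include <stdint.h>

/* R_input   - this layer's value from input_calibration= in the cfg
 * R_weights - the scale the layer's weights were quantized with
 * The INT32 accumulator of the INT8 convolution is in units of
 * (R_input * R_weights), so dividing by that product recovers an
 * approximate float activation. */
static float dequantize_accumulator(int32_t acc, float R_input, float R_weights)
{
    return (float)acc / (R_input * R_weights);
}
```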
Also about quantization:
Yolo v2 with INT8 - too large a drop in accuracy: http://cs231n.stanford.edu/reports/2017/pdfs/808.pdf
Optimal quantization is INT 4-bit: https://arxiv.org/abs/1510.00149
XNOR 1-bit quantization - the authors avoid binarization at the first and last layer of a CNN: https://arxiv.org/abs/1603.05279
MobileNet quantization: https://arxiv.org/abs/1712.05877
Quantization of old models: https://arxiv.org/abs/1512.06473
About XNOR: https://arxiv.org/abs/1807.03010
Also about XNOR: https://arxiv.org/abs/1803.05849
Hi, @AlexeyAB
I'd like to know more about how the INT8 version is implemented. Is it based on one or more papers? Could you give related links for reference?
Thanks