AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Darknet weights quantization #138

Open · phongnhhn92 opened this issue 7 years ago

phongnhhn92 commented 7 years ago

Hello guys, as you know, TensorFlow has several methods for quantizing a model after training finishes, in order to reduce its size with a small accuracy trade-off. I am wondering, is there any similar feature for Darknet weights files?

AlexeyAB commented 7 years ago

@phongnhhn92 Hi,

There is no model quantization in Darknet Yolo. Yolo always uses 32-bit floats, so it can't use INT or FLOAT types with fewer bits. This is hardcoded in the CPU and GPU processing paths: `float *A, int lda, float *B, int ldb, float BETA, float *C, int ldc`
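For reference, this is the GEMM entry point those parameters come from (src/gemm.h), declared for 32-bit floats only:

```c
/* Every layer's matrix multiply in Darknet goes through this
   FP32-only routine, on both CPU and GPU paths: */
void gemm(int TA, int TB, int M, int N, int K, float ALPHA,
          float *A, int lda,
          float *B, int ldb,
          float BETA,
          float *C, int ldc);
```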


[image: quant]


You are right, model quantization really could increase speed without decreasing accuracy:

[image: compress_accuracy]


[image: quant_pruning]

spinoza1791 commented 6 years ago

Quantization has been tried here (http://cs231n.stanford.edu/reports/2017/pdfs/808.pdf) and they found it actually slowed inference, "...because the model needs to re-evaluate the maximum and minimum value of the input at each layer, which typically slows the model unless the hardware itself can be bit-optimized". I believe this bit-optimization can be accomplished via custom FPGA programming, as demonstrated at CES 2017, where a Raspberry Pi reached 16 FPS with Tiny-YOLO.
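A minimal sketch of that per-layer step (my own illustration, not code from the paper): dynamic quantization has to scan each layer's input for its range before it can pick a scale, on every forward pass:

```c
#include <math.h>
#include <stddef.h>

/* Find the absolute maximum of a layer's input; symmetric INT8 then
   uses scale = 127 / absmax. Repeating this scan for every layer on
   every frame is the overhead the paper describes. */
float find_absmax(const float *x, size_t n) {
    float m = 0.f;
    for (size_t i = 0; i < n; ++i) {
        float a = fabsf(x[i]);
        if (a > m) m = a;
    }
    return m;
}
```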

AlexeyAB commented 6 years ago

@spinoza1791 Thanks for the URL. Yes, tiny-yolo-8bit has less accuracy and less speed than tiny-yolo-fp32 on the Raspberry Pi 3. But on an FPGA it should run 4x faster using INT8 than using FP32.


I have not read it in its entirety, but I couldn't find in the article whether they merged layers (conv + batch-norm) into a single convolution layer.

As is known, after batch-normalization we have a distribution that can be contained in INT8 with a lower loss of accuracy than the distribution after the convolution layer before batch-normalization. So the convolution output (batch-norm input) is probably the accuracy bottleneck for INT8.

Left: after the conv layer, before batch-norm. Right: after batch-norm. [image: batchnorm]
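A sketch of the merge I mean (a hypothetical helper, not existing Darknet code): the batch-norm parameters can be folded into the convolution weights and biases, so only one layer has to be quantized:

```c
#include <math.h>

/* Fold batch-norm into the preceding convolution, per output channel c:
     s     = gamma[c] / sqrt(var[c] + eps)
     W'[c] = W[c] * s
     b'[c] = (b[c] - mean[c]) * s + beta[c]
   After folding, conv+BN behaves as a single convolution layer. */
void fold_batchnorm(float *weights, float *biases,
                    const float *gamma, const float *beta,
                    const float *mean, const float *var,
                    int out_channels, int weights_per_channel, float eps)
{
    for (int c = 0; c < out_channels; ++c) {
        float s = gamma[c] / sqrtf(var[c] + eps);
        for (int i = 0; i < weights_per_channel; ++i)
            weights[c * weights_per_channel + i] *= s;
        biases[c] = (biases[c] - mean[c]) * s + beta[c];
    }
}
```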

It also makes sense to try to port Yolo to TensorRT 3 and use calibration for quantization with minimal loss of accuracy.

AlexeyAB commented 6 years ago

@phongnhhn92 @spinoza1791

I implemented INT8 quantization for Yolo v2 and v3 in this repo: https://github.com/AlexeyAB/yolo2_light

COCO: [results image]

VOC: [results image]

More: https://github.com/AlexeyAB/darknet/issues/726#issuecomment-409983119
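The general scheme is symmetric per-layer INT8 with a calibrated scale; a minimal sketch of the idea (my illustration, not the actual yolo2_light code):

```c
#include <stdint.h>
#include <math.h>

/* scale is typically 127 / threshold, where threshold comes from
   calibration (e.g. a KL-divergence search, as in TensorRT). */
static inline int8_t quantize(float x, float scale) {
    float q = roundf(x * scale);
    if (q >  127.f) q =  127.f;   /* clamp to the INT8 range */
    if (q < -128.f) q = -128.f;
    return (int8_t)q;
}

static inline float dequantize(int8_t q, float scale) {
    return (float)q / scale;
}
```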

phongnhhn92 commented 6 years ago

@AlexeyAB Thanks! Keep up the good work ^^

phongnhhn92 commented 6 years ago

@AlexeyAB Hello, I did some testing of the new yolo2_light project on my Jetson TX2, and the detection speed is about 0.004 s/img and ~25 FPS on mp4 video. I think it is quite impressive. Is there any way to build this yolo2_light project as an SO/DLL library, so that I can use it to develop apps with Qt or Visual Studio? It would be nice to have. Thanks!

AlexeyAB commented 6 years ago

@phongnhhn92 Hi,

There are no plans to build yolo2_light as an SO/DLL library, but you can just add the source files from https://github.com/AlexeyAB/yolo2_light/tree/master/src to your project (just without the OpenCL files).


Also, I implemented XNOR-net on the CPU (maybe later it will be on the GPU) in this repo: https://github.com/AlexeyAB/darknet

So tiny-yolo-obj_xnor is more than 3x faster than yolov2-tiny.cfg. But you should train your own model using tiny-yolo-obj_xnor.cfg from this zip archive: tiny-yolo-obj_xnor.zip
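Roughly why XNOR is faster (an illustrative sketch, assuming GCC/Clang for __builtin_popcountll): with weights and activations binarized to {-1, +1} and packed into machine words, a dot product collapses into XNOR plus popcount:

```c
#include <stdint.h>

/* Bit 1 encodes +1, bit 0 encodes -1.
   matches = popcount(~(a ^ b)), so
   dot = matches - (n_bits - matches) = 2 * matches - n_bits.
   Assumes n_bits == 64 * n_words; real code would mask the tail word. */
int binary_dot(const uint64_t *a, const uint64_t *b, int n_words, int n_bits) {
    int matches = 0;
    for (int i = 0; i < n_words; ++i)
        matches += __builtin_popcountll(~(a[i] ^ b[i]));
    return 2 * matches - n_bits;
}
```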

dexception commented 5 years ago

@AlexeyAB Is there a model for Yolov3-XNOR?

ReekiLee commented 4 years ago

Hello, I have trained a yolov2-tiny model, and I want to know how I can get a post-training 8-bit quantized .weights file.

AlexeyAB commented 4 years ago

@ReekiLee Use this repo and read the readme: https://github.com/AlexeyAB/yolo2_light

ReekiLee commented 4 years ago

> @ReekiLee Use this repo and read the readme: https://github.com/AlexeyAB/yolo2_light

Thanks for your quick reply~ I tried that inference, but I'm wondering, is there any code or method to save the INT8 weights? In addition, if I change all the files in Darknet from float to int and then train, can I get INT8 weights? (sorry for my silly question.....,,ԾㅂԾ,,)

AlexeyAB commented 4 years ago

> I tried that inference, but I'm wondering, is there any code or method to save the INT8 weights?

It isn't implemented. You would have to implement it yourself in C.
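A rough sketch of what that could look like (hypothetical function and file layout, not part of Darknet): quantize a layer's FP32 weights and write them together with the scale needed to dequantize them later:

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Write one layer's weights as INT8, preceded by the FP32 scale
   (scale = 127 / absmax of the weights, for symmetric quantization). */
int save_int8_weights(const char *path, const float *w, size_t n, float scale) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fwrite(&scale, sizeof(float), 1, f);  /* needed to dequantize on load */
    for (size_t i = 0; i < n; ++i) {
        float q = roundf(w[i] * scale);
        if (q >  127.f) q =  127.f;
        if (q < -128.f) q = -128.f;
        int8_t qi = (int8_t)q;
        fwrite(&qi, sizeof(int8_t), 1, f);
    }
    fclose(f);
    return 0;
}
```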

> In addition, if I change all the files in Darknet from float to int and then train, can I get INT8 weights?

No. Training needs float precision for gradients and weight updates, so you can't get INT8 weights just by changing the types; quantization is applied after training.