
Low precision inference for Darknet models. Convert model to 16b #762

Open Rus-L opened 6 years ago

Rus-L commented 6 years ago

This fork contains interesting results of model conversion: https://github.com/gplhegde/darknet/tree/master/extra Can we expect a similar implementation in your fork? I think it would be very helpful for running YOLOv3 on the Raspberry Pi.

AlexeyAB commented 6 years ago

This extension to the Darknet framework aims at providing low precision inference for Darknet models. Currently the underlying arithmetic is still in floating point. However, the model file can be stored using an 8-bit or 16-bit format, which helps to reduce the model size by a factor of 4 or 2 respectively.

It doesn't speed up detection, because the arithmetic is still in FP32, and it reduces accuracy, as you can see on the charts: https://github.com/gplhegde/darknet/tree/master/extra

It will only reduce the size of the yolov2.weights file from 194 MB to 49 MB; it will decrease accuracy, but it won't reduce RAM usage and won't increase speed.
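To make that distinction concrete, here is a minimal C++ sketch (an assumption about the scheme, not the gplhegde code) of storing weights in INT8 with a per-layer scale and expanding them back to FP32 at load time: the file shrinks by roughly 4x, but the network still allocates and computes in FP32, so RAM and speed are unchanged.

```cpp
#include <cstdint>
#include <cmath>
#include <vector>
#include <algorithm>

// Quantize FP32 weights to INT8 for storage only: the file on disk
// shrinks ~4x, but nothing about the runtime arithmetic changes.
std::vector<int8_t> quantize_for_storage(const std::vector<float>& w, float& scale) {
    float max_abs = 0.f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    scale = (max_abs > 0.f) ? max_abs / 127.f : 1.f;   // per-layer scale saved next to the weights
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(w[i] / scale));
    return q;
}

// At load time the INT8 values are expanded back to FP32,
// so convolutions still run in full floating point.
std::vector<float> dequantize_at_load(const std::vector<int8_t>& q, float scale) {
    std::vector<float> w(q.size());
    for (size_t i = 0; i < q.size(); ++i)
        w[i] = q[i] * scale;
    return w;
}
```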

Because the Raspberry Pi 3 GPU doesn't fully support OpenCL (as far as I know), it will be much faster to run Yolo-v2 / Tiny-yolo-v2 on the CPU using the OpenCV 3.4.0 dnn module, to which I added Darknet support - examples:
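The linked examples are not reproduced here, but a minimal C++ sketch of running a Darknet model through the OpenCV dnn module looks roughly like this (file names and input size are placeholders; decoding of the output is only hinted at):

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

int main() {
    // Load a Darknet cfg/weights pair (paths are placeholders).
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("tiny-yolo-voc.cfg",
                                                   "tiny-yolo-voc.weights");

    // Prepare the input blob: scale to [0,1], resize to the network
    // input resolution, swap BGR to RGB.
    cv::Mat img = cv::imread("dog.jpg");
    cv::Mat blob = cv::dnn::blobFromImage(img, 1 / 255.0, cv::Size(416, 416),
                                          cv::Scalar(), true, false);
    net.setInput(blob);

    // Forward pass on the CPU; for YOLOv2 each output row holds
    // x, y, w, h, objectness and per-class scores.
    cv::Mat detections = net.forward();

    // ... keep rows whose confidence exceeds the threshold and draw boxes ...
    return 0;
}
```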


AlexeyAB commented 6 years ago

@Rus-L I added support for INT8 detection on nVidia GPUs with CC >= 6.1 (Pascal and higher, with DP4A) in this repository: https://github.com/AlexeyAB/yolo2_light

Just use a command like this (Linux and Windows respectively):

./darknet detector demo voc.names yolo-voc.cfg yolo-voc.weights -thresh 0.24 test.mp4 -quantized
yolo_gpu.exe detector demo voc.names yolo-voc.cfg yolo-voc.weights -thresh 0.24 test.mp4 -quantized

or:

./darknet detector test voc.names yolo-voc.cfg yolo-voc.weights -thresh 0.24 dog.jpg -quantized
yolo_gpu.exe detector test voc.names yolo-voc.cfg yolo-voc.weights -thresh 0.24 dog.jpg -quantized
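For context on what the -quantized mode relies on: DP4A is an instruction available on CC >= 6.1 GPUs that computes a four-way INT8 dot product accumulated into INT32. A rough C++ emulation of that arithmetic (illustrative only, not the actual yolo2_light CUDA kernels):

```cpp
#include <cstdint>

// What one DP4A instruction does on CC >= 6.1 GPUs: multiply four
// packed INT8 pairs and accumulate the products into an INT32 sum.
int32_t dp4a_emulated(const int8_t a[4], const int8_t b[4], int32_t acc) {
    for (int i = 0; i < 4; ++i)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    return acc;
}

// An INT8 convolution accumulates these INT32 partial sums and then
// rescales the result back to FP32 using the layer's quantization scales.
float requantize(int32_t acc, float input_scale, float weight_scale) {
    return acc * input_scale * weight_scale;
}
```

Accumulating in INT32 and rescaling once per output keeps the quantization error bounded while letting the inner loop run entirely on 8-bit values, which is where the speedup comes from.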


If you want to use quantization with your own custom cfg-file, then you should copy the input_callibration params from the corresponding cfg-file:
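As a rough illustration of what those calibration parameters do (an assumption about the scheme, not the yolo2_light source): each value acts as a per-layer scale that maps FP32 activations into the INT8 range before the quantized convolution runs.

```cpp
#include <cstdint>
#include <cmath>
#include <algorithm>
#include <vector>

// Hypothetical helper: map FP32 activations into INT8 using a per-layer
// calibration multiplier taken from the cfg file. The name and exact usage
// are illustrative, not the yolo2_light implementation.
std::vector<int8_t> quantize_input(const std::vector<float>& x, float input_calibration) {
    std::vector<int8_t> q(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        float v = x[i] * input_calibration;          // scale into INT8 range
        v = std::max(-127.f, std::min(127.f, v));    // clamp to avoid overflow
        q[i] = static_cast<int8_t>(std::lround(v));
    }
    return q;
}
```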