AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Status of yolo2_light #1417

Open tlind opened 6 years ago

tlind commented 6 years ago

Asking here because issue tracking is disabled in the other project: I'd like to use yolo2_light for a robotics research project because it's more light-weight, and I am wondering if its inference performance is on par with this fork of darknet (using yolov3.cfg). Also, what's the license?

AlexeyAB commented 6 years ago

About https://github.com/AlexeyAB/yolo2_light: it looks like it is under the MIT license: https://github.com/AlexeyAB/yolo2_light/blob/master/LICENSE

I am wondering if its inference performance is on par with this fork of darknet (using yolov3.cfg).

In general yes, except:

Also:

yolo2_light - I just added XNOR-net (weights, inputs and calculations: 1-bit instead of 32-bit float) on CPU, in the same way as in the current repository https://github.com/AlexeyAB/darknet. It gives about 4x acceleration on CPU with AVX2 (will be improved), but about -30% precision (mAP). The model should be trained using a cfg-file like this one: https://github.com/AlexeyAB/yolo2_light/blob/master/bin/tiny-yolo-obj_xnor.cfg
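The 1-bit trick above can be sketched in a few lines — this is an illustration of the XNOR-net idea (binarized values packed into integer words, so the float multiply-accumulate becomes XNOR + popcount), not darknet's actual AVX2 code:

```python
# Minimal sketch of an XNOR-net dot product (illustration only, not
# darknet's actual SIMD implementation).
# Values are binarized to {-1, +1}; +1 is stored as bit 1, -1 as bit 0.
# Then dot(a, b) = 2 * popcount(XNOR(a, b)) - n over n packed bits,
# because XNOR counts the positions where the signs agree.

def pack_bits(values):
    """Pack a list of +/-1 values into a single int (bit i = 1 if +1)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two n-long {-1,+1} vectors packed as ints."""
    mask = (1 << n) - 1
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")  # popcount of XNOR
    return 2 * agree - n  # agreements contribute +1, disagreements -1

a = [+1, -1, +1, +1, -1, -1, +1, -1]
b = [+1, +1, -1, +1, -1, +1, +1, -1]
n = len(a)
# Same result as the float dot product, but one XOR + one popcount
# replaces n multiply-adds - this is where the CPU speedup comes from.
assert xnor_dot(pack_bits(a), pack_bits(b), n) == sum(x * y for x, y in zip(a, b))
```

In the real implementation the packed words are 256-bit AVX2 registers and popcount is a hardware instruction, which is why the speedup scales with register width.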

yolo2_light supports Yolo v2 and v3 models.

Also there is a difference in commands - specify a names-file instead of a data-file:


For quantization (calculations on INT8 instead of FLOAT32): ~+30% speedup and about -1% mAP: https://github.com/AlexeyAB/yolo2_light
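The INT8 path works by mapping float values to 8-bit integers with a per-tensor scale. A minimal sketch of symmetric quantization — yolo2_light's actual calibration scheme may differ, and the weight values here are made up:

```python
# Minimal sketch of symmetric per-tensor INT8 quantization
# (illustration only; yolo2_light's actual calibration may differ).

def quantize(values, scale):
    """Map floats to the int8 range [-127, 127] using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Recover approximate float values from int8 codes."""
    return [q * scale for q in qvalues]

weights = [0.5, -1.2, 0.03, 0.87]
scale = max(abs(w) for w in weights) / 127.0  # symmetric scale from the max
q = quantize(weights, scale)
restored = dequantize(q, scale)

# The round trip is close but not exact - this small rounding error
# is the source of the ~1% mAP drop mentioned above.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9
```

Integer multiply-accumulate on int8 is what buys the ~30% speedup on CPUs with wide SIMD integer units.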

mathieuorhan commented 6 years ago

Very interested in this project too, as I need to run my detector on a CPU / small GPU (for autonomous driving). In the original XNOR-Net paper, the authors achieve a huge speed increase for a small decrease in accuracy. Do you think there is much room for improvement in the current yolo2_light, for both 8-bit and 1-bit quantization?

AlexeyAB commented 6 years ago

@mathieuorhan For further optimizations:


Also:


The general direction for achieving optimal accuracy+speed of a CNN is:

So it is better to use 320 layers with 1-bit weights instead of 10 layers with 32-bit float weights: accuracy and speed will be higher, with the same model size.
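The size claim above is simple arithmetic. A sketch with a placeholder per-layer weight count (the 320 vs 10 layer counts come from the comment; the per-layer size is made up for illustration):

```python
# Back-of-the-envelope model-size comparison for the claim above.
# weights_per_layer is a placeholder, not a real network's layer size.
weights_per_layer = 1_000_000

float32_layers = 10
bit1_layers = 320

float32_bits = float32_layers * weights_per_layer * 32  # 32 bits per weight
bit1_bits = bit1_layers * weights_per_layer * 1         # 1 bit per weight

# 320 layers * 1 bit == 10 layers * 32 bits: identical storage,
# but 32x more layers of representational depth.
assert float32_bits == bit1_bits
```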

Also, it looks like Yann LeCun's statement is no longer true for modern networks with many layers and connections, since he wrote it about older networks with a low number of layers.

But Yann LeCun considers that complex tasks need a minimum of 8 bits: "But to get good results on a task like ImageNet you need about 8 bit of precision on the neuron states.": https://www.facebook.com/yann.lecun/posts/10152184295832143

There was also some discussion of optimal 4-bit weights and linear initialization: https://github.com/AlexeyAB/darknet/issues/138

mathieuorhan commented 6 years ago

@AlexeyAB Thank you for the insights, this is very interesting. I'm looking forward to progress and new tests. In the next few weeks I'm going to test different settings with yolo2_light and post feedback.

tlind commented 5 years ago

Great, thanks a lot for the clarification!