laclouis5 opened 4 years ago
The main changes that explain why they get high accuracy:
We use a calibration set of 1000 images, randomly sampled from our training set.
Only 6 of the 80 MS COCO classes are used, together with a custom network size of 608x352 and custom anchors
Only layers supported by TensorRT are used (ReLU and a custom upsample layer) to avoid FP32->INT8->FP32 conversions
Or just use ReLU instead of Leaky-ReLU
Or build Leaky-ReLU from 2 scale-layers + ReLU + a shortcut-layer, i.e. out = a*x + (1-a)*ReLU(x):
if (x >= 0): out = a*x + (1-a)*x = x
if (x < 0): out = a*x + (1-a)*0 = a*x
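As a quick sanity check of that decomposition, here is a minimal NumPy sketch (not from the paper or this repo); it assumes darknet's default Leaky-ReLU slope of 0.1:

```python
import numpy as np

def leaky_relu(x, a=0.1):
    """Reference Leaky-ReLU (darknet uses a slope of 0.1)."""
    return np.where(x >= 0, x, a * x)

def leaky_via_scale_relu_shortcut(x, a=0.1):
    """Leaky-ReLU rebuilt from TensorRT-friendly pieces:
    scale by a, ReLU followed by a scale by (1 - a), and a shortcut (element-wise add)."""
    branch_scale = a * x                       # first scale layer
    branch_relu = (1 - a) * np.maximum(x, 0)   # ReLU + second scale layer
    return branch_scale + branch_relu          # shortcut layer

x = np.linspace(-5, 5, 11)
assert np.allclose(leaky_relu(x), leaky_via_scale_relu_shortcut(x))
print("decomposition matches Leaky-ReLU")
```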
@laclouis5
You can change https://github.com/AlexeyAB/darknet/blob/63396082d7e77f4b460bdb2540469f5f1a3c7c48/cfg/yolov3-spp.cfg model
set width=608 height=352 https://github.com/AlexeyAB/darknet/blob/63396082d7e77f4b460bdb2540469f5f1a3c7c48/cfg/yolov3-spp.cfg#L8-L9
replace all activation=leaky with activation=relu in the cfg-file (a scripted sketch of these cfg edits appears after these steps)
extract only 6 classes from the MS COCO dataset: person, car, bicycle, motorbike, bus, truck
train this cfg-file by using this repository https://github.com/AlexeyAB/darknet
quantize and run this model on TensorRT: https://news.developer.nvidia.com/deepstream-sdk-4-now-available/
You should get approximately the same result.
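A minimal Python sketch of the cfg edits from the steps above; the source and destination paths are placeholders, not files shipped by the repo:

```python
import re

# Placeholder paths; point these at your local copy of the cfg file.
src = "cfg/yolov3-spp.cfg"
dst = "cfg/yolov3-spp-relu.cfg"

with open(src) as f:
    cfg = f.read()

# Set the custom network size from the thread (width=608, height=352).
cfg = re.sub(r"(?m)^width=\d+", "width=608", cfg)
cfg = re.sub(r"(?m)^height=\d+", "height=352", cfg)

# Replace every Leaky-ReLU activation with plain ReLU.
cfg = cfg.replace("activation=leaky", "activation=relu")

with open(dst, "w") as f:
    f.write(cfg)
```

Training then proceeds as usual with the darknet repository, pointing the .data/.cfg files at the 6-class subset.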
@AlexeyAB Ok thanks, so changing to relu, training, and then quantizing to TensorRT should improve network latency for a small accuracy drop?
@laclouis5 Yes. You will get the same relative improvement over the default yolov3.cfg (Leaky & FP32).
To get the absolute speed/accuracy as in the paper:
width=608 height=352
for training - it will be 2x faster than 608x608.
@AlexeyAB What do you mean by "quantize and run this model on TensorRT: https://news.developer.nvidia.com/deepstream-sdk-4-now-available/"?
@AlexeyAB Setting width=608 height=352 for training - will it affect the accuracy?
Hi, I'm currently running yolov3-tiny on Xavier. My input size is 576x352 and I have already converted YOLO to TensorRT. As the JetNet paper mentions, we can achieve a 60% speed-up if we switch Leaky-ReLU to ReLU; however, I don't see any difference between them in my tests. In my case the speed for yolov3-tiny at FP16 is about 13 ms and at INT8 about 10 ms. As far as I know, yolov3-tiny is several times faster than yolov3, which means yolov3-tiny should be at about 3-6 ms in TensorRT at INT8. Is anyone else seeing the same problem? Any help in speeding up yolov3-tiny in TensorRT is welcome. Thanks!
Hi, how did you convert YOLO to TensorRT please?
@Kmarconi Hi, I referred to this repo https://github.com/lewes6369/TensorRT-Yolov3. Basically it converts the darknet model to Caffe and uses TensorRT to parse the Caffe model.
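For illustration only, a rough sketch of that Caffe-to-TensorRT step (not the exact code from that repo), written against TensorRT's older pre-8.x Python API and its now-deprecated Caffe parser; the prototxt/caffemodel paths and the output blob name are placeholders, and the optional calibrator is any IInt8EntropyCalibrator2 fed with a calibration set such as the 1000 images mentioned above:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(deploy_file, model_file, output_name, calibrator=None):
    """Parse a Caffe model and build a TensorRT engine (pre-8.x style API)."""
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.CaffeParser() as parser:
        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 30   # 1 GiB of workspace
        if calibrator is not None:
            builder.int8_mode = True           # INT8 with a calibration set
            builder.int8_calibrator = calibrator
        else:
            builder.fp16_mode = True           # FP16 fallback
        # Map Caffe blobs to TensorRT tensors and mark the detection output.
        blobs = parser.parse(deploy=deploy_file, model=model_file,
                             network=network, dtype=trt.float32)
        network.mark_output(blobs.find(output_name))
        return builder.build_cuda_engine(network)
```

The exact builder flags and parser entry points differ between TensorRT versions, so treat this as a sketch of the flow rather than drop-in code.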
JetNet proposes an implementation very similar to what is done in this repo. It aims at low latency for YOLOv3 on NVIDIA embedded platforms (TX2 and Jetson) using TensorRT optimisations.
Results on Visdrone:
Is the speed similar to the one achieved in this repo?