laclouis5 opened 4 years ago
The main changes that explain why they get high accuracy:
We use a calibration set of 1000 images, randomly sampled from our training set.
Only 6 of the 80 MS COCO classes are used, together with a custom network size of 608x352 and custom anchors
Only layers supported by TensorRT are used (ReLU and a custom upsample layer) to avoid FP32->INT8->FP32 conversions
Or just use ReLU instead of Leaky-ReLU
Or build Leaky-ReLU from 2 scale-layers + ReLU + a shortcut-layer, i.e. out = a*x + (1-a)*ReLU(x):
if (x >= 0): out = a*x + (1-a)*x = x
if (x < 0): out = a*x + (1-a)*0 = a*x
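As a quick sanity check of that decomposition, here is a minimal NumPy sketch (not from the paper or this repo); it assumes darknet's default Leaky-ReLU slope of 0.1:

```python
import numpy as np

def leaky_relu(x, a=0.1):
    """Reference Leaky-ReLU (darknet uses a slope of 0.1)."""
    return np.where(x >= 0, x, a * x)

def leaky_via_scale_relu_shortcut(x, a=0.1):
    """Leaky-ReLU rebuilt from TensorRT-friendly pieces:
    scale by a, ReLU followed by a scale by (1 - a), and a shortcut (element-wise add)."""
    branch_scale = a * x                       # first scale layer
    branch_relu = (1 - a) * np.maximum(x, 0)   # ReLU + second scale layer
    return branch_scale + branch_relu          # shortcut layer

x = np.linspace(-5, 5, 11)
assert np.allclose(leaky_relu(x), leaky_via_scale_relu_shortcut(x))
print("decomposition matches Leaky-ReLU")
```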
@laclouis5
You can change https://github.com/AlexeyAB/darknet/blob/63396082d7e77f4b460bdb2540469f5f1a3c7c48/cfg/yolov3-spp.cfg model
set width=608 height=352 https://github.com/AlexeyAB/darknet/blob/63396082d7e77f4b460bdb2540469f5f1a3c7c48/cfg/yolov3-spp.cfg#L8-L9
replace all activation=leaky with activation=relu in the cfg-file (a scripted sketch of these cfg edits appears after these steps)
extract only 6 classes from the MS COCO dataset: person, car, bicycle, motorbike, bus, truck
train this cfg-file by using this repository https://github.com/AlexeyAB/darknet
quantize and run this model on TensorRT: https://news.developer.nvidia.com/deepstream-sdk-4-now-available/
You should get approximately the same result.
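A minimal Python sketch of the cfg edits from the steps above; the source and destination paths are placeholders, not files shipped by the repo:

```python
import re

# Placeholder paths; point these at your local copy of the cfg file.
src = "cfg/yolov3-spp.cfg"
dst = "cfg/yolov3-spp-relu.cfg"

with open(src) as f:
    cfg = f.read()

# Set the custom network size from the thread (width=608, height=352).
cfg = re.sub(r"(?m)^width=\d+", "width=608", cfg)
cfg = re.sub(r"(?m)^height=\d+", "height=352", cfg)

# Replace every Leaky-ReLU activation with plain ReLU.
cfg = cfg.replace("activation=leaky", "activation=relu")

with open(dst, "w") as f:
    f.write(cfg)
```

Training then proceeds as usual with the darknet repository, pointing the .data/.cfg files at the 6-class subset.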
@AlexeyAB Ok thanks, so changing to relu, training, and then quantizing to TensorRT should improve network latency for a small accuracy drop?
@laclouis5 Yes. You will get the same relative improvement over the default yolov3.cfg (Leaky & FP32).
To get the absolute speed/accuracy as in the paper:
width=608 height=352
for training - it will be 2x faster than 608x608.
@AlexeyAB What do you mean by "quantize and run this model on TensorRT: https://news.developer.nvidia.com/deepstream-sdk-4-now-available/"?
@AlexeyAB Setting width=608 height=352 for training - will it affect the accuracy?
Hi, I'm currently running yolov3-tiny on Xavier. My input size is 576x352 and I have already converted YOLO to TensorRT. As the JetNet paper mentions, we can achieve a 60% speed-up if we switch Leaky-ReLU to ReLU; however, I don't see any difference between them in my tests. In my case the speed for yolov3-tiny at FP16 is about 13 ms and at INT8 about 10 ms. As far as I know, yolov3-tiny is several times faster than yolov3, which means yolov3-tiny should be at about 3-6 ms in TensorRT at INT8. Is anyone else seeing the same problem? Any help in speeding up yolov3-tiny in TensorRT is welcome. Thanks!
Hi, how did you convert YOLO to TensorRT please?
@Kmarconi Hi, I referred to this repo https://github.com/lewes6369/TensorRT-Yolov3. Basically it converts the darknet model to Caffe and uses TensorRT to parse the Caffe model.
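For illustration only, a rough sketch of that Caffe-to-TensorRT step (not the exact code from that repo), written against TensorRT's older pre-8.x Python API and its now-deprecated Caffe parser; the prototxt/caffemodel paths and the output blob name are placeholders, and the optional calibrator is any IInt8EntropyCalibrator2 fed with a calibration set such as the 1000 images mentioned above:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(deploy_file, model_file, output_name, calibrator=None):
    """Parse a Caffe model and build a TensorRT engine (pre-8.x style API)."""
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.CaffeParser() as parser:
        builder.max_batch_size = 1
        builder.max_workspace_size = 1 << 30   # 1 GiB of workspace
        if calibrator is not None:
            builder.int8_mode = True           # INT8 with a calibration set
            builder.int8_calibrator = calibrator
        else:
            builder.fp16_mode = True           # FP16 fallback
        # Map Caffe blobs to TensorRT tensors and mark the detection output.
        blobs = parser.parse(deploy=deploy_file, model=model_file,
                             network=network, dtype=trt.float32)
        network.mark_output(blobs.find(output_name))
        return builder.build_cuda_engine(network)
```

The exact builder flags and parser entry points differ between TensorRT versions, so treat this as a sketch of the flow rather than drop-in code.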
JetNet proposes an implementation very similar to what is done in this repo. It aims at low latency for YOLOv3 on NVIDIA embedded platforms (TX2 and Jetson) using TensorRT optimisations.
Results on Visdrone:
Is the speed similar to the one achieved in this repo?