AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

How does 'route' layer work in yolov2? #487

[Open] wsyzzz opened this issue 6 years ago

wsyzzz commented 6 years ago

Hi everyone, does anyone know how the 'route' layer works in yolov2? I googled it and only found that "the route layer is to bring finer grained features in from earlier in the network". So how does it 'bring finer grained features in'?

For example, here is a case to explain. I run ./darknet detector test cfg/coco.data cfg/yolo.cfg yolo.weights data/dog.jpg


layer         filters      size              input                 output
  0 conv         32      3×3 / 1        416×416×3      ->     416×416×32
  ……
 16 conv        512      3×3 / 1        26×26×256      ->     26×26×512
  ……
 24 conv       1024      3×3 / 1        13×13×1024     ->     13×13×1024
 25 route   16
 26 conv         64      1×1 / 1        26×26×512      ->     26×26×64
 27 reorg                    / 2        26×26×64       ->     13×13×256
 28 route   27 24
 29 conv       1024      3×3 / 1        13×13×1280     ->     13×13×1024
  ……

And in 'yolo.cfg', I found that

[route]                            
layers=-9
……
[route]  
layers=-1,-3 

So it's clear that the 25th route layer uses the 16th layer ('layers=-9' in the .cfg file), and the 28th route layer uses the 27th and 24th layers ('layers=-1,-3' in the .cfg file).

Here is my question: how does the 25th route layer use the 13×13×1024 input from the 24th layer and the 26×26×512 input from the 16th layer to obtain a 26×26×512 output (according to the 26th layer's input)?

I would appreciate any advice. Thanks for your response!

AlexeyAB commented 6 years ago

The route-layer is the same as the concat-layer in Caffe. (When a route layer uses only one input, it is the same as an identity-layer in Caffe.)

More: https://github.com/AlexeyAB/darknet/issues/120#issuecomment-313371171
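To make this concrete, here is a minimal hypothetical sketch (not Darknet's actual forward_route_layer(), which also handles batches and any number of inputs) of the concatenation a two-input route performs on CHW feature maps:

#include <cstring>

// Hypothetical sketch: concatenate two CHW feature maps with the same
// width/height along the channel axis, as a two-input [route] does.
// In CHW layout this is just copying one flat buffer after the other:
// (w x h x ca) + (w x h x cb) -> w x h x (ca + cb).
void route_concat(const float* a, int ca, const float* b, int cb,
                  int w, int h, float* out)
{
    size_t na = (size_t)ca * w * h;
    size_t nb = (size_t)cb * w * h;
    std::memcpy(out, a, na * sizeof(float));
    std::memcpy(out + na, b, nb * sizeof(float));
}

A single-input route (like layer 25 above, layers=-9) degenerates to the first memcpy: it simply forwards layer 16's 26×26×512 output unchanged, which is why layer 26 sees a 26×26×512 input.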

wsyzzz commented 6 years ago

For layer-25, you mean we take the result of layer-16 as the output of layer-25, which is also the input of layer-26. But then how do we deal with the output of layer-24? If we drop it, layers 17-24 would be meaningless. @AlexeyAB

AlexeyAB commented 6 years ago

Layer_28 will concatenate layer_27 (the reorg of layer_26's output) + layer_24.

[image: yolo_voc network diagram]

wsyzzz commented 6 years ago

So the route-layer and reorg-layer form a module that concatenates the current output (like layer 24's) with a previous output (like layer 16's), which is what 'bring finer grained features in from earlier in the network' means. Their functions match their names: the route-layer is like a route sign pointing to the layer we want to concatenate, and the reorg-layer is actually a 'reorganization' layer. Thank you, Alexey! You are the most patient author I've ever met!
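To illustrate the reorg step, here is a plain space-to-depth sketch (an illustration only: Darknet's real reorg_cpu() uses a different channel ordering, but the shape arithmetic is the same) showing why a stride-2 reorg turns 26×26×64 into 13×13×256:

#include <cstddef>

// Hypothetical space-to-depth with stride 2: every 2x2 spatial block of
// each input channel becomes 4 output channels, so
// w x h x c -> (w/2) x (h/2) x (4*c), e.g. 26x26x64 -> 13x13x256.
void reorg_space_to_depth(const float* in, int w, int h, int c, float* out)
{
    int ow = w / 2, oh = h / 2;
    for (int k = 0; k < c; ++k)
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x)
                for (int dy = 0; dy < 2; ++dy)
                    for (int dx = 0; dx < 2; ++dx) {
                        int oc = k * 4 + dy * 2 + dx;  // output channel index
                        out[((size_t)oc * oh + y) * ow + x] =
                            in[((size_t)k * h + 2 * y + dy) * w + 2 * x + dx];
                    }
}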

Here's another question I'm wondering about. My object detection is slow. I use the yolo-voc.cfg network at 416×416 on your fork and get 9.2 s per picture on average. My CPU is an i7-7700K (8 logical cores) with 39.76 GFLOPS [1]. Due to some restrictions, I cannot use a GPU. According to issue #80, I should achieve about ~0.01 FPS per 1 GFLOPS-SP, so I should have gotten 0.3976 FPS, i.e. ~2.5 s per picture. Could you figure out what the problem is? Thanks a lot! [1] CPU performance

AlexeyAB commented 6 years ago

~0.01 FPS per 1 GFLOPS-SP holds only if all CPU resources are fully used.

But Darknet Yolo is well optimized only for GPU, not for CPU. I.e. Darknet doesn't use SSE3/4/AVX (SIMD) optimizations, so it is about 3-4x slower than it could be. Did you compile with OPENMP=1 in the Makefile? It will use multiple threads on the CPU.


I added Yolo v2 to the OpenCV: https://github.com/opencv/opencv/pull/9705

So if you want to use Yolo on CPU, the fastest way is to use the Yolo v2 that is built into OpenCV since 3.4.0 - it can process a picture in ~2.5 s or faster:
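For example, a minimal sketch against the OpenCV 3.4.x C++ dnn API (the file names are placeholders, not from the original post):

#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    // Load the Darknet model (placeholder paths).
    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolo-voc.cfg", "yolo-voc.weights");
    cv::Mat img = cv::imread("dog.jpg");
    // Darknet expects RGB input scaled to [0,1] at the network resolution.
    cv::Mat blob = cv::dnn::blobFromImage(img, 1 / 255.0, cv::Size(416, 416),
                                          cv::Scalar(), true, false);
    net.setInput(blob);
    // For a region output, each row is [x, y, w, h, objectness, class scores...].
    cv::Mat detections = net.forward();
    return 0;
}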

wsyzzz commented 6 years ago

Thanks for your advice! I tried your first solution - setting OPENMP=1. Now I get results in a flash, but it reports that it takes 6.2 s. Yet by my feeling it is much faster than an earlier run that reported 3.2 s... Is there some trouble with the timer?

AlexeyAB commented 6 years ago

Yes, clock_t shows CPU-time instead of real steady time: https://stackoverflow.com/a/10874375/1558037

So it actually works much faster.
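For illustration, a small sketch of the difference (std::chrono::steady_clock is one portable replacement; with OPENMP=1, clock() roughly sums the CPU time of all threads, so it can report several times the wall time):

#include <chrono>
#include <cstdio>
#include <ctime>

int main()
{
    std::clock_t c0 = std::clock();              // CPU time
    auto t0 = std::chrono::steady_clock::now();  // wall (real) time

    // ... run detection here ...

    double cpu_s  = double(std::clock() - c0) / CLOCKS_PER_SEC;
    double wall_s = std::chrono::duration<double>(
                        std::chrono::steady_clock::now() - t0).count();
    // With 8 threads busy, cpu_s can be ~8x wall_s.
    std::printf("cpu: %.2f s, wall: %.2f s\n", cpu_s, wall_s);
    return 0;
}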

wsyzzz commented 6 years ago

Got it! You really helped me a lot! I just need to use a different timer function!

wsyzzz commented 6 years ago

@AlexeyAB Do you mean there are two builds of Yolo v2? One built without OpenCV, and one built into OpenCV 3.4, with the latter being faster? Can the latter one support OpenCV 3.1? Thanks~

AlexeyAB commented 6 years ago

I mean that OpenCV 3.4.0 already contains Yolo v2 for CPU inside OpenCV. So you can just install OpenCV 3.4.0 without installing Darknet - and you can use the fastest version of Yolo v2 for CPU.

> Can the latter one support OpenCV 3.1?

Yolo v2 is built into OpenCV only since 3.4.0.

But you can use this repository with any OpenCV version.

TaihuLight commented 6 years ago

@wsyzzz @AlexeyAB What is the difference in function and workflow between the shortcut layer and the route layer for residual connections?

AlexeyAB commented 6 years ago

@TaihuLight The [shortcut] layer performs an element-wise addition of two layers' outputs (a residual connection), while the [route] layer concatenates outputs along the channel axis.
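In code terms, a hypothetical elementwise sketch (assuming equal shapes, which [shortcut] requires):

// [shortcut]: residual sum; the output shape equals the input shape.
void shortcut_add(const float* a, const float* b, int n, float* out)
{
    for (int i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
// A two-input [route] instead concatenates along channels (see the
// route_concat sketch above), so its output has ca + cb channels.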

TaihuLight commented 6 years ago

@AlexeyAB Thank you. I think the SORT-layer https://github.com/AlexeyAB/darknet/issues/473 can be implemented by changing the functions in the [shortcut] layer as follows:

The [SORT-shortcut]-layer adds (+) and multiplies (*) the values: 1st input: 1, 2, 3; 2nd input: 4, 5, 6; output: 5+sqrt(1*4), 7+sqrt(2*5), 9+sqrt(3*6)


AlexeyAB commented 6 years ago

@TaihuLight

[shortcut] is calculated as: y = x + F(x)

And [SORT-shortcut] should be calculated as: y = x + F(x) + sqrt(ReLU(x) * ReLU(F(x)) + eps), per the paper: https://arxiv.org/pdf/1703.06993.pdf

TaihuLight commented 6 years ago

@AlexeyAB https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp This is the SORT layer code implemented in Caffe by the author of the paper. I am learning it and hope it can help us.

wsyzzz commented 6 years ago

@AlexeyAB Here is a small question from using the Yolo example. I used https://github.com/AlexeyAB/opencv.git and ran yolo_object_detection.cpp. It returned error: 'readNetFromDarknet' was not declared in this scope at line 43. I found this function in opencv/modules/dnn/include/opencv2/dnn/dnn.hpp at line 620, between CV_EXPORTS_W Importer and createCaffeImporter (I'm sorry, I don't know how to make a link to that line). But in the files I cloned, the same dnn.hpp doesn't have this function, and createCaffeImporter is right behind CV_EXPORTS_W Importer.

I can download dnn.hpp individually to update the file, but that may confuse some people. Could you try to figure it out? Thanks.

AlexeyAB commented 6 years ago

@wsyzzz Just use the original OpenCV; I pulled Yolo v2 directly into OpenCV since 3.4.0.


If you want to use my repo, switch to the branch dnn_darknet_yolo_v2: https://github.com/AlexeyAB/opencv/tree/dnn_darknet_yolo_v2 The contribution rules require that all pull requests be made from separate branches.

fvlntn commented 6 years ago

@AlexeyAB

If forward is y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + 0.01)

https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp says, for 2 inputs x1 / x2 (this is how it is implemented in Caffe, and it is symmetrical for x1 and x2):

[image: backward formulas from sort_layer.cpp]

TaihuLight commented 6 years ago

@ralek67 @AlexeyAB What does negativeReLUSlope mean in your formula? What does top_diff denote in https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp?

fvlntn commented 6 years ago

If ReLU is leaky (y = x if x > 0, y = 0.01x if x < 0), then negativeReLUSlope = 0.01.

In Tiffany's SORT Layer it is = 0.

Forward : y = x + ??? Back: dx = dy + ???

top_diff = dy, the delta for backpropagation:

bottom_diff[i] = top_diff[i] * (1.0 + bottom_gradient_data[i] * (bottom_data[i] > 0));

In my formula it is the inverse: it should be dx1/dy = 1 + ... Then if you multiply by dy you have the formula as in Tiffany's code: dx1 = dy * (1 + Gradient * (x1 > 0)). So by identification: dx1 = bottom_diff, dx2 = bottom_diff_1, dy = top_diff.

Hope you understand
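For reference, the two-input chain rule written out (a hedged reconstruction; that bottom_gradient_data caches the fraction below is an assumption about the Caffe code):

y = x_1 + x_2 + \sqrt{\mathrm{ReLU}(x_1)\,\mathrm{ReLU}(x_2) + \varepsilon}

\frac{\partial y}{\partial x_1} = 1 + \frac{\mathrm{ReLU}(x_2)\,[x_1 > 0]}{2\sqrt{\mathrm{ReLU}(x_1)\,\mathrm{ReLU}(x_2) + \varepsilon}}

Multiplying by the incoming gradient dy gives dx_1 = dy \cdot \partial y/\partial x_1, which matches bottom_diff[i] = top_diff[i] * (1.0 + bottom_gradient_data[i] * (bottom_data[i] > 0)) if bottom_gradient_data[i] stores \mathrm{ReLU}(x_2) / \big(2\sqrt{\mathrm{ReLU}(x_1)\,\mathrm{ReLU}(x_2) + \varepsilon}\big).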

TaihuLight commented 6 years ago

@ralek67 Could you share the process of getting your formula of the gradient? https://stackoverflow.com/questions/44512126/how-to-calculate-gradients-in-resnet-architecture

For the forward pass, if y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + 0.001) and ReLU is leaky, SORT_short_cut can be implemented by replacing out[out_index] += add[add_index]; with

float sqrt_shift=0.001;
out[out_index] = out[out_index] + add[add_index] + sqrtf(max(sqrt_shift,out[out_index] * add[add_index]+sqrt_shift));

Is it correct? And is my understanding of the following code correct?


@ralek67 @AlexeyAB @wsyzzz 
fvlntn commented 6 years ago

I don't know how to add it in Darknet, tbh. But just from reading it, your forward pass is wrong. You wrote: out = out + add + sqrt(max(sqrt_shift, out*add + sqrt_shift)). (It's obviously wrong, since out*add is clearly positive after ReLU, so max(sqrt_shift, something_positive + sqrt_shift) always equals something_positive + sqrt_shift, and it could be simplified to sqrt(out*add + sqrt_shift), except if leaky.)

And it should be: out = out + add + sqrt(max(0,out)*max(0,add) + sqrt_shift) in blas.c

But according to the author of paper, it was implemented like this: https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp

Forward: [image: forward pass code from sort_layer.cpp]

If you just read the Backward_cpu function, for instance, you get to: [images: backward code from sort_layer.cpp]

And that's exactly what the math says if ReLU isn't leaky: [image: derivation]

The thing is, you keep talking about y = x + F(x) instead of 2 inputs y = x1 + x2. If that's what you want: if y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + sqrt_shift), then: [image: handwritten derivation]

But that's just 16-year-old student maths.

TaihuLight commented 6 years ago

(1) So, the forward pass is y = x + F(x) + sqrt(x*F(x) + sqrt_shift), with sqrt_shift = 0.001, and ReLU is linear for the shortcut layer in YOLOv3. If out denotes the input value x = state.net.layers[l.index].output, and add denotes F(x) = state.input, which is the output value of the previous layer, then SORT_short_cut can be implemented with

float sqrt_shift = 0.001f;
out[out_index] = out[out_index] + add[add_index] + sqrtf(fmaxf(0.f, out[out_index]) * fmaxf(0.f, add[add_index]) + sqrt_shift);

in the function shortcut_cpu() of blas.c

@ralek67 Thank you very much! Although I have spent three days trying to understand the functions of the shortcut layer, I still do not know how to implement the backward pass. (2) The gradient in the backward pass of the sort_shortcut layer would be calculated as follows; is it correct?
Besides, how should the backward of the shortcut layer be modified if the gradient is correct? I sincerely need your help. [image: proposed gradient derivation]

(3) Meanwhile, the forward [shortcut] is y = x + F(x) in YOLOv3; how should the code of the backward pass, i.e. backward_shortcut_layer(), be understood?

In the above diagram, E denotes the error at the output neuron; the gradient delta_x is the sum of the incoming gradient delta_y and the product of the gradients delta_y and delta_F.

(4) Then how should axpy_cpu() and shortcut_cpu() in the backward pass of the original shortcut layer be understood, according to the process of gradient computation shown in the above diagram? @AlexeyAB @ralek67

void backward_shortcut_layer(const layer l, network_state state)
{
    // Multiply l.delta by the derivative of this layer's activation.
    gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta);
    // Propagate delta to the previous layer (the F(x) branch): dF += dy.
    axpy_cpu(l.outputs*l.batch, 1, l.delta, 1, state.delta, 1);
    // Add delta into the skipped-from layer's delta (the identity branch): dx = dx + dy.
    shortcut_cpu(l.batch, l.out_w, l.out_h, l.out_c, l.delta, l.w, l.h, l.c, state.net.layers[l.index].delta);
}
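As a hedged answer to (2), here is a per-element sketch of what a SORT backward could look like (an illustration under the thread's assumptions: forward y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + eps), with x = out and F(x) = add; this is not tested Darknet code):

#include <cmath>

// Hypothetical per-element SORT-shortcut backward: accumulate the incoming
// gradient dy into dx and df, scaled by the analytic derivatives of
// y = x + f + sqrt(relu(x)*relu(f) + eps).
inline void sort_shortcut_backward(float x, float f, float dy,
                                   float* dx, float* df, float eps = 0.001f)
{
    float rx = x > 0 ? x : 0.f;        // ReLU(x)
    float rf = f > 0 ? f : 0.f;        // ReLU(f)
    float s  = std::sqrt(rx * rf + eps);
    *dx += dy * (1.f + (x > 0 ? rf / (2.f * s) : 0.f));
    *df += dy * (1.f + (f > 0 ? rx / (2.f * s) : 0.f));
}

Wiring this into backward_shortcut_layer() would replace the plain axpy_cpu()/shortcut_cpu() accumulation with a loop over elements that calls something like this.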