wsyzzz opened 6 years ago
Route-layer is the same as concat-layer in Caffe. (When route uses only one input, the route-layer is the same as identity-layer in Caffe.)
More: https://github.com/AlexeyAB/darknet/issues/120#issuecomment-313371171
In layer-25, you mean we take the result of layer-16 as the output of layer-25 as well as the input of layer-26. But how do we deal with the output of layer-24? If we drop it, layers 17-24 will be meaningless. @AlexeyAB
Layer_27 will concatenate layer_24 + layer_26.
So the route-layer and reorg-layer form a module that concatenates the current output (like layer-24's output) with a previous output (like layer-16's output), which means 'bring finer grained features in from earlier in the network'. And their functions match their names: the route-layer is like a route sign pointing to the layer we want to concatenate, and the reorg-layer is literally a 'reorganization' layer. Thank you, Alexey! You are the most patient author I've ever met!
Here's another question I'm wondering about. My object detection is slow. I use the yolo-voc.cfg network at 416x416 on your fork and get 9.2 s per picture on average. My CPU is an i7-7700K with eight logical cores and 39.76 GFlops [1]. Due to some restrictions, I cannot use a GPU. According to issue #80, I should achieve about ~0.01 FPS per 1 GFlops-SP, so I should get 0.3976 FPS, i.e. ~2.5 s per picture. Could you figure out what the problem is? Thanks a lot! [1] CPU performance
~0.01 FPS per 1 GFlops-SP holds only if all CPU resources are used.
But Darknet Yolo is well optimized only for GPU, not for CPU. I.e., Darknet doesn't use SSE3/4/AVX (SIMD) optimizations, so it is about 3-4x slower than it could be.
Did you compile with OPENMP=1 in the Makefile? It will use multiple threads on the CPU.
I added Yolo v2 to the OpenCV: https://github.com/opencv/opencv/pull/9705
So if you want to use Yolo on a CPU, the fastest way is to use the Yolo v2 that is built into OpenCV since 3.4.0 - it can process one picture in ~2.5 s or faster:
use this example of Yolo: https://github.com/AlexeyAB/opencv/blob/ecc34dc5219bf70cf9ede89cf7bac8f895938da1/samples/dnn/yolo_object_detection.cpp
or use this example (this file contains examples of detection using SSD/Yolo/Faster-RCNN with weights/network files from Darknet/Caffe/Tensorflow/Torch): https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
Thanks for your advice! I tried your first solution -- setting OPENMP=1 in the Makefile. Now I get results in a flash, but the program reports 6.2 s, even though by feel it is much faster than the earlier run that reported 3.2 s... Is there a problem with the timer?
Yes, clock_t shows CPU time instead of real steady (wall-clock) time: https://stackoverflow.com/a/10874375/1558037
So it actually works much faster.
Got it! You really helped me a lot! I just need to use another timer function!
@AlexeyAB Do you mean you built two versions of Yolo v2? One built without OpenCV, the other built into OpenCV 3.4, and the latter is faster? Can the latter support OpenCV 3.1? Thanks~
I mean that OpenCV 3.4.0 already contains Yolo v2 for CPU. So you can just install OpenCV 3.4.0 without installing Darknet - and you can use the fastest version of Yolo v2 for CPU.
Can the latter one support OpenCV 3.1?
Yolo v2 is built into OpenCV only since 3.4.0.
But you can use this repository with any OpenCV version.
@wsyzzz @AlexeyAB What is the difference in function and workflow between the shortcut layer and the route layer for a residual connection?
@TaihuLight

`[route]`-layer concatenates the values:
1st input: 1, 2, 3
2nd input: 4, 5, 6
output: 1, 2, 3, 4, 5, 6

`[shortcut]`-layer adds (`+`) the values:
1st input: 1, 2, 3
2nd input: 4, 5, 6
output: 5, 7, 9
@AlexeyAB Thank you. I think the SORT-layer https://github.com/AlexeyAB/darknet/issues/473 can be implemented by changing the functions in [shortcut]-layers as follows:

[SORT-shortcut]-layer adds (+) and multiplies (*) the values:
1st input: 1, 2, 3
2nd input: 4, 5, 6
output: 5+sqrt(1*4), 7+sqrt(2*5), 9+sqrt(3*6)
@TaihuLight
If `[shortcut]` is calculated as:
y = x + F(x) - forward
delta_x = delta_y + delta_F(x) - delta for back-propagation

then `[SORT-shortcut]` should be calculated as (https://arxiv.org/pdf/1703.06993.pdf):
y = x + F(x) + sqrt( ReLU(x) * ReLU(F(x)) + 0.0001 ) - forward

@AlexeyAB https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp This is the SORT code implemented in Caffe by the author of the paper. I am studying it, and I hope it can help us.
@AlexeyAB
Here is a small question from when I was using the Yolo example. I used https://github.com/AlexeyAB/opencv.git and ran yolo_object_detection.cpp. It returned the error `'readNetFromDarknet' was not declared in this scope` at line 43. I found this function in opencv/modules/dnn/include/opencv2/dnn/dnn.hpp at line 620, between `CV_EXPORTS_W Importer` and `createCaffeImporter` (I'm sorry, I don't know how to link to that line). But in the files I cloned, the same dnn.hpp doesn't have this function, and `createCaffeImporter` comes right after `CV_EXPORTS_W Importer`.
I can download dnn.hpp individually to update the file, but that may confuse some people. Could you try to figure it out? Thanks.
@wsyzzz Just use the original OpenCV; I merged Yolo v2 directly into OpenCV since 3.4.0:
If you want to use my repo, just switch to the branch dnn_darknet_yolo_v2
: https://github.com/AlexeyAB/opencv/tree/dnn_darknet_yolo_v2
The contribution rules require that all pull requests be made from separate branches.
@AlexeyAB
If forward is y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + 0.01)
https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp says for 2 inputs x1 / x2: (this is how it is implemented in Caffe and symmetrical for x1 and x2)
@ralek67 @AlexeyAB What does negativeReLUSlope mean in your format? What does top_diff denote in https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp?
If ReLU is leaky:
y = x, if x > 0
y = 0.01x, if x < 0
then negativeReLUSlope = 0.01. In the Tiffany SORT layer it is = 0.
Forward : y = x + ??? Back: dx = dy + ???
top_diff = dy, the delta for backpropagation:
bottom_diff[i] = top_diff[i] * (1.0 + bottom_gradient_data[i] * (bottom_data[i] > 0));
In my formula it is written the other way round: it should be dx1/dy = 1 + ... Then if you multiply by dy you get the formula as in Tiffany's code: dx1 = dy * (1 + Gradient * (x1 > 0)), so by identification: dx1 = bottom_diff, dx2 = bottom_diff_1, dy = top_diff.
Hope you understand
@ralek67 Could you share the derivation of your gradient formula? https://stackoverflow.com/questions/44512126/how-to-calculate-gradients-in-resnet-architecture
For the forward pass, if y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + 0.001) and ReLU is leaky, SORT_shortcut can be implemented by replacing
out[out_index] += add[add_index];
with
float sqrt_shift=0.001;
out[out_index] = out[out_index] + add[add_index] + sqrtf(max(sqrt_shift,out[out_index] * add[add_index]+sqrt_shift));
Is it correct? And is my understanding of the following code correct?
@ralek67 @AlexeyAB @wsyzzz
I don't know how to add it in Darknet, to be honest. But your forward pass as written is wrong. You wrote out = out + add + sqrt(max(sqrt_shift, out*add + sqrt_shift)). That is clearly wrong, since out*add is positive after ReLU, so max(sqrt_shift, something_positive + sqrt_shift) always equals something_positive + sqrt_shift; the expression could therefore be simplified to sqrt(out*add + sqrt_shift), except in the leaky case.
It should be: out = out + add + sqrt(max(0, out) * max(0, add) + sqrt_shift) in blas.c
But according to the author of paper, it was implemented like this: https://github.com/tiffany0107/SORT-Layer/blob/master/sort_layer.cpp
Forward :
If you just read the Backward_cpu function, for instance, you get to:
And that's exactly what math says if ReLU isn't leaky:
The thing is, you keep talking about y = x + F(x) instead of the 2-input form y = x1 + x2. If that's what you want: if y = x + F(x) + sqrt(ReLU(x)*ReLU(F(x)) + sqrt_shift), then
But that's just 16-year-old-student maths.
(1) So, the forward pass is y = x + F(x) + sqrt( x*F(x) + sqrt_shift ), with sqrt_shift = 0.001, and ReLU is linear for the shortcut layer in YOLOv3. If out denotes the input value x = state.net.layers[l.index].output, and add denotes F(x) = state.input, i.e. the output value of the previous layer, then SORT_shortcut can be implemented with
float sqrt_shift=0.001;
out[out_index] = out[out_index] + add[add_index] + sqrt(max(0.0,out[out_index]) * max(0.0,add[add_index])+sqrt_shift);
in the function shortcut_cpu() of blas.c
@ralek67 Thank you very much! Although I have spent three days trying to understand the functions of the shortcut layer, I still do not know how to implement the backward pass!
(2) Then, the gradient in the backward pass of the sort_shortcut layer is calculated as follows. Is it correct?
Besides, how should the backward pass of the shortcut layer be modified if the gradient is correct? I sincerely need your help.
(3) Meanwhile, the forward pass of [shortcut] is y = x + F(x) in YOLOv3; how, then, should the code of the backward pass, i.e. backward_shortcut_layer(), be understood?
In the above diagram, E denotes the error at the output neuron; the gradient delta_x is the sum of the incoming gradient delta_y and the product of the gradients delta_y and delta_F.
(4) Then how should axpy_cpu() and shortcut_cpu() be understood in the backward pass of the original shortcut layer, according to the process of gradient computation shown in the above diagram? @AlexeyAB @ralek67
void backward_shortcut_layer(const layer l, network_state state)
{
    // scale the incoming delta by the derivative of this layer's activation
    gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta);
    // state.delta += l.delta: pass dy straight through to the previous layer (the F(x) branch)
    axpy_cpu(l.outputs*l.batch, 1, l.delta, 1, state.delta, 1);
    // accumulate dy into the delta of the routed-from layer (the identity x branch): dx = dx + dy
    shortcut_cpu(l.batch, l.out_w, l.out_h, l.out_c, l.delta, l.w, l.h, l.c, state.net.layers[l.index].delta);
}
Hi everyone, does anyone know how the 'route' layer works in yolov2? I googled it and only found that "the route layer is to bring finer grained features in from earlier in the network". So how does it 'bring finer grained features in'?
E.g., here is a case to explain. I run
./darknet detector test cfg/coco.data cfg/yolo.cfg yolo.weights data/dog.jpg
And in 'yolo.cfg', I found that
So it's clear that the 25th (route) layer uses the 16th layer because of 'layers=-9' in the .cfg file, and the 28th (route) layer uses the 27th and 24th layers because of 'layers=-1, -3'.
Here is my question: how does the 25th route layer use the 13×13×1024 input from the 24th layer and the 26×26×512 input from the 16th layer to obtain a 26×26×512 output (matching the 26th layer's input)?
I am looking for any advice. Thanks for your response!