ShuangLiu1992 closed this issue 6 years ago
You can use a block like this:
template <int N, typename SUBNET>
using se = scale8<sig<fc<N,relu<bn_fc<fc<N/16,avg_pool_everything<tag8<SUBNET>>>>>>>>;
Just wrap it around relu layers like the paper suggests.
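For intuition, the recalibration that this layer stack computes (squeeze via global average pooling, excite via two fully connected layers with a reduction ratio of 16, then a channel-wise scale) can be sketched in plain C++, independent of dlib. This is a minimal illustrative sketch; the function name, omitted biases, and single-sample layout are assumptions for brevity:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Minimal squeeze-and-excitation recalibration on one CHW sample.
// Squeeze: global average pool per channel.
// Excite:  fc (C -> C/r), relu, fc (C/r -> C), sigmoid.
// Scale:   multiply each input channel by its excitation weight.
// w1 is (C/r x C), w2 is (C x C/r); biases omitted for brevity.
std::vector<float> se_block(const std::vector<float>& x, int C, int H, int W,
                            const std::vector<float>& w1,
                            const std::vector<float>& w2, int r)
{
    const int red = C / r;
    // squeeze: per-channel global average
    std::vector<float> s(C, 0.f);
    for (int c = 0; c < C; ++c) {
        for (int i = 0; i < H*W; ++i) s[c] += x[c*H*W + i];
        s[c] /= H*W;
    }
    // excite: fc -> relu
    std::vector<float> h(red, 0.f);
    for (int j = 0; j < red; ++j) {
        for (int c = 0; c < C; ++c) h[j] += w1[j*C + c] * s[c];
        h[j] = std::max(0.f, h[j]);
    }
    // fc -> sigmoid
    std::vector<float> g(C, 0.f);
    for (int c = 0; c < C; ++c) {
        for (int j = 0; j < red; ++j) g[c] += w2[c*red + j] * h[j];
        g[c] = 1.f / (1.f + std::exp(-g[c]));
    }
    // scale: channel-wise rescaling of the input
    std::vector<float> y(x);
    for (int c = 0; c < C; ++c)
        for (int i = 0; i < H*W; ++i) y[c*H*W + i] *= g[c];
    return y;
}
```

The fc/sigmoid/scale chain here mirrors the sig<fc<...,relu<...fc<N/16,...>>>> stack in the dlib definition above.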
Thank you very much Davis. It looks quite different from this implementation based on dlib; is that implementation accurate? https://github.com/antikantian/facedet-squeezenet/blob/master/squeezenet_train.cpp
What about MobileNet, could it be defined with the existing layers in dlib? MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Ah, it's Squeeze-and-Excitation Networks vs. SqueezeNet? These names are confusing...
Yeah. I added the scale layer for squeeze-and-excitation. I don't think you need anything special for SqueezeNet. I haven't read the MobileNets paper.
MobileNets is supposed to be a lot smaller and faster, but it requires depthwise convolution (https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d).
Is there any way to hack together such an operation with the existing layers in dlib?
Perhaps dlib just needs support for grouped convolutions? That now comes with cuDNN 7. There is also ShuffleNet, which seems to have very good performance in their paper.
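For reference, a grouped convolution on the CPU just runs an independent convolution per group of channels, and depthwise convolution is the special case where groups == in_channels. A minimal sketch in plain C++ (no dlib; single sample, stride 1, no padding, illustrative names):

```cpp
#include <vector>

// Grouped convolution on one CHW sample, stride 1, no padding.
// x: in_c*H*W floats, f: out_c * (in_c/groups) * K * K floats.
// Each of the `groups` groups convolves in_c/groups input channels
// with out_c/groups filters; depthwise conv is groups == in_c.
std::vector<float> grouped_conv(const std::vector<float>& x,
                                const std::vector<float>& f,
                                int in_c, int H, int W,
                                int out_c, int K, int groups)
{
    const int icg = in_c / groups;   // input channels per group
    const int ocg = out_c / groups;  // output channels per group
    const int oh = H - K + 1, ow = W - K + 1;
    std::vector<float> y(out_c * oh * ow, 0.f);
    for (int g = 0; g < groups; ++g)
      for (int oc = 0; oc < ocg; ++oc)
        for (int r = 0; r < oh; ++r)
          for (int c = 0; c < ow; ++c) {
            float acc = 0.f;
            for (int ic = 0; ic < icg; ++ic)
              for (int kr = 0; kr < K; ++kr)
                for (int kc = 0; kc < K; ++kc)
                  acc += x[((g*icg + ic)*H + r + kr)*W + c + kc]
                       * f[(((g*ocg + oc)*icg + ic)*K + kr)*K + kc];
            y[((g*ocg + oc)*oh + r)*ow + c] = acc;
          }
    return y;
}
```

Note the filter tensor is out_c x (in_c/groups) x K x K, i.e. each filter only spans its group's slice of the input channels; that is what makes grouped/depthwise convolution so much cheaper than standard convolution.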
By the way, I tried the scale_ layer, but the training/test error was somewhat worse; maybe I wrapped it wrong?
template <int N, typename SUBNET> using res       = se<relu<residual<block, N, bn_con, SUBNET>>>;
template <int N, typename SUBNET> using ares      = se<relu<residual<block, N, affine, SUBNET>>>;
template <int N, typename SUBNET> using res_down  = se<relu<residual_down<block, N, bn_con, SUBNET>>>;
template <int N, typename SUBNET> using ares_down = se<relu<residual_down<block, N, affine, SUBNET>>>;
That's how I use the SE layer too. Sometimes it's worse, sometimes it's better. You will find that when you try to replicate results from papers they often aren't as generally rosy as the authors suggest. That's not to say I haven't found SE layers to be useful in some cases. But certainly not all.
Yeah, more layers would be good. You should add it and submit a pull request :)
I've been trying to add grouped convolution for the past few days, but progress is slow. To make the cuDNN part work, I think one just needs to change the size of the output tensor and pass the group parameter to the cudnnConvolutionDescriptor_t correctly.
But in reality, it's hard to use tools like alias_tensor and make sure the memory layout on the CPU is the same as on the GPU when the user doesn't have your level of understanding of dlib's design, sigh...
Grouped convolution seems to be a key technology for lightweight, fast networks, so it would be really great if dlib had it. A large portion of our project is written with dlib, and we don't really want to switch to Caffe or TensorFlow just because one type of layer is missing. Do you think there is any chance you would add support for it in the next dlib release?
Why do you need to change the size of the filters? Isn't this just a matter of calling cudnnSetConvolutionGroupCount() with the group count?
Also, you don't need to use alias_tensor, and all the memory layout stuff is documented; moreover, it's the same layout used by cuDNN.
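To spell out what that shared layout means: dlib tensors (like cuDNN's default tensor format) are row-major NCHW, so a flat index can be computed directly. A quick illustration with a hypothetical helper name:

```cpp
#include <cstddef>

// Flat index into a row-major NCHW tensor: the layout dlib uses and
// cuDNN's default tensor format. w varies fastest, then h, c, n.
inline std::size_t nchw_index(std::size_t n, std::size_t c,
                              std::size_t h, std::size_t w,
                              std::size_t C, std::size_t H, std::size_t W)
{
    return ((n*C + c)*H + h)*W + w;
}
```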
Thanks Davis, I will read more into it and try again... I know you are probably busy with other things, and this seems like a very simple but valuable thing to add support for; could you please look into it if you have time? I know you generally have a high standard for documentation and tests in a PR, so I feel like even if I had implemented this layer it would be useless if you decided to work on it yourself in the next release.
Yes, I'm very busy. I'm not going to do this right now. Maybe in a few months. It's like you say, adding things to dlib is a lot of work because I need to write all the unit tests and documentation as well as make a CPU implementation that matches the behavior of cuDNN and cuDNN is never that well documented. So that last step is a pain in the ass.
But in your case, since you don't care about doing all that stuff it's super easy. Just call that one cuDNN function and let it do what it does. It seems like it's literally just one additional function call you have to make to turn on group convolutions.
Yeah, I understand that last step is a pain in the ass. The reason I don't want to just call that cuDNN function is that we are doing inference on the CPU... seems like I'm out of luck.
https://www.dropbox.com/s/aqr3lfryxl2jban/group_convolution.pdf?dl=0
I think by definition the size of the output tensor is different from standard conv2d (a TensorFlow output is attached in the link above).
Doesn't seem like it. Read this: https://blog.yani.io/filter-group-tutorial/
I think that might be after the reduction performed with the 1x1 pointwise convolution? (I might be wrong) https://stackoverflow.com/questions/44226932/difference-between-tf-nn-conv2d-and-tf-nn-depthwise-conv2d?rq=1 https://ikhlestov.github.io/pages/machine-learning/convolutions-types/#depthwise-separable-convolutions-separable-convolutions
ok I was wrong.
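To make the point being conceded here concrete: grouped convolution shrinks the filter depth, not the output channel count, while TensorFlow's depthwise_conv2d is the variant whose output channel count is in_channels * channel_multiplier. A small sketch of the shape arithmetic (plain C++, names are illustrative):

```cpp
// Channel bookkeeping for the two operations discussed in this thread.
// Grouped convolution keeps num_filters output channels; each filter
// just sees in_channels/groups input channels. TensorFlow's
// depthwise_conv2d produces in_channels * channel_multiplier outputs.
struct ConvShapes {
    int grouped_out_channels;   // == num_filters, unchanged by groups
    int grouped_filter_depth;   // == in_channels / groups
    int depthwise_out_channels; // == in_channels * channel_multiplier
};

ConvShapes conv_shapes(int in_channels, int num_filters, int groups,
                       int channel_multiplier)
{
    return { num_filters,
             in_channels / groups,
             in_channels * channel_multiplier };
}
```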
I have now got both the CPU and GPU versions working for channels % groups == 0, but they would break if dilation were eventually added.
https://github.com/davisking/dlib/issues/1018 https://github.com/davisking/dlib/issues/288
Below is a link to an example of how dilation is handled; I could write the grouped convolution layer similarly to this if dilation is implemented in a similar way: https://github.com/Tencent/ncnn/blob/master/src/layer/convolutiondepthwise.cpp
What are your thoughts on how dilation should be implemented in dlib?
Thank you Davis
Cool. You should submit a PR :)
I would implement dilation however cuDNN does it.
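For reference, cuDNN's documented output size per spatial dimension (as returned by cudnnGetConvolution2dForwardOutputDim) treats a dilated filter as having effective extent (filterDim - 1)*dilation + 1. A one-liner sketch of that rule:

```cpp
// cuDNN's documented output size for one convolution dimension:
//   outputDim = 1 + (inputDim + 2*pad - effectiveFilter) / stride
// where the dilated filter's effective extent is (filterDim-1)*dilation + 1.
inline int conv_output_dim(int input_dim, int filter_dim,
                           int pad, int stride, int dilation)
{
    const int effective = (filter_dim - 1)*dilation + 1;
    return 1 + (input_dim + 2*pad - effective) / stride;
}
```

With dilation == 1 this reduces to the familiar non-dilated formula, which is why matching cuDNN's convention keeps existing layers unaffected.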
It's not ready for a PR yet, as I didn't bother to implement get_gradient_for_data/get_gradient_for_filters for the CPU... The inference speed compared to Intel MKL as the BLAS backend is not very encouraging; however, on a simple dataset it does seem to learn better features, as the validation error is smaller than that of standard convolution nets with a similar structure.
We do believe the inference speed might be better since we are targeting mobile, although on iOS you have the Accelerate framework and on Android you have OpenBLAS...
Any news about this? Maybe some help could bring it along sooner.
Hi Davis
In the latest release, with the addition of the scale_ layer, squeeze-and-excitation networks can now be implemented. Would it be possible for you to give an example or some hints on defining such networks? I could train on benchmark datasets with my GPU and report performance.