davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

Where is deconvolution (transposed convolution) layer in dnn. #372

Closed elda27 closed 7 years ago

elda27 commented 7 years ago

I want to compute a deconvolution in dnn, but I couldn't find it in "layers.h".

For example, how do I compute the operation shown in the image below (the blue panels are sources and the green panels are destinations)?

(attached image illustrating the transposed convolution)

Can I use the deconvolution layer?

davisking commented 7 years ago

There isn't a deconvolution layer. Someone will need to contribute one :)

elda27 commented 7 years ago

Thank you for the reply. I would like to contribute.

I think the best approach would be to extend the con_ class. If I am not mistaken, deconvolution is a convolution applied over an input with a margin (zero padding). What do you think of this idea?

davisking commented 7 years ago

Deconvolution isn't implementable in terms of con_, although it's obviously similar.

In any case, you are certainly welcome to contribute. Any new layer needs both CUDA and CPU implementations, so you would need to write both.

elda27 commented 7 years ago

I got it. But I have a question: where should I write the tests for deconvolution after I implement it?

davisking commented 7 years ago

Yes, write tests. It is important to test code.

elda27 commented 7 years ago

Oh, sorry, I confused you. I'm asking: in which source file should I write the test?

davisking commented 7 years ago

See http://dlib.net/howto_contribute.html

elda27 commented 7 years ago

Thank you for the reply. I got it.

decrispell commented 7 years ago

Any update on this @elda27? I am also interested in this functionality.

elda27 commented 7 years ago

I implemented deconvolution and its test, but the test fails because of mismatched results between the CPU and GPU implementations. Specifically, the failure occurs when calculating get_gradient_for_filters (operator() and get_gradient_for_data both succeed).

I'm fixing that bug now, but I'm sorry I may not be able to fix it soon, as this is a busy season for me.

If you can implement this yourself, it may be faster than waiting for me to finish fixing the bug.

OranjeeGeneral commented 7 years ago

Hmm, I am a bit surprised by that, because the deconvolution layer as described in "Adaptive deconvolutional networks for mid and high level feature learning" by Zeiler et al. is basically a reversal of the forward and backward passes of a convolution layer (and Caffe, for example, certainly implements it this way), so I'm not sure how your CPU and GPU implementations can differ.

Or am I missing something? At least that's how I was thinking of implementing it.

OranjeeGeneral commented 7 years ago

Hmm okay I had a look into this, but I am having some issues and I am not sure how to resolve them.

The problem I am having is particular the backpass.

If you consider a transposed convolution to be a convolution layer with its passes reversed, as most implementations seem to do:

That means the forward pass of the deconvolution layer is actually the backward (gradient) pass of the convolution layer, and the backward pass is the convolution layer's forward pass.

If I do that, I get a mismatch in the tensor shapes.

For example, assume an input tensor of size (n=4, c=2, r=3, col=4), a filter tensor of size (3,2,2,2), and an x/y-stride of 2. You should get an output tensor of (4,3,6,8). So far so good, but the problem lies in the backward pass.

If you apply the convolution with the same filter, you get a tensor of size (4,3,3,4) instead of the expected (4,2,3,4), which is the output shape of the previous layer.

OranjeeGeneral commented 7 years ago

Okay, I managed to solve this issue now. I originally thought I could shoehorn transposed convolution into the existing tensor_tool::conv classes, but somehow that didn't quite work, so I added a tensor_tool::convt class instead and modeled the CUDA implementation on MatConvNet's, using mainly cuDNN functions.

The trick I realized is that in a transposed convolution the number of filters switches position in the filter tensor: it is not the usual num_samples = num_filters but actually k = num_filters.

However, I now seem to run into the same issue as @elda27: the individual tests of my tensor_tool::convt class pass, and the CPU and GPU implementations produce matching results within the error margin for both forward and backward passes.

But when I add a new convt layer using my new tensor_tool::convt class and add it to test_layers in dnn.cpp, the test suite always fails with a gradient error, and not by a small margin.

Hmmm...

davisking commented 7 years ago

Yep, that's how it goes. One small typo and the gradient is wrong. I know this pain :)

OranjeeGeneral commented 7 years ago

Well, it turns out my filter gradient backward pass was wrong. I only realized it when I explicitly used the CPU version, which triggered an assert: I had the input parameters the wrong way round.

While looking at it all again, I noticed that I can actually shoehorn it into the existing infrastructure using the tensor_tool::conv classes, without writing my own, simply by moving a few functions around and adding a few helper functions. That feels a lot better, as it basically eliminates code duplication.

After doing that and comparing it against the fixed version of my own tensor_tool::convt class, both now produce the same result. However, the test_layer test still fails, with a much, much smaller error, but it still fails.

So I am a bit at a loss now, as I can't see why it should fail, especially in the latter case, since it only uses existing, already-tested routines.

davisking commented 7 years ago

Well it depends on what you did specifically. It's certainly possible to build incorrect software out of calls to correct software routines.

But maybe test_layer is just being too strict in comparing against the numerically approximated derivative it computes?

OranjeeGeneral commented 7 years ago

That is of course obviously true :). But logically it all seems sound, and it follows pretty much what others are doing, so I can't really spot an error. Then again, I have only just started to dig deeper into dlib's internal structures :)

I am happy to send you the changes I made if you like. It's not much, just five changed files (not including the test).

I can also have another dig into the test_layer code to understand where the difference is coming from, but that will probably take me a while to figure out.

It would be interesting to compare with @elda27 too, as by the sound of it he ran into a similar issue.

davisking commented 7 years ago

Sounds good. Submit it as a PR and I'll take a look.

OranjeeGeneral commented 7 years ago

So I started to dig into the test_layer code, and here is the interesting thing.

I am not quite sure, as I don't fully understand why certain things are done, but I looked at the data derivatives part of the test. The reference_derivs values computed with central differences and the actual output_derivs from the backward pass are nearly identical, until the "initial gradients" are subtracted from the actual output_derivs; then the error increases and the test fails. Why is that step done?

davisking commented 7 years ago

Read the layer interface documentation; this is explained there. It comes down to some interface functions being required to add to, rather than assign to, the output gradient. Which one you have to do depends on which interface you decide to implement. test_layer() checks that you did it right.

OranjeeGeneral commented 7 years ago

Thanks, I did. If I understand correctly, when it is an in-place layer the gradient is not accumulated, at least in the test_layer function, but my layer can't be in-place because the input and output do not have the same dimensions. So that was the issue: once I accumulate, it passes the test. Thanks :)

davisking commented 7 years ago

Thanks to @OranjeeGeneral, there is now a deconvolution/transposed-convolution layer in dlib (from this PR: https://github.com/davisking/dlib/pull/476).