Closed elda27 closed 7 years ago
There isn't a deconvolution layer. Someone will need to contribute one :)
Thank you for the reply. I want to contribute.
I think it would be better to implement deconvolution by extending the con_ class. If I'm not mistaken, deconvolution is convolution with a margin. What do you think of this idea?
Deconvolution isn't implementable in terms of con_. Although it's obviously similar.
In any case, you are certainly welcome to contribute. Any new layer needs both a CUDA and a CPU implementation, so you would need to write both.
I got it. But I have a question: where will the tests of deconvolution go after I implement it?
Yes, write tests. It is important to test code.
Oh, sorry, I confused you. I'm asking: which source file should I write the test in?
Thank you for the reply. I got it.
Any update on this @elda27? I am also interested in this functionality.
I implemented deconvolution and its test, but the test fails because of calculation-result mismatches between the CPU and GPU implementations. Specifically, the failure occurs when calculating get_gradient_for_filters (operator() and get_gradient_for_data both succeed).
I'm fixing that bug now, but I'm sorry I won't be able to fix it soon, because this is a busy season for me.
If you can implement this, it may well be done before I finish fixing the bug.
Hmm, I am a bit surprised by that, because the deconvolution layer as described in "Adaptive deconvolutional networks for mid and high level feature learning" by Zeiler et al. is basically a reversal of the forward and backward passes of a convolution layer (and Caffe, e.g., certainly implemented it this way), so I am not sure how your CPU and GPU implementations can differ.
Or am I missing something? At least that's how I was thinking of implementing it.
Hmm, okay, I had a look into this, but I am having some issues and I am not sure how to resolve them.
The problem I am having is in particular the backward pass.
If you consider a transposed convolution to be the reversed passes of a convolution layer, as most seem to do:
That means the forward pass of the deconvolution layer is actually the backward gradient pass of the convolution layer, and the backward pass is the forward convolution pass.
If I do that, I get a mismatch in the tensor shapes.
For example, assume an input tensor of size (n=4, c=2, r=3, col=4), a filter tensor of size (3,2,2,2), and an x/y-stride of 2. I suspect you should get an output tensor of (4,3,6,8). So far so good, but the problem lies in the backward pass.
If you apply the convolution with the same filter, you get a tensor of size (4,3,3,4) instead of the expected (4,2,3,4), which is the output shape of the previous layer.
Okay, I managed to solve this issue now. I originally thought I could shoehorn transposed convolution in by using the existing tensor_tool::conv classes, but somehow that didn't quite work, so I added a tensor_tool::convt class instead and modeled the CUDA implementation on MatConvNet's, using mainly cuDNN functions.
The trick I realized is that for transposed (de)convolution the number of filters switches position in the filter tensor: it is not the usual num_samples = num_filters but actually k = num_filters.
However, I now seem to run into the same issue as @elda27: the individual tests of my tensor_tool::convt class pass, and the CPU/GPU implementations produce the same results within the error margin for both the forward and backward passes.
However, when I add a new convt layer that uses my new tensor_tool::convt class and add it to test_layers in dnn.cpp, the test suite always fails with a gradient error, and not by a small margin.
Hmmm...
Yep, that's how it goes. One small typo and the gradient is wrong. I know this pain :)
Well, it turns out that my filter-gradient backward pass was actually wrong. I only realized it when I explicitly used the CPU version, which then tripped an assert: I had the input parameters the wrong way round.
While I was looking at it all again, I noticed that I can actually shoehorn it into the existing infrastructure using the tensor_tool::conv classes, without writing my own, by simply moving a few functions around and adding a few helper functions. That feels a lot better, as it basically reduces code duplication.
However, after doing that and comparing it to the fixed version of my own tensor_tool::convt class, they both get the same result now. Yet the test_layer test still fails, with a much, much smaller error, but it still fails.
So I am a bit at a loss now, as I can't see why it should, especially in the latter case, since it only uses existing, already-tested routines.
Well it depends on what you did specifically. It's certainly possible to build incorrect software out of calls to correct software routines.
But maybe test_layer is just being too strict in the comparison against the numerical approximate derivative check it does?
That is of course obviously true :). But logically it all seems sound, and it follows pretty much what others are doing, so I can't really spot an error. But then, I have only just started to dig deeper into dlib's internal structures :)
I am happy to send you the changes I made if you like. It's not much, just five files changed (not including the test).
I can also have another dig into the test_layer code to try to understand where the difference is coming from, but I think that will probably take me a while to figure out.
It would be interesting to compare with @elda27 too, as by the sound of it he ran into a similar issue.
Sounds good. Submit it as a PR and I'll take a look.
So I started to dig into the test_layer code, and here is the interesting thing.
I am not quite sure, as I don't fully understand why certain things are done, but I looked at the data-derivatives part of the test: the values of reference_derivs (computed with central differences) and the actual output_derivs from the backward pass are nearly identical until the "initial gradients" are subtracted from the actual output_derivs; then the error increases and the test fails. I am not sure why that step is done?
Read the layer interface, this is explained there. It comes down to some interface functions being required to add rather than assign to the output gradient. Which you have to do depends on what interface you decide to implement. test_layer() checks that you did it right.
Thanks, I did. So if I understand this correctly: when it is an in-place layer, test_layer does not expect the gradient to be accumulated, but mine can't be in-place because the input and output don't have the same dimensions. So that's the issue then, and if I accumulate, it passes the test. Thanks :)
Thanks to @OranjeeGeneral, there is now a deconvolution/transpose layer in dlib (from this PR: https://github.com/davisking/dlib/pull/476).
I want to compute a deconvolution in dnn, but I couldn't find it in "layers.h".
For example, how do I compute the operation shown in the image below (the blue panels are sources and the green panels are destinations)?
Can I use the deconvolution layer?