WolframRhodium / Super-Resolution-Zoo

Collection of pre-trained super-resolution models in MXNet.

Issue with Strides in RED30 #2

Closed ved27 closed 6 years ago

ved27 commented 6 years ago

Hi,

In this repository,

https://github.com/WolframRhodium/Super-Resolution-Zoo/blob/master/RED30/super_resolution/RED30_4x-symbol.json

has "stride": "(1, 1)" in all convolution / deconvolution layers. Can you please let me know whether this is an error, and suggest the right strides for the layers?

Awaiting a speedy response,

WolframRhodium commented 6 years ago

Hi,

What's the problem with the stride, in your opinion? Is it that the value is wrong, or that it is not compatible with the paper?

ved27 commented 6 years ago

I think it is an error

Because super-resolution architectures usually have a stride >= 2 in at least one deconvolution layer, and here all the strides are 1, which means the input image is not being upscaled at all.

Kindly check the appropriate stride values if I'm not wrong.

Also, can you please let me know how these .json files were generated / written?

WolframRhodium commented 6 years ago

According to the paper, the size of the output of RED30 is the same as that of its input, as described in the specification of the training set:

For super-resolution, we first down-sample a patch and then up-sample it to its original size, obtaining a low-resolution version as the input of the network.

Thus the feature maps at every layer of RED30 have the same spatial size, and the stride is 1.

To super-resolve an image using RED30, one has to first upscale the image with a bicubic algorithm (the default setting of imresize in MATLAB), then feed it to the network to obtain the upscaled, clean image.
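For concreteness, a minimal inference sketch along those lines (assuming MXNet 1.x and Pillow; the params filename `RED30_4x-0000.params`, the input name `data`, and the `[0, 1]` single-channel input normalization are assumptions, not confirmed details of this repo):

```python
# Sketch: bicubic pre-upscale, then one forward pass through RED30.
import mxnet as mx
import numpy as np
from PIL import Image

scale = 4
img = Image.open("input.png").convert("L")  # luma plane; the SR model is 1-channel
# Pre-upscale with bicubic (note: PIL's bicubic is not bit-identical to MATLAB's imresize)
img = img.resize((img.width * scale, img.height * scale), Image.BICUBIC)
x = np.asarray(img, dtype=np.float32)[None, None] / 255.0  # NCHW, assumed [0, 1] range

# Load the converted symbol/params checkpoint (filenames are an assumption)
sym, arg_params, aux_params = mx.model.load_checkpoint("RED30_4x", 0)
mod = mx.mod.Module(symbol=sym, data_names=["data"], label_names=None)
mod.bind(data_shapes=[("data", x.shape)], for_training=False)
mod.set_params(arg_params, aux_params)

mod.forward(mx.io.DataBatch([mx.nd.array(x)]), is_train=False)
out = mod.get_outputs()[0].asnumpy()[0, 0]
Image.fromarray(np.clip(out * 255.0, 0, 255).astype(np.uint8)).save("output.png")
```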

The .json files of RED30 were obtained by converting the official Caffe model to MXNet using MXNet's caffe_converter, with slight modifications.

ved27 commented 6 years ago

Hi,

Many thanks for the above info.

In general, if the stride is 1, deconvolution layers can be replaced with convolutions, since deconvolution is mostly used to learn the upscaling step. That was my doubt, given that all the strides here are 1.
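To illustrate that point, a quick shape check in MXNet (a sketch): with stride 1 and matching padding, a 3x3 Deconvolution preserves the spatial size exactly like a 3x3 Convolution, since a stride-1 transposed convolution is mathematically just a convolution with a flipped kernel:

```python
# Sketch: stride-1 Deconvolution and Convolution produce the same output shape.
import mxnet as mx

x = mx.nd.random.uniform(shape=(1, 64, 32, 32))
w = mx.nd.ones((64, 64, 3, 3))  # dummy weights, just for the shape check
conv = mx.nd.Convolution(x, weight=w, kernel=(3, 3), stride=(1, 1),
                         pad=(1, 1), num_filter=64, no_bias=True)
deconv = mx.nd.Deconvolution(x, weight=w, kernel=(3, 3), stride=(1, 1),
                             pad=(1, 1), num_filter=64, no_bias=True)
print(conv.shape, deconv.shape)  # both (1, 64, 32, 32)
```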

I took into consideration the following line from the paper and assumed that the base implementation could have stride >= 2 on at least one convolution in the encoding part:

> In specific, we use stride = 2 in convolutional layers to down-sample the feature maps. Down-sampling at different convolutional layers are tested.

Please let me know if I'm right on the following points:

1) The actual base model implemented from the paper has all strides = 1 (in both convolution and deconvolution layers).
2) We can change the stride values in the encoding and corresponding decoding parts of the network to trade off PSNR against the speed of the forward pass.
3) The RED30 network is exactly the same for deblurring / denoising / super-resolution, apart from the input pre-processing.

Hoping for a quick confirmation of each of the above 3 points,

Thank you very much again for the info you provided in the earlier reply; this repository is really useful for us.

WolframRhodium commented 6 years ago

The 3 points are true in general: (1) and (2) are completely true.

For (2), using stride = 2 makes the architecture of the network look closer to U-Net, IMHO (then only the configuration of the number of channels differs).
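For instance, a shape check of that stride-2 variant (a sketch, not the shipped model): a stride-2 convolution halves the feature maps, and a matching stride-2 deconvolution restores them, so the network's input and output sizes stay equal:

```python
# Sketch: stride-2 down-sampling undone by a matching stride-2 deconvolution.
import mxnet as mx

x = mx.nd.random.uniform(shape=(1, 64, 32, 32))
w = mx.nd.ones((64, 64, 3, 3))  # dummy weights, just for the shape check
down = mx.nd.Convolution(x, weight=w, kernel=(3, 3), stride=(2, 2), pad=(1, 1),
                         num_filter=64, no_bias=True)              # -> (1, 64, 16, 16)
up = mx.nd.Deconvolution(down, weight=w, kernel=(3, 3), stride=(2, 2), pad=(1, 1),
                         adj=(1, 1), num_filter=64, no_bias=True)  # -> (1, 64, 32, 32)
print(down.shape, up.shape)
```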

(3) is mostly true. The number of input and output channels for denoising / deblocking / super-resolution is 1, while it is 3 for deblurring / inpainting. The other parts are exactly the same.

Anyway, I have heard people wonder whether the authors misunderstood the deconvolution operator and made a strange network design because of its name.

ved27 commented 6 years ago

Hi,

Many thanks for clarifying my queries. I shall deploy the models. Hoping to discuss more fundamentals.

ved27 commented 6 years ago

Hi,

As the number of input channels varies from model to model,

can you please let me know why it is different in the different cases (denoising, SR, deblocking, deblurring, inpainting)?

For example: SRCNN / FSRCNN train the network on an individual channel, while fast style transfer trains on all 3 input channels.

Thank you,

WolframRhodium commented 6 years ago

Hi,

I'm not entirely sure, but I think the reason is that the importance of chrominance varies in different tasks.

It seems that in some tasks like style transfer and inpainting it is necessary to utilize both luminance and chrominance information, while in other tasks like denoising, super-resolution, and deblocking, chrominance information is not as essential; therefore you might notice different algorithms using different numbers of input channels for these tasks.
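As a concrete illustration of the 1-channel recipe (a sketch; `run_network` is a hypothetical stand-in for any RED30-style model that maps a pre-upscaled luma plane to a restored plane of the same size): run the network on Y only, and simply bicubic-resize Cb / Cr:

```python
# Sketch: apply a 1-channel restoration network to the luma plane only.
import numpy as np
from PIL import Image

def super_resolve_color(img_rgb, run_network, scale=4):
    """run_network: float32 HxW luma plane in [0, 1] -> restored plane, same size."""
    y, cb, cr = img_rgb.convert("YCbCr").split()
    size = (img_rgb.width * scale, img_rgb.height * scale)
    y_up = y.resize(size, Image.BICUBIC)  # pre-upscaled network input
    y_out = run_network(np.asarray(y_up, np.float32) / 255.0)
    y_img = Image.fromarray(np.clip(y_out * 255.0, 0, 255).astype(np.uint8))
    # Chroma is merely resized, since the 1-channel network ignores it
    cb_up, cr_up = cb.resize(size, Image.BICUBIC), cr.resize(size, Image.BICUBIC)
    return Image.merge("YCbCr", (y_img, cb_up, cr_up)).convert("RGB")
```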

ved27 commented 6 years ago

Hi,

Yes, it seems the same: the dependency on RGB / YCbCr varies with the application. More experiments to be done.

Thank you very much for the help.