JiahuiYu / wdsr_ntire2018

Code of our winning entry to NTIRE super-resolution challenge, CVPR 2018
http://www.vision.ee.ethz.ch/ntire18/

Some conceptual questions #37

Closed jiequanz closed 5 years ago

jiequanz commented 5 years ago

Hi Jiahui, thanks for your work! I have some conceptual questions (maybe very basic).

In section 3.1 of the paper, in the vanilla residual network, why is the number of parameters 2 * w1^2 * k^2? I don't get where w1^2 comes from. Why does the final HR image representation have S^2 * 3 channels?

In the other issue (wdsr_b increase channel problem), I noticed this "48->48x6(1x1)->48x0.8->48" and "32->192(1x1)->25(1x1)->32". Do you mind explaining this? How do you determine how large the third number is (48*0.8 and 25)?

In section 3.2, what does low-rank convolution mean here? The paper says it factorizes a large convolution kernel into two low-rank convolution kernels. I think the two low-rank kernels are the 1x1 and 3x3 kernels, but what is the "large convolution kernel" referring to?

Is there a reason why these expansion ratios (2-4 and 6-9) are chosen? Have you tried what happens if r is very large, such as 20? Also, what made you come up with using 1x1 convolutions to address this?

How do you determine the number of residual blocks?

Thanks for reading! Sorry if some of them are too basic. I am really interested in the paper, but I am stuck on these questions.

JiahuiYu commented 5 years ago

@Jiequannnnnnnnnn

  1. There are two convolutions in each block, with each convolution having w1^2 * k^2 parameters (each of the w1 filters spans all w1 input channels; see the first sketch below).
  2. The output has scale * scale * 3 channels.
  3. We pick the number of channels such that wdsr_a and wdsr_b have roughly the same params/FLOPs.
  4. 1x1 and 3x3 (see the second sketch below).
  5. You can make it 20, but in that case, under the same params/FLOPs, the identity pathway is too slim.
  6. The number of residual blocks is chosen following the small EDSR.
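To make 1 and 2 concrete, here is a minimal PyTorch sketch (illustrative only, not the actual code in this repo; the channel count, kernel size and tensor sizes are made up) of the parameter count of a vanilla residual block and of the pixel-shuffle tail that turns scale^2 * 3 channels into the HR image:

```python
import torch
import torch.nn as nn

w1, k, scale = 64, 3, 2  # illustrative numbers, not the paper's exact config

# 1. A vanilla residual block: two w1 -> w1 convolutions with k x k kernels.
#    Each conv has w1 (output filters) * w1 (input channels) * k * k weights,
#    so the block has roughly 2 * w1^2 * k^2 parameters (ignoring biases).
block = nn.Sequential(
    nn.Conv2d(w1, w1, k, padding=k // 2),
    nn.ReLU(inplace=True),
    nn.Conv2d(w1, w1, k, padding=k // 2),
)
n_params = sum(p.numel() for p in block.parameters())
print(n_params, 2 * w1 * w1 * k * k)  # 73856 vs 73728; the gap is the 2 * w1 biases

# 2. The tail: the last conv outputs scale^2 * 3 channels so that PixelShuffle
#    can rearrange them into a 3-channel image that is `scale` times larger.
tail = nn.Sequential(
    nn.Conv2d(w1, scale * scale * 3, k, padding=k // 2),
    nn.PixelShuffle(scale),  # (N, s^2 * 3, H, W) -> (N, 3, s*H, s*W)
)
hr = tail(torch.randn(1, w1, 48, 48))
print(hr.shape)  # torch.Size([1, 3, 96, 96])
```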
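One way to read 4: in wdsr_b, the single wide 3x3 convolution that would map the expanded features back down is factorized into a cheap 1x1 convolution followed by a narrow 3x3 convolution. Those are the two low-rank kernels, and that wide 3x3 is the "large convolution kernel". A rough weight count (illustrative numbers only) for the 32->192(1x1)->25(1x1)->32 example quoted above:

```python
k = 3

def conv_weights(c_in, c_out, kernel):
    """Weight count of a c_in -> c_out convolution with a kernel x kernel filter (biases ignored)."""
    return c_in * c_out * kernel * kernel

# Collapsing the 192 expanded channels straight back to 32 with one wide 3x3 conv:
direct = conv_weights(192, 32, k)                              # 55296

# wdsr_b-style factorization: 1x1 down to 25 channels, then a 3x3 back to 32.
factorized = conv_weights(192, 25, 1) + conv_weights(25, 32, k)  # 4800 + 7200 = 12000

print(direct, factorized)
```

Per answer 3, the reduced width (25, roughly 0.8 * 32) is the knob that gets tuned so that wdsr_a and wdsr_b end up with roughly the same overall params/FLOPs.
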
jiequanz commented 5 years ago

Hi Jiahui,

Thanks for your reply! I have some follow-up questions.

For my first question, I understand that there are two convolutions in one block, and k is the kernel size. What confused me was w1^2. Here is my understanding: for one convolution, we have w1 filters, each of size k * k. So the number of parameters is w1 * k * k (weights) + 1 (bias).

What does FLOP stand for?

JiahuiYu commented 5 years ago

FLOPs stands for floating-point operations; here it refers to the number of multiply-adds.
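For reference, multiply-adds for a convolution are usually counted as input channels x output channels x kernel area x output spatial size. A tiny sketch with made-up sizes (not this repo's configuration):

```python
def conv_multi_adds(c_in, c_out, k, h_out, w_out):
    """Multiply-adds of a c_in -> c_out conv with a k x k kernel on an h_out x w_out output."""
    return c_in * c_out * k * k * h_out * w_out

# Example: one 64 -> 64, 3x3 convolution applied to 48 x 48 LR-resolution features.
# Convention in this thread: one multiply-add is counted as one FLOP.
print(conv_multi_adds(64, 64, 3, 48, 48))  # 84934656, i.e. about 0.085 G multiply-adds
```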