PiLab-CAU / ImageProcessing-2402

Image processing repo
MIT License

[Lecture2-2][1017] Notion of padding and stride in ConvTranspose2D #23

Open JJUN99 opened 1 month ago

JJUN99 commented 1 month ago

2_2_Autoencoder-38

I’m confused about the meaning of padding (p) and stride (s) in ConvTranspose2D. I understand that z = s - 1 zeros are inserted between adjacent input pixels, and that the padding around the image is p’ = k - p - 1. However, I don’t quite understand why padding and stride need to be transformed into these new quantities. Wouldn’t it be more intuitive to define and use z and p’ directly from the beginning? Or is the meaning behind this transformation important? If so, what is its real significance?

yjyoo3312 commented 4 weeks ago

@JJUN99 Thank you for raising a good point! My conjecture is that the parameters are defined this way to highlight that this is a 'deconvolutional' operation with respect to the original convolution. Given a ConvTranspose2D operation ConvT with a given stride, padding, and kernel size, assume input I and output O, where O = ConvT(I). It would then be interesting to check whether the relationship I = Conv(O) holds for a standard convolution Conv with the same stride, padding, and kernel size as ConvT. If it does, my conjecture is true; otherwise... I should find another conjecture :) .

jleem99 commented 3 weeks ago

@JJUN99 Let's say we perform the convolution operation Conv2d(kernel_size=k, stride=s, padding=p, ...) on an input of resolution $I$. Assuming $I + 2p - k$ is divisible by $s$, we obtain an output resolution of $O = \frac{I + 2p - k}{s} + 1$.
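As a quick sanity check of the formula above, here is a minimal sketch in plain Python. The helper name `conv_output_size` is made up for illustration and is not a PyTorch function. It uses floor division, so when $I + 2p - k$ is not a multiple of $s$ the result is rounded down.

```python
def conv_output_size(i: int, k: int, s: int, p: int) -> int:
    """Output size along one axis of a standard convolution.

    i: input size, k: kernel size, s: stride, p: padding.
    Hypothetical helper for illustration only.
    """
    # Floor division handles the case where (i + 2p - k) is not a multiple of s.
    return (i + 2 * p - k) // s + 1


# Divisible case: (7 + 2*1 - 3) / 2 + 1 = 4
print(conv_output_size(7, k=3, s=2, p=1))
# Non-divisible case: (8 + 2*1 - 3) // 2 + 1 is also 4, because of the floor
print(conv_output_size(8, k=3, s=2, p=1))
```

Both calls produce the same output size, so the formula without the floor only applies when the division is exact.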

Now, if we apply the conv transpose operation to that output, expressed as a stride-1 convolution with the specific set of parameters $z=s-1$, $p'=k-p-1$, $s'=1$, we can calculate the new resolution $\hat{I}$ as follows:

$$\begin{split}
\hat{I} &= \frac{O + z(O-1) + 2p' - k}{s'} + 1 \quad \text{(}z(O-1)\text{ being the zeros inserted between the inputs)} \\
&= O + z(O-1) + 2p' - k + 1 \\
&= O + (s-1)(O-1) + 2(k-p-1) - k + 1 \\
&= O + s(O-1) - O + 1 + 2k - 2p - 2 - k + 1 \\
&= s(O-1) + k - 2p \\
&= s\left(\frac{I + 2p - k}{s} + 1 - 1\right) + k - 2p \\
&= I + 2p - k + k - 2p \\
&= I
\end{split}$$

As a result of conv transpose (w/ $z=s-1$, $p'=k-p-1$, $s'=1$), we have successfully reversed the dimension change from the original convolution, achieving the "deconvolutional" effect.
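The derivation can also be checked numerically. Below is a minimal 1D sketch in plain Python, not PyTorch's actual implementation. It inserts $z$ zeros between inputs, pads with $p'$ zeros on each side, and slides a stride-1 window over the result. The function names are hypothetical. It assumes $p \le k - 1$ so that $p'$ is non-negative, and it flips the kernel, which is what makes the stride-1 pass the transpose of a cross-correlation.

```python
def conv_output_size(i, k, s, p):
    # Standard convolution output size along one axis (floor division).
    return (i + 2 * p - k) // s + 1


def transposed_conv_1d(x, kernel, s, p):
    """Transposed convolution as zero insertion + padding + stride-1 conv.

    Illustrative sketch only; assumes p <= len(kernel) - 1.
    """
    k = len(kernel)
    z = s - 1            # zeros inserted between adjacent inputs
    p_prime = k - p - 1  # zeros padded on each side

    # Insert z zeros between neighbouring elements (not after the last one).
    dilated = []
    for idx, value in enumerate(x):
        dilated.append(value)
        if idx < len(x) - 1:
            dilated.extend([0.0] * z)

    padded = [0.0] * p_prime + dilated + [0.0] * p_prime

    # Stride-1 sliding window with the flipped kernel.
    flipped = kernel[::-1]
    return [
        sum(padded[j + t] * flipped[t] for t in range(k))
        for j in range(len(padded) - k + 1)
    ]


k, s, p = 3, 2, 1
o = conv_output_size(7, k, s, p)           # forward conv: 7 -> 4
y = transposed_conv_1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0], s, p)
print(o, len(y))                           # the transposed pass maps 4 back to 7
```

Note that an input of size 8 with the same `k`, `s`, `p` also gives an output of size 4, and the transposed pass would still return size 7. The size is only recovered exactly when $I + 2p - k$ is divisible by $s$, which is the assumption the derivation relies on.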