Open JJUN99 opened 1 month ago
@JJUN99 Thank you for raising a good point! My conjecture is that the name is meant to highlight that this is a 'deconvolutional' operation relative to the original convolution. Given a ConvTranspose2d operation ConvT with some stride, padding, and kernel size, let I be its input and O its output, so that O = ConvT(I). It would be interesting to check whether a standard convolution Conv with the same stride, padding, and kernel size as ConvT satisfies I = Conv(O). If it does, my conjecture is true; if not... I should find another conjecture :).
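The check proposed above can be sketched in a few lines. This is only a minimal 1D, pure-Python stand-in for the 2D layers, not PyTorch itself: `conv1d` and `conv_transpose1d` are hypothetical helpers written for this illustration, and the all-ones kernel and the parameters `k=4, s=2, p=1` are arbitrary choices.

```python
def conv1d(x, w, stride=1, padding=0):
    """Strided cross-correlation with zero padding (what DL layers call 'convolution')."""
    k = len(w)
    xp = [0.0] * padding + list(x) + [0.0] * padding
    n_out = (len(xp) - k) // stride + 1
    return [sum(w[j] * xp[i * stride + j] for j in range(k)) for i in range(n_out)]


def conv_transpose1d(x, w, stride=1, padding=0):
    """Transposed convolution: scatter x[i] * w into the output, then crop `padding` per side."""
    k = len(w)
    full = [0.0] * ((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        for j in range(k):
            full[i * stride + j] += xi * w[j]
    return full[padding:len(full) - padding]


w = [1.0, 1.0, 1.0, 1.0]           # illustrative kernel, k = 4
s, p = 2, 1                        # illustrative stride and padding

I = [1.0, 2.0, 3.0, 4.0]
O = conv_transpose1d(I, w, s, p)   # O = ConvT(I), length 8
back = conv1d(O, w, s, p)          # Conv(O), length 4 again

print("len(I), len(O), len(Conv(O)):", len(I), len(O), len(back))
print("Conv(O) == I ?", back == I)

# What does hold is the adjoint (transpose) identity <Conv(a), b> == <a, ConvT(b)>.
a = [1.0, 0.0, 2.0, 0.0, 1.0, 3.0, 0.0, 1.0]
lhs = sum(u * v for u, v in zip(conv1d(a, w, s, p), I))
rhs = sum(u * v for u, v in zip(a, conv_transpose1d(I, w, s, p)))
print("adjoint identity holds:", abs(lhs - rhs) < 1e-9)
```

In this example `Conv(O)` has the same length as `I` but different values (`[7, 16, 24, 18]` rather than `[1, 2, 3, 4]`), so the relationship holds for shapes only. The transposed convolution is the transpose of the convolution's linear map, not its inverse.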
@JJUN99
Let's say we perform the convolution operation Conv2d(kernel_size=k, stride=s, padding=p, ...)
with an input resolution of $I$,
and we obtain the output resolution of $O = \frac{I + 2p - k}{s} + 1$ (assuming $s$ divides $I + 2p - k$ evenly; otherwise the quotient is floored).
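Now, if we apply the conv transpose operation, viewed as a stride-1 convolution with the specific set of parameters $z=s-1$ (zeros inserted between adjacent input elements), $p'=k-p-1$ (padding around the border), and $s'=1$, we can calculate the new resolution $\hat{I}$ as follows: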
$$
\begin{split}
\hat{I} &= \frac{O + z(O-1) + 2p' - k}{s'} + 1 \qquad \text{(} z(O-1) \text{ zeros are inserted between the input elements)} \\
&= O + z(O-1) + 2p' - k + 1 \\
&= O + (s-1)(O-1) + 2(k-p-1) - k + 1 \\
&= O + s(O-1) - O + 1 + 2k - 2p - 2 - k + 1 \\
&= s(O-1) + k - 2p \\
&= s\left(\frac{I + 2p - k}{s} + 1 - 1\right) + k - 2p \\
&= I + 2p - k + k - 2p \\
&= I
\end{split}
$$
As a result of conv transpose (w/ $z=s-1$, $p'=k-p-1$, $s'=1$), we have successfully reversed the dimension change from the original convolution, achieving the "deconvolutional" effect.
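To sanity-check the derivation numerically, here is a minimal pure-Python sketch. The helper names (`conv_out`, `conv_transpose_out`, `conv_transpose1d`, `zero_insert_conv1d`) are made up for this illustration, the code is 1D only, and it assumes $p \le k-1$ so that $p'$ is non-negative. It first checks the resolution formula over a small grid of parameters, then checks on one example that "insert $z$ zeros, pad by $p'$, run a stride-1 convolution with the flipped kernel" gives the same values as the scatter-style definition of the transposed convolution.

```python
def conv_out(i, k, s, p):
    """Output resolution of a convolution (floored when s does not divide evenly)."""
    return (i + 2 * p - k) // s + 1


def conv_transpose_out(o, k, s, p):
    """Resolution after zero insertion (z), outer padding (p') and a stride-1 convolution."""
    z, p_prime = s - 1, k - p - 1
    return o + z * (o - 1) + 2 * p_prime - k + 1


# 1) Resolution check: I_hat equals I minus the remainder dropped by the floor.
mismatches = []
for k in range(1, 6):
    for s in range(1, 4):
        for p in range(0, k):              # assumption: p <= k - 1
            for i in range(k, 30):
                i_hat = conv_transpose_out(conv_out(i, k, s, p), k, s, p)
                if i_hat != i - (i + 2 * p - k) % s:
                    mismatches.append((i, k, s, p))
print("resolution mismatches:", mismatches)


def conv_transpose1d(x, w, s, p):
    """Scatter definition: add x[i] * w at offset i * s, then crop p per side."""
    k = len(w)
    full = [0.0] * ((len(x) - 1) * s + k)
    for i, xi in enumerate(x):
        for j in range(k):
            full[i * s + j] += xi * w[j]
    return full[p:len(full) - p]


def zero_insert_conv1d(x, w, s, p):
    """Same operation expressed with z = s - 1, p' = k - p - 1, s' = 1."""
    k = len(w)
    z, p_prime = s - 1, k - p - 1
    stuffed = []
    for idx, v in enumerate(x):
        stuffed.append(v)
        if idx < len(x) - 1:
            stuffed.extend([0.0] * z)      # z zeros between neighbouring elements
    padded = [0.0] * p_prime + stuffed + [0.0] * p_prime
    flipped = w[::-1]                      # stride-1 correlation with the flipped kernel
    return [sum(flipped[j] * padded[n + j] for j in range(k))
            for n in range(len(padded) - k + 1)]


# 2) Value check on one illustrative example (k = 4, s = 2, p = 1).
x = [1.0, 2.0, 3.0]
w = [1.0, 2.0, 3.0, 4.0]
via_scatter = conv_transpose1d(x, w, 2, 1)
via_zero_insert = zero_insert_conv1d(x, w, 2, 1)
print("same values:", via_scatter == via_zero_insert, "length:", len(via_scatter))
```

When $s$ does not divide $I + 2p - k$, the round trip comes back short by the remainder, which is the ambiguity that the `output_padding` argument of PyTorch's `ConvTranspose2d` exists to resolve.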
I’m confused about the meaning of padding (p) and stride (s) in ConvTranspose2d. I understand that z = s - 1 zeros are inserted between adjacent pixels, and that the padding outside the image is p’ = k - p - 1. However, I don’t quite understand why padding and stride need to be transformed into these new quantities. Wouldn’t it be more intuitive to define and use z and p’ directly from the beginning? Or perhaps the meaning behind this transformation is important. If so, what is the real significance of this transformation?