dongdong93 / a2u_matting

[CVPR2021] Learning Affinity-Aware Upsampling for Deep Image Matting
MIT License

Why is the intermediate result of convolving the C×h1×w1 patch C×1×1 instead of d×1×1? #2

Open RewindL opened 1 year ago

RewindL commented 1 year ago

Hello, I recently read your A2U paper and was impressed, but I am having trouble understanding how the upsampling kernel is generated. From Eq. 3, each value of the upsampling kernel is computed from the Hadamard product of the projections U^T x and V^T y. Since U_k^T and x_k have shapes d×N and N×1 respectively, the whole expression can be read as the inner product of two d-dimensional vectors, which I believe is what the blue and green vectors in Fig. 2 show. What confuses me is why those two vectors in Fig. 2 are C×1×1 rather than d×1×1. Since U is a [d, C, h1, w1] convolution kernel and the input feature map is [C, h1, w1], the output of the convolution should be d×1×1 by my understanding. Is my understanding above correct? If it is wrong, what is the right way to understand this process? I know it may be difficult to answer questions about a method from two years ago, but I still hope to receive your answer. Thanks.
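A minimal NumPy sketch of the reading described in the question, assuming a single low-resolution location, random data, hypothetical sizes (C, h1, w1, d), and a hypothetical helper name `full_conv_at_one_position`. It is not the repository's code; it only checks that a standard convolution with d filters of size C×h1×w1 produces a d×1×1 output, and that summing the Hadamard product over d gives the same value as the low-rank bilinear form x^T U V^T y.

```python
import numpy as np

# Hypothetical sizes: C channels, h1 x w1 local window, rank d.
C, h1, w1, d = 4, 3, 3, 2
N = C * h1 * w1  # length of the flattened local patch

rng = np.random.default_rng(0)
x = rng.standard_normal((C, h1, w1))     # local patch at one location
y = rng.standard_normal((C, h1, w1))     # second local patch
U = rng.standard_normal((d, C, h1, w1))  # d filters, each spanning all C channels
V = rng.standard_normal((d, C, h1, w1))

def full_conv_at_one_position(patch, filters):
    """Standard (non-grouped) conv of an h1 x w1 kernel over a C x h1 x w1 patch
    with no padding: every filter sums over all channels, so the output is d x 1 x 1."""
    return np.einsum("dchw,chw->d", filters, patch).reshape(-1, 1, 1)

ux = full_conv_at_one_position(x, U)  # (d, 1, 1), equals U^T x
vy = full_conv_at_one_position(y, V)  # (d, 1, 1), equals V^T y

# Hadamard product summed over d = inner product of two d-dim vectors
w = float(np.sum(ux * vy))

# Same value as the bilinear form x^T (U V^T) y with U, V as N x d matrices
U_mat = U.reshape(d, N).T  # N x d, so U_mat.T is d x N
V_mat = V.reshape(d, N).T
w_bilinear = float(x.reshape(N) @ U_mat @ V_mat.T @ y.reshape(N))

print(ux.shape, vy.shape)         # (2, 1, 1) (2, 1, 1)
print(np.isclose(w, w_bilinear))  # True
```

Under this reading, the two vectors entering the inner product are d×1×1, which is exactly the mismatch with Fig. 2 that the question raises.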

RewindL commented 1 year ago

I think I might have figured out what happens. In Fig. 2, the extracted C×h1×w1 feature patch is passed through a group convolution with C groups to get a C×1×1 representation, and the inner product of the two C×1×1 representations gives a single 1×1×1 value, i.e. one entry of a d×1×1 vector (the C-group convolution and inner product are done d times and the results are concatenated). The d×1×1 vector is then convolved with P to output an s×s vector, which gives the upsampling kernel at one position. But since d = 1 was set in the experiments, Fig. 2 does not show this process.
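The interpretation in this comment can be sketched the same way, again with NumPy, random data, hypothetical sizes, and hypothetical helper names (`group_conv_c_groups`, `kernel_at_one_position`). It encodes the comment's description only, not the authors' implementation. Note that the channel-wise inner product of two C×1×1 group-convolution outputs is in general not equal to the product (u^T x)(v^T y) of two full-convolution outputs; the two coincide only when C = 1.

```python
import numpy as np

# Hypothetical sizes; d = 1 mirrors the setting mentioned in the comment.
C, h1, w1, d, s = 4, 3, 3, 1, 2

rng = np.random.default_rng(0)
x = rng.standard_normal((C, h1, w1))
y = rng.standard_normal((C, h1, w1))
U = rng.standard_normal((d, C, h1, w1))  # d sets of per-channel filters
V = rng.standard_normal((d, C, h1, w1))
P = rng.standard_normal((s * s, d))      # 1x1 conv weights mapping d -> s*s

def group_conv_c_groups(patch, filt):
    """Conv with groups=C: channel c of the patch only meets channel c of the
    filter, so the C channels are kept (C x 1 x 1) instead of summed to 1."""
    return np.einsum("chw,chw->c", filt, patch).reshape(C, 1, 1)

def kernel_at_one_position(x, y, U, V, P):
    # One scalar per rank component: inner product of two C x 1 x 1 vectors.
    z = np.array([
        np.sum(group_conv_c_groups(x, U[i]) * group_conv_c_groups(y, V[i]))
        for i in range(d)
    ])                      # concatenated into a length-d (d x 1 x 1) vector
    k = P @ z               # P projects d -> s*s
    return k.reshape(s, s)  # arranged as s x s, as described in the comment

k = kernel_at_one_position(x, y, U, V, P)
print(k.shape)  # (2, 2)
```

Summing a C-group output over its channels recovers the full-convolution value, so the difference between the two readings lies in where the reduction over C happens, not in the filters themselves.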