Why does convolving a C*h1*w1 input give a C*1*1 intermediate result instead of d*1*1?

dongdong93 / a2u_matting

[CVPR2021] Learning Affinity-Aware Upsampling for Deep Image Matting

MIT License

Hello, I recently read your A2U paper and was impressed by it, but I am having trouble understanding how the upsampling kernels are generated.

From Eq. 3, I understand that every value of the upsampling kernel is obtained from the Hadamard product of the projected features $U^T x$ and $V^T y$. Since $U_k^T$ and $x_k$ have shapes $d \times N$ and $N \times 1$ respectively, the final expression (the Hadamard product summed over its entries) can be viewed as the inner product of two $d$-dimensional vectors, which I believe are the blue and green vectors in Fig. 2.

What confuses me is that in Fig. 2 these two vectors have shape `C*1*1` rather than `d*1*1`. Since U is a convolution kernel of shape [d, C, h1, w1] and the input feature map has shape [C, h1, w1], I would expect the convolution output to be `d*1*1`.

Is my understanding above correct? If not, how should I understand this process? I realize it may be difficult to answer questions about a method from two years ago, but I still hope to receive your answer. Thanks.
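As a sanity check of the shape argument in the question, here is a minimal pure-Python sketch. It assumes a standard, non-grouped convolution with stride 1 and no padding, and it follows the question's reading of Eq. 3 (a kernel value is the Hadamard product of $U^T x$ and $V^T y$, summed over its entries). The helper names (`conv2d_valid`, `shape`, `flatten`, `rand_tensor`), the second patch `y`, and the sizes C=4, h1=w1=3, d=2 are hypothetical and chosen only for illustration; this is not the a2u_matting code. It shows that a [d, C, h1, w1] kernel applied to a [C, h1, w1] patch yields a `d*1*1` output, and that the summed Hadamard product equals the bilinear form $x^T U V^T y$.

```python
import math
import random


def conv2d_valid(x, weight):
    """Standard (non-grouped) 2-D convolution, stride 1, no padding.

    Computed as cross-correlation, as deep-learning frameworks do.
    x:      input patch as nested lists, shape [C][H][W]
    weight: kernel bank as nested lists, shape [d][C][kh][kw]
    returns nested lists of shape [d][H-kh+1][W-kw+1]
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    assert all(len(w) == C for w in weight), "kernel channels must match input"
    out = []
    for w in weight:  # one output channel per kernel in the bank
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                row.append(sum(w[c][p][q] * x[c][i + p][j + q]
                               for c in range(C)
                               for p in range(kh)
                               for q in range(kw)))
            plane.append(row)
        out.append(plane)
    return out


def shape(t):
    """Shape of a nested-list tensor, e.g. (d, 1, 1)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def flatten(t):
    """Flatten in C-major order, the same order the convolution sums over."""
    if not isinstance(t, list):
        return [t]
    return [v for sub in t for v in flatten(sub)]


def rand_tensor(*dims):
    """Nested list of standard-normal samples with the given shape."""
    if not dims:
        return random.gauss(0.0, 1.0)
    return [rand_tensor(*dims[1:]) for _ in range(dims[0])]


random.seed(0)
C, h1, w1, d = 4, 3, 3, 2   # illustrative sizes only
N = C * h1 * w1             # length of the flattened patch x_k

x = rand_tensor(C, h1, w1)      # feature patch x_k
y = rand_tensor(C, h1, w1)      # hypothetical second patch y
U = rand_tensor(d, C, h1, w1)   # U_k^T (d x N) laid out as a conv kernel bank
V = rand_tensor(d, C, h1, w1)   # V_k^T, same layout

ux = conv2d_valid(x, U)         # U_k^T x_k
vy = conv2d_valid(y, V)         # V_k^T y
print(shape(ux))                # (2, 1, 1), i.e. d*1*1

# Summing the Hadamard product of the two d-dim vectors is their inner product.
u = [ux[k][0][0] for k in range(d)]
v = [vy[k][0][0] for k in range(d)]
kernel_value = sum(a * b for a, b in zip(u, v))

# Same value as the bilinear form x^T (U V^T) y over the flattened N-dim patches.
xf, yf = flatten(x), flatten(y)
Ut = [flatten(U[k]) for k in range(d)]  # rows of U^T
Vt = [flatten(V[k]) for k in range(d)]  # rows of V^T
bilinear = sum(xf[a] * Ut[k][a] * Vt[k][b] * yf[b]
               for a in range(N) for b in range(N) for k in range(d))
assert math.isclose(kernel_value, bilinear, rel_tol=1e-9, abs_tol=1e-9)
```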
