4uiiurz1 / pytorch-deform-conv-v2

PyTorch implementation of Deformable ConvNets v2 (Modulated Deformable Convolution)
MIT License

Implemented as in the article? #6

Open danFromTelAviv opened 5 years ago

danFromTelAviv commented 5 years ago

First of all, thank you for implementing v2 of this paper and maintaining the repo. Fair warning: I am mainly a Keras/TF user. If I am reading this correctly, x_offset is the original latent space warped (non-rigidly) by the offsets that p_conv found, so x_offset has shape [batch_size x height x width x features]. The warp happens only in the height and width dimensions (naturally). You then run a regular convolution on top of that.

From reading the paper, I think the authors intended the offsets to be unique for each filter pixel. That is, the procedure should be: 1) find the offsets; 2) fetch the feature space per filter pixel (the result should be [batch_size x height x width x features x filter_size]); 3) multiply each feature by the relevant weight.
This way, two nearby pixels in the latent space can overlap if they want to.
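The three steps above can be sketched in PyTorch. This is a minimal, hypothetical illustration (the module name, shapes, and helper logic are my own, not this repo's code), and it uses nearest-neighbour sampling instead of the paper's bilinear interpolation, so gradients do not flow through the offsets:

```python
import torch
import torch.nn as nn

class SimpleDeformConvSketch(nn.Module):
    """Sketch only: 1) predict offsets, 2) sample one value per kernel tap
    per output position, 3) apply the kernel weights. Nearest-neighbour
    sampling replaces bilinear interpolation to keep the sketch short."""
    def __init__(self, c_in, c_out, ks=3):
        super().__init__()
        self.ks = ks
        # step 1: a regular conv predicts 2*ks*ks offsets (dy, dx) per position
        self.p_conv = nn.Conv2d(c_in, 2 * ks * ks, 3, padding=1)
        # the kernel weights, applied in step 3
        self.weight = nn.Parameter(torch.randn(c_out, c_in, ks * ks) * 0.01)

    def forward(self, x):
        b, c, h, w = x.shape
        ks = self.ks
        off = self.p_conv(x).view(b, 2, ks * ks, h, w)
        # fixed (dy, dx) of each kernel tap relative to the centre
        dy, dx = torch.meshgrid(torch.arange(ks) - ks // 2,
                                torch.arange(ks) - ks // 2, indexing="ij")
        base = torch.stack([dy.flatten(), dx.flatten()]).float()  # [2, ks*ks]
        yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack([yy, xx]).float()                      # [2, h, w]
        # step 2: sample position = output position + fixed tap + learned offset
        pos = (grid[None, :, None] + base[None, :, :, None, None] + off)
        pos = pos.round().long()                # nearest neighbour (sketch only)
        pos[:, 0].clamp_(0, h - 1)
        pos[:, 1].clamp_(0, w - 1)
        idx = pos[:, 0] * w + pos[:, 1]         # [b, ks*ks, h, w]
        flat = x.view(b, c, h * w)
        # gather one sampled value per channel, per tap, per output position
        samples = torch.stack(
            [flat.gather(2, idx[:, k].reshape(b, 1, -1).expand(-1, c, -1))
             for k in range(ks * ks)], dim=-1)  # [b, c, h*w, ks*ks]
        # step 3: weighted sum over input channels and kernel taps
        out = torch.einsum("bcnk,ock->bon", samples, self.weight)
        return out.view(b, -1, h, w)
```

Because each kernel tap gathers its own sample, two taps from nearby output positions can land on the same input pixel, which is exactly the overlap described above.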

Am I wrong? It seems like all of the implementations online do something similar to what you did, so I assume I am wrong. Thanks, Dan

LWJ312 commented 4 years ago

Hi Dan, I totally agree with your thoughts and the three steps above, and this implementation does match your idea: the offsets are unique for each conv filter pixel. I'd like to point out that in the code, after the reshape, x_offset has shape [b, c, h*kernel_size, w*kernel_size]. A final conv layer whose stride equals kernel_size then keeps the output at the same spatial shape as the input x, i.e. [b, c, h, w]. :)
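The reshape-plus-strided-conv trick can be checked in a few lines. The shapes here are illustrative, not taken from the repo: each ks-by-ks patch of x_offset holds the ks*ks sampled taps for one output position, so a conv with stride equal to its kernel size visits every patch exactly once and restores the original spatial size.

```python
import torch
import torch.nn as nn

b, c, h, w, ks = 2, 4, 5, 6, 3
# x_offset: one sampled value per kernel tap per output position,
# tiled so each ks-by-ks patch belongs to a single output position
x_offset = torch.randn(b, c, h * ks, w * ks)
# stride == kernel_size: non-overlapping patches, one output per patch
conv = nn.Conv2d(c, 8, kernel_size=ks, stride=ks)
out = conv(x_offset)
assert out.shape == (b, 8, h, w)  # spatial size is back to [h, w]
```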