Implemented as in the article?

First of all thank you for implementing the v2 of this paper and maintaining. warning - I am mainly a keras/tf user If I am reading this correctly x_offset is the original latent space warped ( non-rigidly ) by offsets that were found by p_conv. So x_offset is of shape [batch_size x height x width x features]. The warp happens only in the height and width dimensions ( naturally ) . You then use a regular convolution on top of that.

From reading the paper I think the author intended that the offsets be unique for each filter pixel. That is that the procedure should be : 1) find offsets 3) fetch the feature space per filter pixel ( should be [batch_size x height x width x features x filters size] 4) multiply each feature by the relevant weight
This way two nearby pixels in the latent space can overlap if they wanted.

Am I wrong? It seems like all of the implementations online do something similar to what you did so I assume I am wrong. Thanks, Dan

4uiiurz1 / pytorch-deform-conv-v2

Implemented as in the article? #6