facebookresearch / silk

SiLK (Simple Learned Keypoint) is a self-supervised deep learning keypoint model.
GNU General Public License v3.0

Is there a position shift when applying padding=0? #60

Closed ZhonghuaYi closed 10 months ago

ZhonghuaYi commented 10 months ago

I have read the original SiLK paper, which states that it's important to use no padding when detecting keypoints.

If padding=1 is applied, the resulting false-positive corner patterns may affect training, as mentioned in the SiLK paper. I also saw that the scripts in scripts/examples use padding=0 for all sub-modules to avoid this, just as the paper recommends.

But when applying padding=0 in the SiLKVGG network, the probability map shrinks. If neither the backbone nor the detector head uses padding, the upper-left corner shifts toward the lower right by about 9 pixels, so the upper-left corner of the probability map no longer corresponds to the true upper-left point of the original image. Post-processing is therefore needed, either by zero-padding the probability map or by adding the corresponding shift to the keypoint positions (add 9 when the backbone and the head have no padding).
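To make the arithmetic concrete, here is a minimal sketch. It assumes the size reduction comes from a stack of stride-1 3x3 convolutions with padding=0; the count of 9 is inferred from the shift above, not read from the exact SiLK layer list.

```python
def unpadded_output_size(size: int, num_convs: int, kernel: int = 3) -> int:
    """Spatial size after `num_convs` stride-1 convolutions with padding=0."""
    # Each k x k convolution without padding removes (k - 1) pixels in total,
    # (k - 1) / 2 from each side.
    return size - num_convs * (kernel - 1)

num_convs = 9  # assumed: one pixel of per-side shift per unpadded 3x3 conv
print(unpadded_output_size(375, num_convs))   # 357
print(unpadded_output_size(1242, num_convs))  # 1224
# Feature coordinate (0, 0) then corresponds to image coordinate (9, 9),
# so every detected position has to be shifted by 9 to land in image space.
```

This matches the 357x1224 probability map reported below.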

But I haven't seen any such post-processing in the code. In fact, when I set padding to 0 or 1, the size of the dense output differs. The positions all begin at [0.5, 0.5], but the last position differs. When I input an image of size 375x1242, the outputs are as follows:

padding=0:

probability shape:  torch.Size([1, 1, 357, 1224])
dense_positions shape:  torch.Size([1, 436968, 3])
dense_positions begin:  tensor([0.5000, 0.5000, 0.0824], device='cuda:0')
dense_positions end:  tensor([3.5650e+02, 1.2235e+03, 1.8867e-02], device='cuda:0')

padding=1:

probability shape:  torch.Size([1, 1, 375, 1242])
dense_positions shape:  torch.Size([1, 465750, 3])
dense_positions begin:  tensor([0.5000, 0.5000, 0.0652], device='cuda:0')
dense_positions end:  tensor([3.7450e+02, 1.2415e+03, 7.9876e-02], device='cuda:0')

So I am confused: is post-processing really needed to handle this potential position shift? Or have I misunderstood the whole problem, and the position shift doesn't actually exist?

gleize commented 10 months ago

Hi @ZhonghuaYi,

You are correct in mentioning there is indeed a shift of 9 when using padding=0. The positions you obtained are in feature coordinate space. To convert them to image coordinate space, we added the from_feature_coords_to_image_coords function. It essentially looks at the structure of the model, figures out the linear coordinate mapping between feature space and image space, and then applies that linear transform to the positions you obtained.

For example:
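A minimal usage sketch, assuming the inference setup from scripts/examples (`model` is an already-loaded SiLK model configured to output sparse positions and descriptors, `images` is a batched image tensor, and the import path is taken from the repository layout, so it may differ):

```python
from silk.backbones.silk.silk import from_feature_coords_to_image_coords

# Run the model; the returned positions are in feature coordinate space.
sparse_positions, sparse_descriptors = model(images)

# Inspect the model structure and apply the linear feature-to-image mapping
# (e.g. the +9 shift when the backbone and head use padding=0).
sparse_positions = from_feature_coords_to_image_coords(model, sparse_positions)
```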

Here and here are good pointers if you need more information about it.

ZhonghuaYi commented 10 months ago

Thank you @gleize

I carefully reviewed the code and found the from_feature_coords_to_image_coords function. It does exactly what you described, and the information in the FAQ really helped.

I think this issue could be closed.