facebookresearch / silk

SiLK (Simple Learned Keypoint) is a self-supervised deep learning keypoint model.
GNU General Public License v3.0

Keypoints and descriptors sparsification #37

Closed · haksorus closed this issue 1 year ago

haksorus commented 1 year ago

Hello!

I have a question about manually implementing keypoint and descriptor sparsification. In my case, I have a converted SiLK model that can only produce dense keypoints and dense descriptors (or raw outputs). I want to use these dense outputs ((1, H' x W', 3) and (1, H' x W', 128)) to compute sparse keypoints and sparse descriptors.

How can I do this most effectively with the SiLK codebase?

gleize commented 1 year ago

Hi @haksorus,
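
One straightforward approach is to keep the top-k positions by keypoint score and gather the matching descriptor rows. Below is a minimal sketch in plain PyTorch; it assumes the score is the last channel of the (1, H' x W', 3) positions tensor, and the function name and default k are illustrative:

import torch

def sparsify_topk(dense_positions, dense_descriptors, k=10000):
    # dense_positions   : (1, H' x W', 3) tensor, rows assumed to be (y, x, score)
    # dense_descriptors : (1, H' x W', 128) tensor, one descriptor per position
    scores = dense_positions[0, :, 2]
    k = min(k, scores.shape[0])
    topk = torch.topk(scores, k=k)
    sparse_positions = dense_positions[0, topk.indices]      # (k, 3)
    sparse_descriptors = dense_descriptors[0, topk.indices]  # (k, 128)
    return sparse_positions, sparse_descriptors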

I hope this helps.

haksorus commented 1 year ago

Thank you, @gleize!

Your answer was very helpful and my problem is solved. I'd also like to know: is there an easy way to convert the output sparse positions, which are still in H' x W' feature coordinates, to H x W image coordinates?

gleize commented 1 year ago

Hi @haksorus,

Normally, the conversion from descriptor coordinates to image coordinates can be done using the from_feature_coords_to_image_coords(model, positions) function (cf. the example here).
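
For reference, a minimal usage sketch (the import path is an assumption; locate from_feature_coords_to_image_coords wherever it is defined in your SiLK checkout):

# NOTE: the import path below is an assumption; adjust it to wherever
# from_feature_coords_to_image_coords is defined in your checkout.
from silk.models.silk import from_feature_coords_to_image_coords

# sparse_positions are in feature (descriptor) coordinates
image_positions = from_feature_coords_to_image_coords(model, sparse_positions)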

Since you converted the SiLK model, I'm assuming you don't have access to the original model and thus cannot run that function. So, to compute the coordinate conversion, you will first have to run this in our codebase:

# load the model version you use
model = ...

# get the linear mapping from image coordinates to raw descriptor coordinates
coord_mapping = model.coordinate_mapping_composer.get("images", "raw_descriptors")

# negate it to print the reverse mapping (raw descriptor -> image coordinates)
print(-coord_mapping)

When loading the default backbone (VGGnp-4), it produces this output:

x <- tensor([1., 1.]) x + tensor([9., 9.])

This is essentially the linear mapping you have to apply to the descriptor positions to convert them to image-space coordinates. You can simply hard-code it on your end.
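
For instance, a minimal sketch of that hard-coded conversion (the scale and offset come from the printed mapping above; the (y, x, score) row layout is an assumption):

import torch

# linear mapping printed above for the default VGGnp-4 backbone:
# x_image = 1.0 * x_descriptor + 9.0 (per spatial axis)
SCALE = torch.tensor([1.0, 1.0])
OFFSET = torch.tensor([9.0, 9.0])

def descriptor_to_image_coords(sparse_positions):
    # sparse_positions: (N, 3) rows assumed to be (y, x, score)
    positions = sparse_positions.clone()
    positions[:, :2] = positions[:, :2] * SCALE + OFFSET
    return positions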

I hope this helps.

haksorus commented 1 year ago

Thank you!