facebookresearch / silk

SiLK (Simple Learned Keypoint) is a self-supervised deep learning keypoint model.
GNU General Public License v3.0
646 stars · 58 forks

Backbone change #18

Closed vietpho closed 1 year ago

vietpho commented 1 year ago

Hello and thank you for your outstanding work!

As a newcomer to deep learning, I am interested in experimenting with other checkpoints beyond pvgg-4.ckpt. Specifically, I would like to use punet-disk.ckpt, but I am unsure if I need to change the backbone. Would you be able to provide an example of which backbone model would be appropriate? I have seen options such as Unet and parametric Unet, but I am uncertain which one to use. Additionally, for the coco-rgb-aug.ckpt checkpoint, I was wondering which backbone was used. Do I need to change the backbone each time I use different checkpoints?

On a related note, I have been exploring the impact of different values for the SILK_THRESHOLD parameter in your deep learning model. Specifically, I attempted to change the threshold to 0.5 and 0.8 while keeping the default inference setting (top-k = 10000), but I did not observe any significant difference compared to the default value of 1.0. Based on my understanding, the SILK_THRESHOLD parameter represents a confidence level, and I expected to see more keypoints appearing with a lower threshold value. I suppose that the images in my dataset already have more than 10000 keypoints, and I am wondering if this might be why the threshold did not seem to have a significant impact.

In addition, I was hoping you could provide some additional information about the SILK_SCALE_FACTOR parameter. Could you explain what this parameter does and how it might affect the performance of the model? And I don't know how to obtain the sparse positions and descriptors and dense positions and descriptors. I have come across the 'get_dense_positions' function, but I can't find a similar function for the sparse positions and descriptors. Additionally, I have noticed an argument labeled 'normalized_descriptor,' and I am curious as to its meaning. Is there a function available to normalize the original descriptor?

I also had a couple of questions related to your paper. Firstly, I am curious to know if you used the same matcher for SILK as you did for other models such as SIFT. Could you provide some information on the matching technique used in your experiments? Secondly, in your paper, you referred to pre-match and post-match keypoints. While I understand that the pre-match keypoints are the keypoints extracted from an image, I am unclear on how you obtained the post-match keypoints. Specifically, since the number of keypoints is already limited to the 'top-k' value during the matching process, I am not sure how you obtained post-match keypoints. Could you please elaborate on this process? Lastly, could you suggest which parameters I should tune if I want to use the results for 3D reconstruction? I've tried changing the matcher to 'double softmax' with the same value of temperature (0.1) but different threshold values. However, I don't understand why I'm getting fewer matches with a lower threshold value. In my understanding, since the threshold value has been lowered, there should be more matches that are above the threshold value. Can you explain why this happens?

Thank you for your time and assistance.

gleize commented 1 year ago

Hi @vietpho,

Sorry for the delay, I wanted to make sure I address all of your points.

Specifically, I would like to use punet-disk.ckpt, but I am unsure if I need to change the backbone. Would you be able to provide an example of which backbone model would be appropriate?

We make heavy use of Hydra and its auto-instantiating feature to run our controlled experiments. This means all of our backbones are instantiated from configuration files.

For example, if you want to instantiate the punet backbone, you can look at this file: etc/backbones/silk-vgg-punet.yaml. You can translate it by replacing this line with the following:

# Note: `ParametricUNet` is provided by the SiLK codebase, and
# `input_feature_channels` is defined earlier in the inference script.
SILK_BACKBONE = ParametricUNet(
    n_channels=1,
    n_classes=128,
    input_feature_channels=input_feature_channels,
    bilinear=False,
    use_max_pooling=True,
    n_scales=4,
    length=1,
    down_channels=(32, 64, 64, 64),
    up_channels=(64, 64, 64, 128),
    kernel=5,
    padding=0,
)

We understand this is not ideal for now, but I'm working on a tool that will make the backbone selection easier. I'll release that soon.

Additionally, for the coco-rgb-aug.ckpt checkpoint, I was wondering which backbone was used. Do I need to change the backbone each time I use different checkpoints?

The coco-rgb-aug.ckpt uses the same backbone as pvgg-4. So you can simply replace the checkpoint in the inference script and it will work.

[...] and I am wondering if this might be why the threshold did not seem to have a significant impact.

We tend to use topk and threshold in an exclusive fashion (either one or the other).

In case both are enabled, the keypoint selection works as follows: first, apply the threshold to get your keypoint set; if that set is smaller than the required top-k, select the top-k keypoints instead; otherwise, return the thresholded set. The implementation is here.
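A rough sketch of that selection logic in NumPy (not the actual SiLK implementation; `select_keypoints` is a hypothetical helper, and the real code operates on score maps rather than flat arrays):

```python
import numpy as np

def select_keypoints(scores, threshold=1.0, topk=10_000):
    """Apply the threshold first; fall back to top-k only when the
    thresholded set is smaller than top-k."""
    flat = scores.ravel()
    above = np.flatnonzero(flat >= threshold)
    if above.size < topk:
        # Thresholded set too small: take the top-k scores instead.
        k = min(topk, flat.size)
        return np.argsort(flat)[::-1][:k]
    return above

scores = np.array([0.1, 0.9, 0.8, 0.2])
select_keypoints(scores, threshold=0.85, topk=1)  # -> [1]
select_keypoints(scores, threshold=0.85, topk=3)  # -> [1, 2, 3]
```

This also shows why a lower threshold may have no visible effect: once the thresholded set already exceeds 10000 keypoints, the top-k fallback never triggers and the output is dominated by the threshold, not by top-k.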

In addition, I was hoping you could provide some additional information about the SILK_SCALE_FACTOR parameter.

This is simply a scaling factor applied to the descriptors (desc <- SILK_SCALE_FACTOR * desc) after normalization. It is here for legacy reasons and has no importance whatsoever; you can just treat it as a constant.

Moreover, changing that value won't change the MNN matching, nor the ratio-test. BUT, it will affect the double-softmax matching, as changing SILK_SCALE_FACTOR is similar to changing the softmax temperature.
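A small numerical illustration of that equivalence (NumPy sketch; the scale value here is arbitrary, not SiLK's actual constant):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d0 = rng.normal(size=(5, 128))
d1 = rng.normal(size=(7, 128))
d0 /= np.linalg.norm(d0, axis=1, keepdims=True)  # L2-normalize
d1 /= np.linalg.norm(d1, axis=1, keepdims=True)

scale = 1.41  # stand-in for SILK_SCALE_FACTOR
sim = d0 @ d1.T
sim_scaled = (scale * d0) @ (scale * d1).T  # == scale**2 * sim

# Nearest-neighbor (MNN) decisions are unchanged by a positive scaling:
same_nn = (sim_scaled.argmax(axis=1) == sim.argmax(axis=1)).all()

# ...but a softmax at temperature T sees the scaling as a temperature change:
T = 0.1
p_scaled = softmax(sim_scaled / T, axis=1)
p_equiv = softmax(sim / (T / scale**2), axis=1)  # identical distributions
```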

And I don't know how to obtain the sparse positions and descriptors and dense positions and descriptors.

You can change SILK_DEFAULT_OUTPUT as specified here to get dense outputs.
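For example, something along these lines (the output names below are assumptions from memory; check the inference script for the exact values):

```python
# Default: sparse outputs.
SILK_DEFAULT_OUTPUT = ("sparse_positions", "sparse_descriptors")

# Switch to dense outputs instead.
SILK_DEFAULT_OUTPUT = ("dense_positions", "dense_descriptors")
```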

Is there a function available to normalize the original descriptor?

Raw descriptors coming from the descriptor head are L2-normalized. We simply divide them by their norm $\frac{D}{||D||}$, which makes them lie on a unit hypersphere.
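As a quick illustration of that normalization (plain NumPy, with a toy 2-D "descriptor"):

```python
import numpy as np

desc = np.array([3.0, 4.0])          # raw descriptor, ||desc|| = 5
unit = desc / np.linalg.norm(desc)   # D / ||D||  ->  [0.6, 0.8]
```

After this step the descriptor has unit norm, so dot products between descriptors are cosine similarities.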

Firstly, I am curious to know if you used the same matcher for SILK as you did for other models such as SIFT. Could you provide some information on the matching technique used in your experiments?

We used the default matcher recommended for each model. From memory, we used MNN for SiLK, R2D2, DISK, SIFT and SuperPoint, and the learned matcher for LoFTR.

[...], I am not sure how you obtained post-match keypoints.

MNN (Mutual Nearest Neighbors) discards keypoints that are not mutually neighbors, which reduces the original set of keypoints coming from the model.
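That filtering can be sketched as follows (a minimal NumPy illustration, not the SiLK implementation; `sim` is a pairwise similarity matrix between the two keypoint sets):

```python
import numpy as np

def mutual_nearest_neighbors(sim):
    """Keep pair (i, j) only when j is i's best match in image 1
    AND i is j's best match in image 0."""
    nn_01 = sim.argmax(axis=1)  # best match in image 1 for each kpt of image 0
    nn_10 = sim.argmax(axis=0)  # best match in image 0 for each kpt of image 1
    keep = nn_10[nn_01] == np.arange(sim.shape[0])
    return np.stack([np.flatnonzero(keep), nn_01[keep]], axis=1)

sim = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.1, 0.7]])
mutual_nearest_neighbors(sim)  # -> [[0, 0], [2, 1]]; keypoint 1 is discarded
```

Here keypoint 1 of image 0 prefers keypoint 0 of image 1, but that preference is not mutual, so the match is dropped; the post-match set is therefore smaller than the pre-match (top-k) set.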

Lastly, could you suggest which parameters I should tune if I want to use the results for 3D reconstruction?

Tuning for:

I've tried changing the matcher to 'double softmax' with the same value of temperature (0.1) but different threshold values. However, I don't understand why I'm getting fewer matches with a lower threshold value. In my understanding, since the threshold value has been lowered, there should be more matches that are above the threshold value. Can you explain why this happens?

The threshold operates on distances (not similarities), that's why the logic is reversed. Here you can see how we convert matching probabilities / similarities into a distance. This is required since our MNN implementation operates on distances, and the threshold is the maximum distance allowed for matching, anything higher will be discarded.
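The reversal can be illustrated with a simple `distance = 1 - probability` stand-in (an assumption for illustration only, not SiLK's actual conversion):

```python
import numpy as np

probs = np.array([0.95, 0.60, 0.30])  # match probabilities / similarities
dist = 1.0 - probs                    # illustrative conversion to a distance

def filter_matches(dist, max_dist):
    # The threshold is a MAXIMUM distance: anything higher is discarded.
    return np.flatnonzero(dist <= max_dist)

# Lowering the threshold keeps FEWER matches, not more:
filter_matches(dist, 0.8)  # -> [0, 1, 2]
filter_matches(dist, 0.1)  # -> [0]
```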

vietpho commented 1 year ago

Thank you for your wonderful and heartfelt response!