facebookresearch / silk

SiLK (Simple Learned Keypoint) is a self-supervised deep learning keypoint model.
GNU General Public License v3.0

Use SuperPoint for inference #26

Closed tianszh-000 closed 1 year ago

tianszh-000 commented 1 year ago

Hi, I want to run inference with your trained SuperPoint parameters. Here is my use of your CLI tool:

```bash
#!/bin/bash
set -e

# folders param
TAG=testResult_coco_1_sppoint4   # example
IMAGES=/data/SiLK/dataset/case2  # $MY_IMAGES
IMAGES_EXTENSION=jpg
OUT=var/cli/$TAG

# model / matching params
CHECKPOINT="/data/SiLK/silk/assets/tests/magicpoint/superpoint_v1.pth"
TOPK=10000
SIZE=480

mkdir -p $OUT

# extract features
./bin/silk-features -o -m superpoint -c $CHECKPOINT -k $TOPK -d $OUT/features -s $SIZE $IMAGES/*.$IMAGES_EXTENSION

# generate image pairs to match
ls $OUT/features/*.pt | sort -V | xargs ./bin/generate-matches exhaustive -s > $OUT/matches.txt

# match keypoints from generated image pairs
./bin/silk-matching -o -m double-softmax -t 0.9 $OUT/matches.txt $OUT/matches/

# visualize matches
./bin/silk-viz image -o $OUT/viz $OUT/matches/*.pt
```

and I keep running into an error like this:

```
ERROR | main::131 - cannot load file : var/cli/testResult_coco_1_sppoint4/matches/*.pt
RuntimeError: The size of tensor a (65) must match the size of tensor b (60) at non-singleton dimension 0
```

Is there anything wrong with my CLI tool configuration, or is there another way to do this?

gleize commented 1 year ago

Hi @tianszh-000,

Thanks for pointing out the issue. There was indeed a bug when running the SuperPoint model: the feature output format was incorrect, which led to the exception mentioned above.

I've pushed a fix, but it will take some time to land in the Meta codebase and sync on GitHub. In the meantime, you can apply the fix by replacing this line with this:

        kwargs = {
            "use_batchnorm": False,
            "default_outputs": ("positions", "sparse_descriptors"),
        }
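
For reference, once the fix is applied, a quick way to sanity-check the exported features is to load one of the `.pt` files written by `./bin/silk-features` and inspect its structure. This is only a generic inspection snippet (the path is taken from your script, and the exact layout of the saved object may differ):

```python
import torch

# adjust the path to one of the files produced by ./bin/silk-features
obj = torch.load("var/cli/testResult_coco_1_sppoint4/features/example.pt")

# print the top-level structure and tensor shapes, whatever the exact format is
print(type(obj))
if isinstance(obj, (tuple, list)):
    for item in obj:
        print(type(item), getattr(item, "shape", None))
elif isinstance(obj, dict):
    for key, value in obj.items():
        print(key, getattr(value, "shape", None))
```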
tianszh-000 commented 1 year ago

> Hi @tianszh-000,
>
> Thanks for pointing out the issue. There was indeed a bug when running the SuperPoint model: the feature output format was incorrect, which led to the exception mentioned above.
>
> I've pushed a fix, but it will take some time to land in the Meta codebase and sync on GitHub. In the meantime, you can apply the fix by replacing this line with this:
>
>     kwargs = {
>         "use_batchnorm": False,
>         "default_outputs": ("positions", "sparse_descriptors"),
>     }

That works, thank you. I ran into another problem when I tried the "out-of-framework model evaluation" by executing `./bin/silk-cli mode=run-hpatches-tests-cached-[my-model]`, and I get the same error when I try your provided file with `./bin/silk-cli mode=run-hpatches-tests-cached-disk`. Error details:

```
ModuleNotFoundError: Caught ModuleNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/xxx/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/xxx/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/xxx/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/xxx/silk/silk/datasets/cached.py", line 87, in __getitem__
    return pickle.loads(obj_bytes)
ModuleNotFoundError: No module named 'pixenv'
```

Is this a bug too, or is there something wrong with my generated ".h5" file?

gleize commented 1 year ago

Hi @tianszh-000,

Thanks for pointing that out.

There is indeed an issue with the pickle loading of the cached datasets. Those datasets were saved to disk using the old codebase, named `pixenv`, which causes the above exception when loading those files in the newly renamed codebase.

A fix has been pushed here.

This issue should only affect the pickle files we provide; it should not affect the cached datasets you created yourself.
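
For anyone stuck on an older checkout, one generic way to work around this kind of module rename when unpickling (not necessarily how the pushed fix does it) is to redirect the old module name to the new one with a custom unpickler. This is only a sketch, assuming the pickled objects only reference classes that now live under the renamed `silk` package:

```python
import io
import pickle


class RenamedModuleUnpickler(pickle.Unpickler):
    """Redirect lookups of the old 'pixenv' package to the renamed 'silk' package."""

    def find_class(self, module, name):
        if module == "pixenv" or module.startswith("pixenv."):
            module = "silk" + module[len("pixenv"):]
        return super().find_class(module, name)


def loads_with_rename(obj_bytes: bytes):
    # drop-in replacement for pickle.loads(obj_bytes)
    return RenamedModuleUnpickler(io.BytesIO(obj_bytes)).load()
```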

tianszh-000 commented 1 year ago

You are right, there is still an error after applying your bugfix. Here is the error I get when I run `./bin/silk-cli mode=run-hpatches-tests-cached-model_name`:

```
points1 = torch.vstack((points1, torch.ones(1, num_points, device=points1.device)))
          │     │       │        │     │       │                  │       └ <attribute 'device' of 'torch._C._TensorBase' objects>
          │     │       │        │     │       │                  └ tensor([[[[0.6078]],
          │     │       │        │     │       │                             [[0.6078]],
          │     │       │        │     │       │                             [[0.5961]],
          │     │       │        │     │       │                             [[0.6000]],
          │     │       │        │     │       │                             [[0.6510]],
          │     │       │        │     │       │                             [[0.69...
          │     │       │        │     │       └ 480
          │     │       │        │     └ <built-in method ones of type object at 0x7fbb220e6760>
          │     │       │        └ <module 'torch' from '/data/miniconda3/envs/silk/lib/python3.8/site-packages/torch/__init__.py'>
          │     │       └ tensor([[[[0.6078]],
          │     │                  [[0.6078]],
          │     │                  [[0.5961]],
          │     │                  [[0.6000]],
          │     │                  [[0.6510]],
          │     │                  [[0.69...
          │     └ <built-in method vstack of type object at 0x7fbb220e6760>
          └ <module 'torch' from '/data/miniconda3/envs/silk/lib/python3.8/site-packages/torch/__init__.py'>

RuntimeError: Tensors must have same number of dimensions: got 4 and 2
```

I think there might be something wrong with my generated .h5 file. I want to evaluate my own SuperPoint model (trained on other datasets) in your framework. According to my understanding of your out-of-framework model evaluation doc, I should generate a matched-keypoint dataset with my own model and your provided cached dataset; my understanding is that your "run-on-hpatches-model-name" script would then compute the keypoints with my model, but I don't know if that's right. My run-on-hpatches-model script is below; basically, I just pieced some components together from your provided scripts, but I am not sure if this is the right way.

```python
DATASET = "/xxx/assets/datasets/cached-hpatches/hpatches-full-size-480-grey.h5"
OUTPUT_DATASET = "Spoint_all_points_480.h5"
MODEL_WEIGHT = "/xxx/assets/models/silk/analysis/alpha/spp_5_1.674.pth"

def mnn_matcher(descriptors_a, descriptors_b):
    device = descriptors_a.device
    sim = descriptors_a @ descriptors_b.t()
    nn12 = torch.max(sim, dim=1)[1]
    nn21 = torch.max(sim, dim=0)[1]
    ids1 = torch.arange(0, sim.shape[0], device=device)
    mask = ids1 == nn21[nn12]
    matches = torch.stack([ids1[mask], nn12[mask]])
    return matches.t().data.cpu().numpy()

def transform(elem, matcher):
    # extract images (single image, no batch dimension)
    original_image = elem["original_img"]
    warped_image = elem["warped_img"]

    original_image = original_image[None].cuda()
    warped_image = warped_image[None].cuda()
    batch = {"image0": original_image, "image1": warped_image}
    print("batch", batch)
    with torch.no_grad():
        matcher(batch)
        mkpts0 = batch["image0"].cpu()
        mkpts1 = batch["image1"].cpu()
        # mconf = batch["mconf"].cpu()  # noqa: F841

    # run model here
    original_points = mkpts0[..., [1, 0]]
    warped_points = mkpts1[..., [1, 0]]
    matched_original_points = mkpts0[..., [1, 0]]
    matched_warped_points = mkpts1[..., [1, 0]]

    # save keypoints
    elem = elem.add("original_points", original_points)
    elem = elem.add("warped_points", warped_points)
    elem = elem.add("matched_original_points", matched_original_points)
    elem = elem.add("matched_warped_points", matched_warped_points)

    return elem

def main():
    # load model
    state_dict = torch.load(MODEL_WEIGHT, map_location="cpu")
    print(list(state_dict.keys()))
    if "extractor" in state_dict:
        weights = state_dict["extractor"]
    # elif "disk" in state_dict:
    #     weights = state_dict["disk"]
    else:
        weights = state_dict
        # raise KeyError("Incompatible weight file!")

    model = SuperPointBNNet()
    model.load_state_dict(weights)
    model = model.cuda()
    model = model.eval()

    cache(DATASET, OUTPUT_DATASET, transform, model)

if __name__ == "__main__":
    main()
```

Is there any way to solve this? Thanks very much.

gleize commented 1 year ago

Hi @tianszh-000,

> [...], but I am not sure if this is the right way

Your understanding is correct, this is the right way.

> Is there any way to solve this?

Based on the exception output, it seems the shape of the keypoint positions is incorrect. Looking at your code, it seems you're using the images themselves as keypoints:

        mkpts0 = batch["image0"].cpu()
        mkpts1 = batch["image1"].cpu()

This won't work for sure.

Once you fix the issue, make sure `original_points`, `warped_points`, `matched_original_points` and `matched_warped_points` have the proper shape ($N \times 2$) and ordering of the position coordinates (yx).
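
As a rough illustration (not code from this repo), converting keypoints from xy to the expected $(N, 2)$ yx layout could look something like this:

```python
import torch

def to_yx_points(kpts_xy: torch.Tensor) -> torch.Tensor:
    """Convert an (N, 2) tensor of xy keypoint positions to the (N, 2) yx layout."""
    assert kpts_xy.dim() == 2 and kpts_xy.shape[1] == 2, f"unexpected shape {tuple(kpts_xy.shape)}"
    return kpts_xy[:, [1, 0]]
```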

tianszh-000 commented 1 year ago

Hi, my script finally works. Besides that, I have two questions about training.

  1. Given our training conditions with only one GPU (Quadro RTX A6000), I modified the code to train on a single GPU. It runs well, but I don't know if that introduces an accuracy gap, since you mentioned it has to be at least two GPUs.
  2. Among the model parameter files saved during training (with only one GPU), which one should be chosen as the final best model to evaluate? I used the latest one, "epoch=99-step=99999.ckpt", to evaluate on HPatches, and the metrics are worse than with "epoch=38-step=38999.ckpt"; with "epoch=38-step=38999.ckpt", the metrics are even better than the ones you listed in your article.
gleize commented 1 year ago

Hi @tianszh-000 ,

> Given our training conditions with only one GPU (Quadro RTX A6000), I modified the code to train on a single GPU. It runs well, but I don't know if that introduces an accuracy gap, since you mentioned it has to be at least two GPUs.

It shouldn't affect the performance. The two GPU requirement was more of a "hard-coded convenience" than anything else.

> Among the model parameter files saved during training (with only one GPU), which one should be chosen as the final best model to evaluate? I used the latest one, "epoch=99-step=99999.ckpt", to evaluate on HPatches, and the metrics are worse than with "epoch=38-step=38999.ckpt"; with "epoch=38-step=38999.ckpt", the metrics are even better than the ones you listed in your article.

In all of our trainings, we selected the last checkpoint among the best 10 (best in terms of validation score). And we trained only once (i.e. we didn't train multiple times and then select the model with the best results on our benchmarks). So, what you've found is interesting, but not too surprising. It's likely caused by the stochastic nature of the training. It would be interesting to measure the variance of those results over multiple trainings, and whether or not "early" checkpoints consistently perform better than later ones.
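
For what it's worth, that selection strategy ("keep the 10 best by validation score, then take the last of them") maps onto a standard PyTorch Lightning checkpoint callback. The snippet below is only a generic sketch; the monitored metric name is a placeholder, not necessarily the one used in this codebase:

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the 10 checkpoints with the best validation score.
# "val_loss" is a placeholder metric name, not necessarily the one used here.
checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",
    mode="min",
    save_top_k=10,
    filename="{epoch}-{step}",
)

# After training, checkpoint_callback.best_k_models maps the kept checkpoint
# paths to their validation scores; the "last among the best 10" is the one of
# those files with the highest step (e.g. parse the step from the filename).
```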