amaralibey / Bag-of-Queries

BoQ: A Place is Worth a Bag of learnable Queries (CVPR 2024)
MIT License

Minimal example of inference #14

Open tianyilim opened 1 week ago

tianyilim commented 1 week ago

Hello, cool work! I am trying to use this in the context of loop closure detection (first trying with KITTI).

I tried to use the model as follows. Could you help to sanity check my code?

# relevant imports
import PIL.Image
import torch
from torchvision import transforms as tvf

# initialize the model and move it to the GPU
vpr_model = torch.hub.load("amaralibey/bag-of-queries", "get_trained_boq",
                           backbone_name="dinov2", output_dim=12288)
vpr_model.to('cuda')
vpr_model.eval()  # put the model in inference mode

boq_in_size = (322, 322)  # input size for the DinoV2 backbone
base_image_transform = tvf.Compose([
    tvf.ToTensor(),
    tvf.Resize(boq_in_size, interpolation=tvf.InterpolationMode.BICUBIC, antialias=True),
    tvf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# load an actual image; replace image_full_path with your own path
image = PIL.Image.open(image_full_path).convert("RGB")
img_pt = base_image_transform(image).to('cuda')[None, ...]  # add a batch dimension

with torch.no_grad():
    # the model returns a tuple; the first element is the global descriptor
    global_descriptor = vpr_model(img_pt)[0]
gd_np = global_descriptor.cpu().numpy()

# use gd_np for similarity
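
For reference, here is roughly how I plan to use gd_np for retrieval. This is only a sketch, not from the repo: db_descriptors is a hypothetical (N, 12288) array of BoQ descriptors from earlier frames, and the comparison is plain cosine similarity.

import numpy as np

def cosine_similarities(query: np.ndarray, db_descriptors: np.ndarray) -> np.ndarray:
    # L2-normalize so the dot product equals cosine similarity
    query = query / np.linalg.norm(query)
    db = db_descriptors / np.linalg.norm(db_descriptors, axis=1, keepdims=True)
    return db @ query

# gd_np has shape (1, 12288), so take row 0 as the query vector
sims = cosine_similarities(gd_np[0], db_descriptors)
best_match = int(np.argmax(sims))  # index of the most similar past frame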
tianyilim commented 1 week ago

As a follow-up, this code worked for my use case. I just had to set a lower score threshold than with AnyLoc descriptors, but BoQ seems to perform better.
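
In case it helps others, the acceptance step is just a threshold on the cosine similarity from my snippet above. The value below is a placeholder, not a recommended setting; it is dataset- and descriptor-dependent, so tune it on your own sequences.

SCORE_THRESHOLD = 0.5  # placeholder; tune per dataset and descriptor

if sims[best_match] >= SCORE_THRESHOLD:
    print(f"loop closure candidate: frame {best_match}, score {sims[best_match]:.3f}")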

Thanks!