Hello, cool work! I am trying to use this in the context of loop closure detection (first trying with KITTI).
I tried to use the model as follows. Could you help to sanity check my code?
import PIL.Image
from torchvision import transforms as tvf
import torch
# initialize model, load to GPU
vpr_model = torch.hub.load("amaralibey/bag-of-queries", "get_trained_boq",
backbone_name="dinov2", output_dim=12288)
vpr_model.to('cuda')
vpr_model.eval()  # inference mode: disables dropout / batch-norm updates
boq_in_size = (322, 322) # to be used with DinoV2 backbone
base_image_transform = tvf.Compose([
tvf.ToTensor(),
tvf.Resize(boq_in_size, interpolation=tvf.InterpolationMode.BICUBIC, antialias=True),
tvf.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
# Load an actual image; replace image_full_path with your own path
image = PIL.Image.open(image_full_path).convert("RGB")
img_pt = base_image_transform(image).to('cuda')[None, ...]
with torch.no_grad():  # no gradient tracking needed at inference time
    global_descriptor = vpr_model(img_pt)[0]  # forward returns a tuple; the first element is the global descriptor (the second holds the attention weights)
gd_np = global_descriptor.cpu().numpy()  # shape (1, 12288)
# use gd_np for similarity
As a follow-up: this code worked for my use case. I just had to set a lower score threshold than I use for AnyLoc descriptors, but BoQ seems to perform better.
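In case it is useful to anyone else, here is a minimal sketch of the similarity check I ended up with. The is_loop_closure helper and the 0.55 threshold are placeholders of my own, not part of the BoQ repo; I also normalize explicitly since I did not check whether the model output is already unit-norm.
import numpy as np

def is_loop_closure(desc_a: np.ndarray, desc_b: np.ndarray, threshold: float = 0.55) -> bool:
    # Cosine similarity between two global descriptors, e.g. gd_np.ravel() from above.
    # The 0.55 threshold is a placeholder; tune it on your own sequences.
    a = desc_a / np.linalg.norm(desc_a)  # normalize in case descriptors are not unit-norm
    b = desc_b / np.linalg.norm(desc_b)
    return float(a @ b) >= threshold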