amaralibey / Bag-of-Queries

BoQ: A Place is Worth a Bag of learnable Queries (CVPR 2024)
MIT License

Feature dimensions of BoQ with DinoV2 backbone #5

Closed: BinuxLiu closed this issue 4 months ago

BinuxLiu commented 4 months ago

Hi, @amaralibey BoQ is a wonderful work. My questions are: 1) What is the feature dimension of BoQ with DINOV2 used in the experiment reported in README? 2) What is the feature dimension in Table 3 of your paper? Looking forward to your reply!

BinuxLiu commented 4 months ago

Are there any experimental results on Tokyo 24/7 or SF-XL? Is it because the high dimensionality can cause memory overflow?

amaralibey commented 4 months ago

Hello @BinuxLiu,

Thank you for your interest!

  1. The feature dimension of BoQ when using the DinoV2 backbone is 12,288 (384x32).
  2. In Table 3 of our paper, we are using the ResNet50 backbone, where the dimension is 16,384.

I will run experiments on the SF-XL dataset and report the results once the tests are completed. I did not have any memory problems when I tested on San Francisco (more than 1M images). At 12,288 dimensions, BoQ's descriptor is still almost 3x smaller than NetVLAD's 32,768.
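
For clarity, a small sketch of where these dimensions come from: per the numbers above, the descriptor size is just channels x queries. The 384x32 split for DinoV2 is from this thread; reading 16,384 as 512x32 for ResNet50 is only my assumption of how it factors, not something stated here.

def boq_descriptor_dim(channel_dim: int, num_queries: int) -> int:
    # The global descriptor size works out to channel_dim * num_queries.
    return channel_dim * num_queries

print(boq_descriptor_dim(384, 32))  # 12288 -- DinoV2 backbone (from this thread)
print(boq_descriptor_dim(512, 32))  # 16384 -- ResNet50 (assumed 512x32 split)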

I currently do not have the Tokyo dataset on my computer. I will need to ask the authors for the download link, and I may add it later.

Best, Amar

BinuxLiu commented 4 months ago

Thank you for your answer.

BinuxLiu commented 4 months ago

Awesome!

2024-07-12 22:53:09   Test set: < BaseDataset, tokyo247 - #database: 75984; #queries: 315 >
2024-07-12 22:59:39   Recalls on < BaseDataset, tokyo247 - #database: 75984; #queries: 315 >: R@1: 98.10, R@5: 98.10, R@10: 98.73, R@100: 99.68
amaralibey commented 4 months ago

Hello @BinuxLiu,

Thank you for taking the time to test on Tokyo247. Are these results from DinoV2-BoQ with images resized to 322x322? If so, may I put these results in the README?

BinuxLiu commented 4 months ago
2024-07-12 23:16:09   Recalls on < BaseDataset, tokyo247 - #database: 75984; #queries: 315 >: R@1: 96.51, R@5: 97.78, R@10: 98.41, R@100: 100.00
2024-07-12 23:16:09   Finished in 0:02:59

These are the Tokyo247 results with DinoV2-BoQ, with images resized to 322x322. Of course.

amaralibey commented 4 months ago

Nice, thank you. Could you please tell me which model generated these results?

> Awesome!
>
> 2024-07-12 22:53:09   Test set: < BaseDataset, tokyo247 - #database: 75984; #queries: 315 >
> 2024-07-12 22:59:39   Recalls on < BaseDataset, tokyo247 - #database: 75984; #queries: 315 >: R@1: 98.10, R@5: 98.10, R@10: 98.73, R@100: 99.68
BinuxLiu commented 4 months ago

The first set of results also came from the DINO-BoQ model, but evaluated at adaptive resolution (see the code below). Resolution has a large impact on the Tokyo247 results. You are welcome; I am also very grateful for your previous answers. Your work is very inspiring to me.

import torch
from torchvision import transforms


class VPRModel(torch.nn.Module):
    def __init__(self, backbone, aggregator):
        super().__init__()
        self.backbone = backbone
        self.aggregator = aggregator

    def forward(self, x):
        # At inference time, snap the input height and width to the
        # nearest multiples of 14, since DINOv2 works on 14x14 patches.
        if not self.training:
            b, c, h, w = x.shape
            h = round(h / 14) * 14
            w = round(w / 14) * 14
            x = transforms.functional.resize(x, [h, w], antialias=True)

        x = self.backbone(x)           # dense local features
        x, attns = self.aggregator(x)  # global descriptor + attention maps
        return x, attns
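
As a quick sanity check of the resizing logic (the dummy modules below are placeholders, not the real BoQ backbone/aggregator):

class DummyAggregator(torch.nn.Module):
    # Placeholder that mimics the (descriptor, attentions) return signature.
    def forward(self, x):
        return x, None

model = VPRModel(torch.nn.Identity(), DummyAggregator()).eval()
x = torch.randn(1, 3, 480, 640)  # e.g. an original-resolution query image
feats, _ = model(x)
print(feats.shape)  # torch.Size([1, 3, 476, 644]) -- 480 -> 476, 640 -> 644
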
amaralibey commented 4 months ago

@BinuxLiu,

Okay, I see. Thanks. Tokyo247 has high-resolution queries with varying image sizes, so it makes sense that performance improves when the original aspect ratio is maintained (we discussed this aspect in the Supplementary).

I'm glad you enjoyed our work and find it useful :) By the way, I will be launching a new framework for VPR in the coming days. It'd be great to get some feedback. I'll keep you posted.

Best, Amar.

BinuxLiu commented 4 months ago

Hi @amaralibey, sorry, I forgot to test the performance of ResNet-50 last time. At a resolution of 322x322:

Recalls on < BaseDataset, tokyo247 - #database: 75984; #queries: 315 >: R@1: 90.48, R@5: 94.29, R@10: 96.51, R@100: 97.78

At the original resolution:

Recalls on < BaseDataset, tokyo247 - #database: 75984; #queries: 315 >: R@1: 94.29, R@5: 96.51, R@10: 96.51, R@100: 98.41

Feel free to use my results; I don't think there are any errors. If the experimental results on the SF-XL dataset become available, please let me know. Tip: save the features of the SF-XL database during the first run to speed up experiments with multiple query sets.
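
A minimal sketch of that caching tip; the helper and file names are hypothetical, not from the BoQ codebase:

import os
import torch

def get_database_descriptors(model, database_loader, cache_path="sfxl_db_feats.pt"):
    # Extract the database descriptors once, then reuse the cached tensor
    # for every subsequent query set.
    if os.path.exists(cache_path):
        return torch.load(cache_path)
    model.eval()
    feats = []
    with torch.no_grad():
        for images in database_loader:
            descriptors, _ = model(images)  # (descriptor, attentions) as above
            feats.append(descriptors.cpu())
    feats = torch.cat(feats, dim=0)
    torch.save(feats, cache_path)
    return feats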

LKELN commented 4 months ago

Hi, for ResNet-50 I get 90.8 / 95.6 / 96.5 (R@1/R@5/R@10) at a resolution of 384x384. Am I testing something wrong? @BinuxLiu

BinuxLiu commented 4 months ago

Hi, @LKELN I think your results are consistent with mine. First, different machines have a small influence. Second, Tokyo 247 has far fewer queries than other datasets, so the denominator used to compute recall is smaller and small differences are easier to notice. Please compare the details of the results; the difference is reasonable.
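
To make that concrete (numbers taken from this thread):

# Tokyo247 has only 315 queries, so each query moves recall by
# 100 / 315, i.e. roughly 0.32 percentage points.
n_queries = 315
per_query = 100 / n_queries
print(f"{per_query:.3f}")  # 0.317
# So 90.8 vs 90.48 R@1 is a difference of a single query:
print(round((90.8 - 90.48) / per_query))  # 1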

BinuxLiu commented 4 months ago

You may not have noticed that I'm reporting results at two resolutions.

LKELN commented 4 months ago

I agree that slight fluctuations are normal. Is the original resolution you mention 480x640? If your original resolution is also 384x384, then I think the fluctuation is anomalous.