gmberton / deep-visual-geo-localization-benchmark

Official code for CVPR 2022 (Oral) paper "Deep Visual Geo-localization Benchmark"
MIT License

question about aggregation #17

Closed: Kaacoinnn closed this issue 1 year ago

Kaacoinnn commented 1 year ago

Hi @gmberton @ga1i13o, I would like to skip the aggregation step entirely and use the features output by the backbone directly as the image representation for matching. How should I modify the code to do this? Do you have any suggestions for running without aggregation?

gmberton commented 1 year ago

Hi, that depends on whether you want to use the features taken before or after the global average pooling:

  1. If you want to take them before the global average pooling, you can replace the aggregation with a simple Flatten. So basically you can comment out the lines defining the aggregation (between here and here) and simply set self.aggregation = Flatten(). Note that those features are quite heavy: they have dimension CxHxW, and the Flatten reshapes them into a single vector of C*H*W values per image.

  2. Or perhaps you want to take the features after the global average pooling (most CNNs end with a global average pooling). If you need this, you can simply pass --aggregation=spoc. Note that SPOC is essentially the same as an average pooling, and it reduces the CxHxW features to a C-dimensional vector. You can also remove the L2 normalization by setting --l2=none, or apply it before or after the pooling with --l2=before_pool or --l2=after_pool.
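
To make the difference between the two options concrete, here is a minimal PyTorch sketch of what each one does to a feature map. It is only an illustration, not the repository's code: the tensor sizes (C=256, H=30, W=40) are made up, `torch.nn.Flatten` stands in for whatever Flatten layer the repository defines, and a plain mean over the spatial dimensions stands in for SPOC.

```python
import torch
import torch.nn.functional as F

# Hypothetical backbone output: 2 images, C=256 channels, 30x40 spatial map.
# These sizes are assumptions for illustration, not values from the repository.
features = torch.randn(2, 256, 30, 40)

# Option 1: no pooling. Flatten CxHxW into one long vector per image.
# torch.nn.Flatten is used here as a stand-in for the repository's Flatten.
flat = torch.nn.Flatten()(features)        # shape: (2, 256 * 30 * 40)

# Option 2: global average pooling over H and W, which is roughly
# what the reply above says SPOC amounts to.
pooled = features.mean(dim=(2, 3))         # shape: (2, 256)

# Optional L2 normalization, so descriptors can be compared
# with a dot product (cosine similarity).
flat_l2 = F.normalize(flat, p=2, dim=1)
pooled_l2 = F.normalize(pooled, p=2, dim=1)

print(flat.shape, pooled.shape)
```

With these assumed sizes the flattened descriptor has 307,200 values per image versus 256 for the pooled one, which is why option 1 is much heavier in memory and matching time.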