QVPR / Patch-NetVLAD

Code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition"
MIT License

Why do feat_regions divide by (patch_size ** 2)? #63

Closed: libenchong closed this issue 1 year ago

libenchong commented 1 year ago

Hello. In get_square_regions_from_integral() in patchnetvlad.py, I don't understand the last line, "return feat_regions / (patch_size ** 2)". feat_regions is already the residual between the local descriptors in each patch and the k cluster centers, so why do you divide feat_regions by the square of patch_size? In other words, could I return feat_regions directly, without dividing by patch_size ** 2?

StephenHausler commented 1 year ago

This was added to ensure that all patch sizes have comparable magnitudes (the norm of the summed residuals). Without it, a 3x3 patch could have a smaller total residual than an 8x8 patch, even though both observe the same visual features at different scales (for example, the same landmark viewed from different distances). Dividing by the patch area therefore makes Patch-NetVLAD more scale invariant, although in practice this only becomes relevant with multi-scale matching, where 3x3 patches are compared to 8x8 patches. Note that vanilla NetVLAD doesn't do this normalisation because it doesn't need to: its "patch size" is always equal to the full size of the feature map.

You could remove this normalisation and, in theory, still get about the same performance. It is only needed when you need to guarantee scale invariance across multiple patch sizes.
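To make the effect of the division concrete, here is a minimal NumPy sketch. It is not the repository's code: the function names patch_sums_from_integral and patch_means, the single-channel 10x10 map, and the constant value 0.5 are all assumptions chosen for illustration. It sums a feature map over every square patch using an integral image, then divides by patch_size ** 2, so that a 3x3 and an 8x8 patch over the same content give the same value.

```python
import numpy as np

def patch_sums_from_integral(feat, patch_size):
    """Sum a 2D map over every patch_size x patch_size window (stride 1).

    Hypothetical helper for illustration only, not the Patch-NetVLAD function.
    """
    h, w = feat.shape
    # Integral image with a zero row and column prepended.
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = feat.cumsum(axis=0).cumsum(axis=1)
    p = patch_size
    # Window sum = bottom-right - top-right - bottom-left + top-left corner.
    return (integral[p:, p:] - integral[:-p, p:]
            - integral[p:, :-p] + integral[:-p, :-p])

def patch_means(feat, patch_size):
    """Patch sums divided by the patch area, mirroring '/ (patch_size ** 2)'."""
    return patch_sums_from_integral(feat, patch_size) / (patch_size ** 2)

# Assumed toy input: one residual channel that is constant everywhere.
feat = np.full((10, 10), 0.5)

sum_3 = patch_sums_from_integral(feat, 3)   # each entry is 9 * 0.5
sum_8 = patch_sums_from_integral(feat, 8)   # each entry is 64 * 0.5
mean_3 = patch_means(feat, 3)               # each entry is 0.5
mean_8 = patch_means(feat, 8)               # each entry is 0.5
```

Without the division, the 8x8 sums are 64/9 times larger than the 3x3 sums for identical content; with it, both patch sizes produce the same magnitude, which is the property that matters when descriptors from different patch sizes are compared.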

libenchong commented 1 year ago

I get it. Thanks for explaining the issue so carefully!