Closed libenchong closed 1 year ago
This was added to ensure that all patch sizes have equal magnitudes (in terms of the summed residuals). For example, a 3x3 patch could have a smaller total residual than an 8x8 patch, even though both observe the same visual features at different scales (i.e. the same landmark viewed from different distances). So this normalisation makes Patch-NetVLAD more scale invariant, although in practice this only becomes relevant with multi-scale matching, i.e. when 3x3 patches are compared to 8x8 patches. Note that vanilla NetVLAD doesn't do this normalisation because it doesn't need to: NetVLAD always has a "patch size" equal to the full size of the feature map.
You could remove this normalisation and theoretically still get about the same performance. It is only needed when you have to guarantee scale invariance across multiple patch sizes.
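To make the effect concrete, here is a toy sketch in plain Python. It is not the code from patchnetvlad.py: the helper names (`integral_image`, `patch_sum`, `patch_mean`) and the single-channel 10x10 map with a constant value of 0.5 are assumptions made purely for illustration. It shows that the raw patch sum grows with the patch area, while dividing by `patch_size ** 2` gives the same magnitude for a 3x3 and an 8x8 patch.

```python
def integral_image(feat):
    """Summed-area table with a zero-padded first row and column."""
    h, w = len(feat), len(feat[0])
    integ = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            integ[y + 1][x + 1] = (feat[y][x] + integ[y][x + 1]
                                   + integ[y + 1][x] - integ[y][x])
    return integ


def patch_sum(integ, top, left, patch_size):
    """Sum over the patch_size x patch_size window with top-left corner (top, left)."""
    bottom, right = top + patch_size, left + patch_size
    return (integ[bottom][right] - integ[top][right]
            - integ[bottom][left] + integ[top][left])


def patch_mean(integ, top, left, patch_size):
    """Patch sum divided by the patch area, i.e. the normalisation discussed above."""
    return patch_sum(integ, top, left, patch_size) / (patch_size ** 2)


# Hypothetical toy "residual" map: every location holds the same value.
feat = [[0.5] * 10 for _ in range(10)]
integ = integral_image(feat)

sum_3 = patch_sum(integ, 0, 0, 3)    # grows with the 9 cells in the patch
sum_8 = patch_sum(integ, 0, 0, 8)    # grows with the 64 cells in the patch
mean_3 = patch_mean(integ, 0, 0, 3)  # independent of patch size
mean_8 = patch_mean(integ, 0, 0, 8)  # independent of patch size

print(sum_3, sum_8)    # 4.5 32.0
print(mean_3, mean_8)  # 0.5 0.5
```

With a single patch size, every descriptor is scaled by the same constant, which is why dropping the division should matter little in that setting.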
I get it. Thanks for explaining the issue so carefully!
Hello, in `get_square_regions_from_integral()` in patchnetvlad.py, I don't understand the last line, `return feat_regions / (patch_size ** 2)`. `feat_regions` is already the residual between the local descriptors in each patch and the k cluster centers, so why do you divide `feat_regions` by the square of `patch_size`? In other words, could I return `feat_regions` directly without dividing by `patch_size ** 2`? ![image](https://user-images.githubusercontent.com/37505417/192134379-2128d39f-3dc6-43e3-9f42-b08446c834b7.png)