QVPR / Patch-NetVLAD

Code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition"
MIT License

About calculating keypoints #41

Closed ShijieXue749 closed 2 years ago

ShijieXue749 commented 2 years ago

Hi, I really appreciate your work. I'm reading your code for calculating keypoint centers from patches [./models/local_matcher.calc_keypoint_centers_from_patches], but I am confused. In the for loop:

    keypoints[0, k] = ((boxes[j+(i*W), 0] + boxes[(j+(patch_size[1]-1))+(i*W), 2]) / 2)
    keypoints[1, k] = ((boxes[j+((i+1)*W), 1] + boxes[j+((i+(patch_size[0]-1))*W), 3]) / 2)

You select 4 points in each patch to calculate the keypoints. Why these particular points? I mean,

Could you please give me a brief explanation? Thank you in advance and look forward to your reply.

StephenHausler commented 2 years ago

The purpose of this code is to convert patches in feature space to equivalent positions in image space. Boxes, which is calculated in the function calc_receptive_boxes, contains the four corner pixel positions of each spatial position in the WxH feature map. However, because we use patch sizes that are larger than an individual 1x1 spatial element in the feature map, we had to add additional code to convert the 1x1 pixel positions into the four corner pixel positions of each patch. Specifically, we find the left-hand x position in image space for the top left spatial position in the patch. We then find the right-hand x position in image space for the rightmost spatial position in the patch. We repeat for the y positions. Once we have the four corner positions of each patch, we then find the centroid pixel position and that becomes the keypoint coordinate for the current patch.
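As a rough illustration of the conversion described above, here is a minimal sketch, not the repository's code. `make_boxes` is a hypothetical stand-in for calc_receptive_boxes, and the receptive-field size, stride and padding values are assumptions chosen for illustration. Each box is stored as [xmin, ymin, xmax, ymax], and a 2D (row, column) layout is used here purely for readability.

```python
import numpy as np

def make_boxes(H, W, rf=196.0, stride=16.0, padding=90.0):
    """Hypothetical stand-in for calc_receptive_boxes (values are assumptions).

    Returns an (H, W, 4) array where boxes[i, j] = [xmin, ymin, xmax, ymax]
    is the image-space box of feature-map row i, column j.
    """
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    xmin = stride * cols - padding
    ymin = stride * rows - padding
    xmax = xmin + rf - 1
    ymax = ymin + rf - 1
    return np.stack([xmin, ymin, xmax, ymax], axis=-1)

def patch_center(boxes, i, j, patch_h, patch_w):
    """Image-space centre of the patch whose top-left feature cell is (i, j)."""
    x_left = boxes[i, j, 0]                   # xmin of the top-left cell
    x_right = boxes[i, j + patch_w - 1, 2]    # xmax of the rightmost cell
    y_top = boxes[i, j, 1]                    # ymin of the top-left cell
    y_bottom = boxes[i + patch_h - 1, j, 3]   # ymax of the bottom cell
    return (x_left + x_right) / 2, (y_top + y_bottom) / 2

boxes = make_boxes(H=30, W=40)
cx, cy = patch_center(boxes, i=0, j=0, patch_h=5, patch_w=5)
print(cx, cy)
```

Under these assumed values, a 5x5 patch anchored at the top-left feature cell spans image pixels 0 to 79 in each direction, so its centre falls in the middle of that span, and moving the patch one feature cell to the right shifts the centre by one stride.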

It was confusing because our comments in calc_receptive_boxes were incorrect. Each box is actually represented by [xmin, ymin, xmax, ymax]. We also found a bug in that for loop, specifically the (i+1) in keypoints[1,k] should have just been i. Fixed in: https://github.com/QVPR/Patch-NetVLAD/commit/6005b555cf05414afac3f3c0203e22a249d05b91

This is the corrected explanation of the boxes (after the bug fix):

boxes[j+(i*W), 0] is the x_min of the top left
boxes[(j+(patch_size[1]-1))+(i*W), 2] is the x_max of the top right
boxes[j+(i*W), 1] is the y_min of the top left
boxes[j+((i+(patch_size[0]-1))*W), 3] is the y_max of the bottom left
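To see what the fix changes in practice, the sketch below applies the four index expressions to a flat, row-major boxes array (index j + i*W) and compares the corrected y computation against the old (i+1) variant. `make_flat_boxes`, `keypoint_fixed` and `keypoint_buggy` are hypothetical helper names, and the receptive-field, stride and padding values are assumptions for illustration, not taken from this thread.

```python
import numpy as np

def make_flat_boxes(H, W, rf=196.0, stride=16.0, padding=90.0):
    """Hypothetical boxes array of shape (H*W, 4), row-major: index = j + i*W.

    Each row is [xmin, ymin, xmax, ymax]; all numeric values are assumptions.
    """
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    xmin = stride * cols.ravel() - padding
    ymin = stride * rows.ravel() - padding
    return np.stack([xmin, ymin, xmin + rf - 1, ymin + rf - 1], axis=1)

def keypoint_fixed(boxes, i, j, W, patch_size):
    """Patch centre using the corrected indexing (y_min taken from row i)."""
    x = (boxes[j + i * W, 0] + boxes[(j + (patch_size[1] - 1)) + i * W, 2]) / 2
    y = (boxes[j + i * W, 1] + boxes[j + (i + (patch_size[0] - 1)) * W, 3]) / 2
    return x, y

def keypoint_buggy(boxes, i, j, W, patch_size):
    """Same computation with the old (i+1) row index for y_min."""
    x = (boxes[j + i * W, 0] + boxes[(j + (patch_size[1] - 1)) + i * W, 2]) / 2
    y = (boxes[j + (i + 1) * W, 1] + boxes[j + (i + (patch_size[0] - 1)) * W, 3]) / 2
    return x, y

H, W = 30, 40
boxes = make_flat_boxes(H, W)
fixed = keypoint_fixed(boxes, i=2, j=3, W=W, patch_size=(5, 5))
buggy = keypoint_buggy(boxes, i=2, j=3, W=W, patch_size=(5, 5))
print("fixed:", fixed, "buggy:", buggy)
```

With these assumed values the x coordinate is identical in both versions, while the old y coordinate sits half a stride (8 pixels) below the corrected one, because y_min was read from the row beneath the patch's top row.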

Hope this helps. Hopefully it also makes more sense now that we've fixed the bug and comments.

ShijieXue749 commented 2 years ago

Thank you for the detailed explanation. Now I understand.