Closed ZhonghuaYi closed 10 months ago
Hi @ZhonghuaYi,
You are correct in mentioning there is indeed a shift of 9 when using padding=0
.
The position you obtained are in feature coordinate space. To convert them to image coordinate space, we added the from_feature_coords_to_image_coords function. This will essentially look at the structure of the model, figure out the linear coordinate mapping between feature space and image space, and then apply that linear transform to the positions you obtained.
For example :
Here and here are good pointers if you need more information about it.
Thank you @gleize
I carefully reviewed the code and found the from_feature_coords_to_image_coords
function. It does the work you said. And the information in FAQ really helps.
I think this issue cloud be closed.
I have read the origin paper of SiLK, which declared that it's important to use no padding when detect keypoints.
If the
padding=1
is applied, the raised false positive corner pattern may affect the training, as mentioned in the SiLK paper. And I saw the scripts inscripts/examples
usepadding=0
for all sub-modules to avoid this, just like the paper said.But while applying
padding=0
in SiLKVGG network, the size of the probability map has shrinked. If the backbone and the detector head all have no padding, the upper left corner should shift to lower right by about 9 pixels. So as a result, the upper left corner of the probability map does not represent the real upper left point of the original image. The post-processing is therefore needed, by applying additional zero padding into the probability map, or adding relevant shifts to the points positions (add 9 if the backbone and the head have no padding).But I haven't seen any relevant post-processing in the code. In fact, when i set
padding
equals 0 or 1, the size of thedense_output
shows different. The positions all begin with [0.5, 0.5], but the last position is different. When I input an image with the size of 375x1242, the outputs are the below: padding=0:padding=1:
So I am confused that is there really needed for the post-processing for the potential position shift? Or I just misunderstand the whole problem, the position shift problem just doesn't exist?