ZSQflower closed 6 years ago
Calculate the mapping according to the stride, not the input size.
OK, I understand that, thanks. So, according to SiamFC, the response map reflects the translation. The maximum translation is designed to be (17 - 1) / 2 * total_stride = 64, which does not cover the full instance search patch. Hence, are we assuming that the convolutional layers with stride 1 do not introduce any translation?
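To make the arithmetic in this comment concrete, here is a minimal sketch. It assumes the values quoted in the thread (a 17x17 response map, total stride 8, instance size 255); the function name `max_translation` is made up for illustration and is not from the repository.

```python
# Values quoted in the thread; treat them as assumptions about the config.
total_stride = 8      # backbone total stride
response_size = 17    # side of the raw response map, from "(17 - 1) / 2"
instance_size = 255   # side of the instance (search) patch


def max_translation(response_size, total_stride):
    """Largest offset from the response center, in input-patch pixels.

    The peak can sit at most (response_size - 1) / 2 cells from the center,
    and each response cell spans total_stride input pixels.
    """
    return (response_size - 1) / 2 * total_stride


max_shift = max_translation(response_size, total_stride)
half_patch = (instance_size - 1) / 2  # center-to-edge distance of the patch

print(max_shift)   # 64.0
print(half_patch)  # 127.0
```

So the reachable translation (64 px) is roughly half of the center-to-edge distance of the 255 px search patch, which matches the observation that the response does not map onto the full patch.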
I still can't understand that. Can you explain it in more detail?
Why is the backbone total stride 8, while response_up_stride is 16?
Hi, Zhang (maybe :) ), I have a question that has troubled me for two days. In the tracking phase, the response heatmap is upsampled by a ratio (response_up_stride = 16), which increases the resolution of the heatmap to 272. This resolution, 272, is larger than the resolution of the input instance patch (255). The equation `disp_response_input = disp_response_interp * config.total_stride / config.response_up_stride` is meant to map the response location to the location in the input instance (network input size 255), but it is quite hard to understand why the factor is `config.total_stride / config.response_up_stride`. Maybe `disp_response_input = disp_response_interp * config.instance_size / config.disp_response_interp` would be more understandable as a linear mapping?
Thanks @StrangerZhang
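For comparison, here is a minimal sketch of the two mappings discussed in the question. It assumes the numbers quoted in the thread (total stride 8, response_up_stride 16, instance size 255, upsampled response size 272) and reads the proposed alternative as scaling by instance size over upsampled size. The function names are hypothetical and not taken from the repository.

```python
import numpy as np

# Values quoted in the thread; treat them as assumptions about the config.
total_stride = 8
response_up_stride = 16
instance_size = 255
response_size = 17
upsampled_size = response_size * response_up_stride  # 272


def disp_to_input_by_stride(disp_interp):
    """Stride-based mapping: upsampled cells -> raw cells -> input pixels.

    Dividing by response_up_stride undoes the upsampling; multiplying by
    total_stride converts raw response cells into input-patch pixels.
    """
    return np.asarray(disp_interp, dtype=float) * total_stride / response_up_stride


def disp_to_input_by_size(disp_interp):
    """Size-ratio mapping (the alternative proposed in the question)."""
    return np.asarray(disp_interp, dtype=float) * instance_size / upsampled_size


# Hypothetical peak offset from the center of the upsampled response.
peak_shift = np.array([32.0, -16.0])

print(disp_to_input_by_stride(peak_shift))  # [16. -8.]
print(disp_to_input_by_size(peak_shift))    # [ 30. -15.]
```

With the stride-based mapping, one upsampled cell corresponds to 8 / 16 = 0.5 input pixels, so a shift of 32 upsampled cells is 2 raw cells, or 16 px. The size-ratio mapping would treat the 272-wide response as if it covered the whole 255 px patch and give nearly double that shift.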