bertinetto / siamese-fc

Arbitrary object tracking at 50-100 FPS with Fully Convolutional Siamese networks.
http://www.robots.ox.ac.uk/~luca/siamese-fc.html
MIT License

About exemplar image #18

Closed by gzpan 7 years ago

gzpan commented 7 years ago

Why is the exemplar image larger than the true target? Obviously, this brings extra background into the exemplar image in addition to the target in the first frame. Would it influence the divergence of the Siamese network? Why not just use the true target region in the first frame as the exemplar image?

suryafyi commented 7 years ago

In the paper, Bertinetto et al. write:

"Images are scaled such that the bounding box, plus an added margin for context, has a fixed area."

My guess is that the surrounding context helps when the target undergoes a major visual change: even if the target itself no longer matches, the neighbouring context still provides clues that the target is in that region, which boosts the score map for that search window.
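To make the "added margin for context" concrete, here is a minimal sketch of how the crop size could be computed. It assumes the rule as I recall it from the paper: a margin p = (w + h) / 4 is added on each side, and the crop is scaled so that s(w + 2p) × s(h + 2p) = 127². The function name `exemplar_crop` and its parameters are hypothetical and not taken from this repository.

```python
import math


def exemplar_crop(w, h, context=0.5, exemplar_size=127):
    """Return (crop_side, scale) for a target bounding box of size w x h.

    Assumption: margin p = context * (w + h) / 2, i.e. p = (w + h) / 4
    for the default context=0.5, as recalled from the paper.
    """
    p = context * (w + h) / 2
    # Side of the square crop (in original image pixels) whose area
    # equals (w + 2p) * (h + 2p).
    crop_side = math.sqrt((w + 2 * p) * (h + 2 * p))
    # Scale factor that resizes the crop to exemplar_size x exemplar_size.
    scale = exemplar_size / crop_side
    return crop_side, scale


side, scale = exemplar_crop(50, 50)
print(side, scale)
```

Under these assumptions, a 50 x 50 target gives a 100 x 100 crop, so the target covers about half of each side of the exemplar and the rest is context.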

jvlmdr commented 7 years ago

At least a small amount of context should be included so that the edges of the object's boundary can be detected.

Additionally, since we do not introduce any padding in the network (i.e. all convolutions are "valid", not "same"), it is necessary to include a large amount of context so that the receptive fields of the pixels in the final feature map are well distributed over the target.
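The effect of "valid" convolutions can be checked with a bit of arithmetic. The sketch below assumes the (kernel, stride) sequence of the AlexNet-like network as I recall it from the paper's architecture table (conv1 11/2, pool1 3/2, conv2 5/1, pool2 3/2, conv3 to conv5 3/1). Treat the layer list and the helper names as assumptions, not as this repository's code.

```python
# (kernel, stride) per layer; assumed from the paper's architecture table.
LAYERS = [(11, 2), (3, 2), (5, 1), (3, 2), (3, 1), (3, 1), (3, 1)]


def valid_output_size(size, layers=LAYERS):
    """Spatial size after applying every layer without padding."""
    for k, s in layers:
        size = (size - k) // s + 1
    return size


def receptive_field(layers=LAYERS):
    """Return (receptive field size, total stride) of one output pixel."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf, jump


def receptive_field_centers(size, layers=LAYERS):
    """Input coordinates (0-indexed) of the receptive field centers."""
    rf, jump = receptive_field(layers)
    n = valid_output_size(size, layers)
    first = (rf - 1) / 2
    return [first + i * jump for i in range(n)]


print(valid_output_size(127), valid_output_size(255))
print(receptive_field())
print(receptive_field_centers(127))
```

With this layer list, a 127 x 127 exemplar yields a 6 x 6 feature map whose receptive fields are 87 pixels wide and centered between pixels 43 and 83. Without padding, no output pixel is centered near the image border, so a tightly cropped target would have its edges covered only by the outer parts of receptive fields. If the target fills roughly the central half of the crop, the centers fall on the target instead.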