STVIR / pysot

SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and SiamMask.
Apache License 2.0
4.41k stars, 1.1k forks

Discussion of Central Bias in Tracking with Deepnet #176

Closed. LiyaoTang closed 4 years ago

LiyaoTang commented 4 years ago

After reading SiamRPN++, I am still a little confused by the claim that "padding destroys the strict translation invariance, which causes the network to learn a central bias, and so deep networks could not be used in tracking".

Specifically, my confusion lies in:

  1. Why is it padding, rather than the non-linearity, that causes such a spatial bias? Could I interpret it another way: the deep net has enough kernels to learn a spatial bias, while the shallow net does not?

  2. Or equivalently, could you please explain how padding contributes to the central bias? (I would imagine that any regression network could be overfitted by feeding it only labels at the center, which would give it a central bias.)

Any kind of help would be greatly appreciated.

jhultman commented 4 years ago

Hi, here's my understanding. The point of "fully convolutional" is to produce features that are translation invariant in the paper's sense: a feature depends only on the local image content, not on where that content sits in the frame (strictly speaking, this is translation equivariance). The algorithm should not be able to "tell" where a pixel is by inspecting its feature. If the algorithm can distinguish cells near the border from cells near the center, it may begin to rely on that information, even though it is just a quirk of a training set in which the target is always centered. The center bias will not hold at inference time, when the search region center is chosen based on imperfect target tracking and the target is often off-center.

If 'same' padding is used and there is no data augmentation to ensure that the border signal is an unreliable cue for objectness, the boundary pixels will suffer from accumulated border effects. These allow the algorithm to tell them apart from the center pixels and to exploit the bias during training. The border effects are worse for deeper networks with large receptive fields because the padded values creep further into the center with every layer.
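
To make the "tell location from feature" point concrete, here is a minimal NumPy sketch (not code from pysot or from the paper). It feeds a constant image, which carries no positional cue at all, through a small stack of 3x3 convolutions with ReLU. The averaging kernel, the 31x31 input size, and the helper names `conv3x3`, `run_stack`, and `affected_border_width` are all assumptions chosen for illustration.

```python
import numpy as np

def conv3x3(x, kernel, pad):
    """Single-channel 3x3 cross-correlation; pad=True applies zero 'same' padding."""
    if pad:
        x = np.pad(x, 1, mode="constant", constant_values=0.0)
    h, w = x.shape[0] - 2, x.shape[1] - 2
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * x[i:i + h, j:j + w]
    return out

def run_stack(x, depth, pad):
    """Apply `depth` layers of conv3x3 followed by ReLU."""
    kernel = np.full((3, 3), 1.0 / 9.0)  # hypothetical averaging kernel, illustration only
    for _ in range(depth):
        x = np.maximum(conv3x3(x, kernel, pad), 0.0)
    return x

def affected_border_width(feature):
    """Count cells in the left half of the middle row whose value differs from the center."""
    row = feature[feature.shape[0] // 2]
    center = row[row.shape[0] // 2]
    return int(np.sum(~np.isclose(row[: row.shape[0] // 2], center)))

image = np.ones((31, 31))  # constant input: content says nothing about position

for depth in (1, 3, 5):
    padded = run_stack(image, depth, pad=True)    # zero 'same' padding
    valid = run_stack(image, depth, pad=False)    # no padding
    print(depth, affected_border_width(padded), np.ptp(valid))
```

In this setup the unpadded features should be identical everywhere (value range of zero), so a cell's feature reveals nothing about its position. With zero padding, the band of border cells that differ from the center should widen by one cell per layer, which is the "creeping" effect described above: the deeper the stack, the more of the feature map carries a readable position signal.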

(Attached image: convolve)

LiyaoTang commented 4 years ago

Ewww, thinking in the opposite direction, "to tell location from feature", really helps! Yes, it is padding that provides the possibility for such a spatial bias. Thank you so much! :)