Closed chenrxi closed 2 years ago
I have noticed these differences. Can you share your thinking on them? @chenrxi
(1) The reason we multiply by (1-target) is to reduce training ambiguity. 'target' is produced by a Gaussian function, so the closer a pixel lies to the previous center, the closer (1-target) gets to zero. We want to penalize the pixels around the previous center less, because those nearby pixels may also belong to the corresponding object and should not simply be treated as negative samples.
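A minimal numpy sketch of this weighting idea (the grid size, center, and sigma below are made-up values, not the repo's settings):

```python
import numpy as np

def gaussian_target(h, w, cy, cx, sigma=2.0):
    """Gaussian heatmap peaked at the previous object center (cy, cx)."""
    ys, xs = np.ogrid[:h, :w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

# Hypothetical 8x8 grid with the previous center at (4, 4).
target = gaussian_target(8, 8, 4, 4)
weight = 1.0 - target  # ~0 at the center, ~1 far away

# A negative-sample loss of the form `weight * per_pixel_loss` therefore
# barely penalizes pixels near the previous center, while pixels far from
# it keep (almost) their full penalty.
print(weight[4, 4], weight[0, 0])
```

So a pixel one step away from the center is only lightly penalized, while a distant pixel is penalized nearly in full.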
(2) In the paper, we state: "We optionally incorporate the residual feature as the input to provide more motion clues." The results slightly degrade after removing the residual feature.
(1) Formula (4) shows how the CVA loss is calculated, but it differs from how the loss is computed in your code (see ./lib/model/losses.py). In the code, you first max-pool the attention matrix along the H and W dimensions, then multiply by (1-target) and apply softmax, and finally index the position of the previous location in the resulting vector to compute the loss. Why did you choose the latter method? (2) In the paper, you say that you use tracking information to track, where that information is the tracking offset computed by the CVA module. Why did you also add the feature difference (feat diff)? What happens to the results if feat diff is removed?
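For reference, here is one possible reading of the pipeline described in question (1), as a numpy sketch. All shapes, the previous center (cy, cx), and sigma are illustrative assumptions, not the values from losses.py, and the map here covers a single query pixel rather than the full batched tensor:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical attention/cost map C of shape (H, W) for one query pixel.
H, W = 4, 4
rng = np.random.default_rng(0)
C = rng.standard_normal((H, W))

# Step 1: max-pool along H and along W, giving a W-vector and an H-vector.
c_w = C.max(axis=0)  # (W,)
c_h = C.max(axis=1)  # (H,)

# Step 2: weight by (1 - target) along each axis, where target is a 1D
# Gaussian around the assumed previous center (cy, cx), then softmax.
cy, cx = 2, 1
sigma = 2.0
t_h = np.exp(-((np.arange(H) - cy) ** 2) / (2 * sigma ** 2))
t_w = np.exp(-((np.arange(W) - cx) ** 2) / (2 * sigma ** 2))
p_h = softmax(c_h * (1.0 - t_h))
p_w = softmax(c_w * (1.0 - t_w))

# Step 3: index the previous-center position and take a negative
# log-likelihood style loss on those two entries.
loss = -np.log(p_h[cy]) - np.log(p_w[cx])
print(float(loss))
```

This separable (per-axis) formulation only supervises two 1D distributions instead of the full HxW map, which is presumably cheaper than evaluating formula (4) over the whole grid.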