MIV-XJTU / ARTrack

Apache License 2.0

I have a question. #48

Closed NJiHyeon closed 6 months ago

NJiHyeon commented 6 months ago

What does search number mean in the data? Why does search_images have size [search_number(=35), batch, 3, 256, 256]? Is the search number dimension treated like a batch dimension?

AlexDotHam commented 6 months ago

The search number is the number of frames we clip from the video. To sample a clip, we take N = search number consecutive frames from the video, so that training simulates inference over a video clip rather than over image pairs.
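As a rough illustration of the clip sampling described above, here is a minimal sketch in plain Python. The function names, the video lengths, and the choice of a uniformly random start frame are assumptions made for illustration; the actual ARTrack sampler may pick frames differently. It only shows how taking search number consecutive frames per video leads to a time-major layout of [search_number, batch].

```python
import random

def sample_clip_indices(video_length, search_number, rng):
    """Pick `search_number` consecutive frame indices from one video."""
    if video_length < search_number:
        raise ValueError("video is shorter than the requested clip")
    # Assumption: the clip start is drawn uniformly at random.
    start = rng.randrange(video_length - search_number + 1)
    return list(range(start, start + search_number))

def time_major(clips):
    """Transpose [batch][search_number] into [search_number][batch]."""
    return [list(step) for step in zip(*clips)]

rng = random.Random(0)
search_number = 35
video_lengths = [120, 300, 80, 500]  # hypothetical videos, one per batch item
clips = [sample_clip_indices(n, search_number, rng) for n in video_lengths]
frames = time_major(clips)

# First dimension is the position in the clip, second is the batch item.
print(len(frames), len(frames[0]))  # 35 4
```

Loading the image for each index would then give the [search_number, batch, 3, 256, 256] tensor from the question, with the first dimension stepping through time inside each clip.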

NJiHyeon commented 6 months ago

If so, the size of pre_seq is [search number, prenum*4]. Is it correct that the tensor holds the values below?

[[frame35_xmin, frame35_ymin, frame35_xmax, frame35_ymax, frame34_xmin, frame34_ymin, frame34_xmax, frame34_ymax ... ]
[frame36_xmin, frame36_ymin, frame36_xmax, frame36_ymax, frame35_xmin, frame35_ymin, frame35_xmax, frame35_ymax ... ],
...
[frame70_xmin, frame70_ymin, frame70_xmax, frame70_ymax, frame69_xmin, frame69_ymin, frame69_xmax, frame69_ymax ... ]]

In both the first and the second dimension, do the values in the tensor come from successive frames?

AlexDotHam commented 6 months ago

The size of pre_seq depends on the N-gram we use. For example, with the 7-gram we set, the pre_seq size is [search_number, bs, 7*4], where bs is the number of video clips in a training batch and search_number is the number of frames in each video clip.

Moreover, as you said, the values along every dimension of the tensor come from successive frames.
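To make the layout concrete, here is a minimal sketch for a single clip (batch dimension omitted). It is not the repository's code: the helper name build_pre_seq, the newest-first ordering (taken from the question above), and padding the earliest steps by repeating the first box are all assumptions for illustration.

```python
def build_pre_seq(boxes, n):
    """For each step t >= 1, flatten the n boxes preceding boxes[t], newest first.

    boxes: list of (xmin, ymin, xmax, ymax), one per frame; boxes[0] is the
    initial box. Returns len(boxes) - 1 rows, each of length n * 4.
    """
    rows = []
    for t in range(1, len(boxes)):
        history = []
        for k in range(1, n + 1):
            idx = max(t - k, 0)  # assumption: pad by repeating the first box
            history.extend(boxes[idx])
        rows.append(history)
    return rows

search_number, n_gram = 35, 7
# Hypothetical trajectory: one initial box plus 35 search frames, drifting right.
boxes = [(10 + t, 20, 50 + t, 60) for t in range(search_number + 1)]
pre_seq = build_pre_seq(boxes, n_gram)

print(len(pre_seq), len(pre_seq[0]))  # 35 28
```

Stacking one such table per clip along a second axis would give the [search_number, bs, 7*4] size mentioned above. Consecutive rows share all but one box, which is the "successive frames in both dimensions" pattern from the question.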

AlexDotHam commented 6 months ago

Note that pre_seq only contains coordinates in the local coordinate system, so it needs a coordinate system transformation across frames. When you predict the coordinates in the next frame, the search region is cropped based on the previous prediction, and the new coordinate system is built from that crop.
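The transformation can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the helper names, the search-area factor of 4.0, the square crop centred on the previous prediction, and the normalisation of local coordinates by the crop side are all hypothetical choices.

```python
def crop_from_box(box, factor=4.0):
    """Square search region centred on `box`, returned as (left, top, side)."""
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    side = factor * ((xmax - xmin) * (ymax - ymin)) ** 0.5
    return (cx - side / 2, cy - side / 2, side)

def to_local(box, crop):
    """Image coordinates -> coordinates normalised to the crop."""
    left, top, side = crop
    xmin, ymin, xmax, ymax = box
    return ((xmin - left) / side, (ymin - top) / side,
            (xmax - left) / side, (ymax - top) / side)

def to_global(box, crop):
    """Crop-normalised coordinates -> image coordinates."""
    left, top, side = crop
    xmin, ymin, xmax, ymax = box
    return (xmin * side + left, ymin * side + top,
            xmax * side + left, ymax * side + top)

def reproject(local_box, old_crop, new_crop):
    """Re-express a box stored in one crop's system in another crop's system."""
    return to_local(to_global(local_box, old_crop), new_crop)

prev_pred = (100.0, 100.0, 140.0, 140.0)   # prediction in frame t-1
new_pred = (110.0, 100.0, 150.0, 140.0)    # prediction in frame t
old_crop = crop_from_box(prev_pred)        # crop used for frame t
new_crop = crop_from_box(new_pred)         # crop used for frame t+1

stored = to_local(prev_pred, old_crop)
moved = reproject(stored, old_crop, new_crop)
print(stored)  # (0.375, 0.375, 0.625, 0.625)
print(moved)   # (0.3125, 0.375, 0.5625, 0.625)
```

The same box has different local coordinates once the crop shifts, which is why every entry of the history has to be re-expressed in the newest crop before it is fed to the model for the next frame.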

NJiHyeon commented 6 months ago

I think the values of pre_seq are produced in 'explore'. Couldn't I instead build search_pre_seq in the sampler, the same way as search_anno, and use that?

AlexDotHam commented 6 months ago

We did not use ground-truth trajectory sequences because we wanted to simulate the bias the tracker generates during inference. If you use the ground truth, I think the model can still be trained, but in our attempts the accuracy did not reach the current level.