Closed NJiHyeon closed 6 months ago
The search number is the number of frames we clip from the video. To sample a clip, we take N = search number consecutive frames from the video, so that training simulates inference on video clips rather than on image pairs.
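To make the clip sampling concrete, here is a minimal sketch of picking N consecutive frame indices from a video. The function name `sample_clip_indices` and the uniform choice of the start frame are assumptions for illustration, not the repository's actual sampler.

```python
import random


def sample_clip_indices(video_length, search_number, rng=random):
    """Return search_number consecutive frame indices from a video.

    Hypothetical helper: the start frame is drawn uniformly at random,
    which is an assumption, not necessarily what the real sampler does.
    """
    if search_number > video_length:
        raise ValueError("video is shorter than the requested clip")
    # Last valid start is video_length - search_number.
    start = rng.randrange(video_length - search_number + 1)
    return list(range(start, start + search_number))


# Example: a 35-frame clip from a 100-frame video.
clip = sample_clip_indices(100, 35, random.Random(0))
```

Stacking the frames of each sampled clip along a leading time axis would give the [search_number, batch, 3, 256, 256] layout discussed in this thread, so the tracker can step through the frames in order while handling a batch of clips at each step.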
If so, the size of pre_seq is [search number, prenum*4]. Is it correct that the values of the tensor are as below? [[frame35_xmin, frame35_ymin, frame35_xmax, frame35_ymax, frame34_xmin, frame34_ymin, frame34_xmax, frame34_ymax ... ] [frame36_xmin, frame36_ymin, frame36_xmax, frame36_ymax, frame35_xmin, frame35_ymin, frame35_xmax, frame35_ymax ... ], ... [frame70_xmin, frame70_ymin, frame70_xmax, frame70_ymax, frame69_xmin, frame69_ymin, frame69_xmax, frame69_ymax ... ]] In both the first and second dimensions, do the values in the tensor come from successive frames?
The size of pre_seq depends on the N-gram we use. For example, with a 7-gram, as in our setting, the pre_seq size is [search_number, bs, 7*4], where bs (the batch size) is the number of video clips we train on at once and search_number is the number of frames in each video clip.
Moreover, as you said, the values along both of those dimensions come from successive frames.
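As a rough illustration of that layout, the sketch below builds an N-gram pre_seq for one clip and then stacks several clips into the [search_number, bs, n*4] shape. The helper names `build_pre_seq` and `stack_batch` are hypothetical, and the most-recent-frame-first ordering simply follows the layout written out in the question; the actual code may order the boxes differently.

```python
def build_pre_seq(boxes, start, search_number, n_gram):
    """Build per-frame N-gram rows for one clip.

    boxes: list of [xmin, ymin, xmax, ymax], indexed by frame number.
    Row i concatenates the n_gram boxes ending at frame start + i,
    most recent first (an assumed ordering, taken from the question).
    """
    if start - (n_gram - 1) < 0:
        raise ValueError("not enough earlier frames for the requested n_gram")
    rows = []
    for i in range(search_number):
        last = start + i
        row = []
        for k in range(n_gram):
            row.extend(boxes[last - k])
        rows.append(row)
    return rows


def stack_batch(clips):
    """Turn bs per-clip row lists into a [search_number][bs][n*4] nesting."""
    return [list(step) for step in zip(*clips)]


# Toy trajectory: frame f has the box [10f, 10f+1, 10f+2, 10f+3].
boxes = [[10 * f, 10 * f + 1, 10 * f + 2, 10 * f + 3] for f in range(80)]
clip_a = build_pre_seq(boxes, start=35, search_number=3, n_gram=2)
clip_b = build_pre_seq(boxes, start=40, search_number=3, n_gram=2)
pre_seq = stack_batch([clip_a, clip_b])  # 3 time steps, 2 clips, 8 values
```

Consecutive rows overlap by n_gram - 1 boxes, which is why both dimensions hold values from successive frames.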
Note that pre_seq only contains coordinates in the local coordinate system, so a coordinate system transformation across frames is needed: when you predict the coordinates in the next frame, the search region is cropped based on the previous prediction, and the new coordinate system is built from that crop.
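The transformation described here can be sketched as follows. This is only an illustration under stated assumptions: the search region is taken to be a square crop centered on the previous box with side `factor * sqrt(w * h)`, resized to `out_size` pixels. The helper names and the values `factor=4.0` and `out_size=256` are hypothetical, not taken from the repository.

```python
def crop_from_box(box, factor=4.0):
    """Square search region centered on box; returns (x0, y0, side).

    Assumed cropping rule: side = factor * sqrt(width * height).
    """
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    side = factor * ((xmax - xmin) * (ymax - ymin)) ** 0.5
    return (cx - side / 2.0, cy - side / 2.0, side)


def to_local(box, crop, out_size=256):
    """Image coordinates -> coordinates inside the resized crop."""
    x0, y0, side = crop
    s = out_size / side
    return [(box[0] - x0) * s, (box[1] - y0) * s,
            (box[2] - x0) * s, (box[3] - y0) * s]


def to_global(box, crop, out_size=256):
    """Coordinates inside the resized crop -> image coordinates."""
    x0, y0, side = crop
    s = side / out_size
    return [box[0] * s + x0, box[1] * s + y0,
            box[2] * s + x0, box[3] * s + y0]


def reexpress(local_box, old_crop, new_crop, out_size=256):
    """Move a box from the old crop's local system into the new crop's."""
    return to_local(to_global(local_box, old_crop, out_size), new_crop, out_size)


# The box predicted in frame t defines the crop for frame t + 1, so the
# earlier boxes in pre_seq are re-expressed in that new local system.
old_crop = crop_from_box([100, 100, 140, 140])
new_crop = crop_from_box([110, 100, 150, 140])  # next prediction, shifted right
prev_local = to_local([100, 100, 140, 140], old_crop)
prev_in_new = reexpress(prev_local, old_crop, new_crop)
```

Because each crop depends on the tracker's own previous output, the local coordinates in pre_seq shift whenever the prediction drifts.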
I think this means the values of pre_seq are produced in the 'explore' step. Can't I instead build search_pre_seq in the sampler, the same way as search_anno, and use that?
We did not use ground-truth trajectory sequences because we wanted to simulate the bias the tracker generates during inference. If you use the ground truth, I think the model can still be trained, but in our attempts the accuracy did not reach the current level.
What is the meaning of search number in the data? I wonder why the size of search_images is [search_number(=35), batch, 3, 256, 256]. Is it so that the frames can be fed in like a batch?