Closed psandovalsegura closed 7 months ago
I think you can use https://drive.google.com/drive/folders/1KsH_MIZIdgjZpUZBmR4P88yeYDqM8yNW?usp=sharing, which contains ARTrack-B{256} trained on COCO, GOT-10k, LaSOT, and TrackingNet. SOT is a general, class-agnostic task, so I think ARTrack-B{256} is compatible with a new test set.
Moreover, you can train our ARTrack on your new data, but do not remove COCO and the other datasets from training.
Thank you for the information. I will try the checkpoint you provided.
Hi Yifan,
I have set up the ARTrackSeq_ep60.pth.tar checkpoint and have created a new dataset class in lib/test/evaluation. However, the results on my driving dataset look wrong. I think this might be due to the frame resolution I'm using?
My frames are 960 (W) x 540 (H). Would that be a problem? How do you recommend I modify the code so that I can run inference on these frames?
I don't think the resolution is the reason: GOT-10k contains diverse resolution formats, and following pytracking, the search and template regions are cropped from the frames and resized to a fixed resolution. If you can show me the results you get on your own dataset, I can give you some suggestions about the cause.
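For reference, the crop-and-resize step described above can be sketched roughly as follows. This is a simplified illustration, not ARTrack's exact preprocessing; the `area_factor` and `out_sz` values are placeholders, and the nearest-neighbour resize stands in for the real interpolation. The point is that the crop size scales with the target box, not the frame, so 960x540 frames are handled like any other resolution.

```python
import numpy as np

def crop_and_resize(frame, box_xywh, area_factor=2.0, out_sz=256):
    """Crop a square region centered on the target box and resize it.

    The crop side depends only on the target size (times area_factor),
    so frame resolution does not matter. Values here are illustrative,
    not ARTrack's actual settings.
    """
    x, y, w, h = box_xywh
    cx, cy = x + w / 2.0, y + h / 2.0
    crop_sz = max(1, int(round(np.sqrt(w * h) * area_factor)))
    x1 = int(round(cx - crop_sz / 2.0))
    y1 = int(round(cy - crop_sz / 2.0))

    # Clamp indices to the frame, which replicates border pixels
    # when the crop extends outside the image.
    H, W = frame.shape[:2]
    ys = np.clip(np.arange(y1, y1 + crop_sz), 0, H - 1)
    xs = np.clip(np.arange(x1, x1 + crop_sz), 0, W - 1)
    crop = frame[np.ix_(ys, xs)]

    # Nearest-neighbour resize to out_sz x out_sz.
    ri = np.clip((np.arange(out_sz) * crop_sz) // out_sz, 0, crop_sz - 1)
    return crop[np.ix_(ri, ri)]
```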
I see. In that case, it must be my subclass of BaseDataset. Can you explain what should be in each Sequence object? Right now I am only passing in ground_truth_rect as an np.array of shape (N, 4), where every row is [x_min, y_min, x_max, y_max] (top-left corner of the bbox followed by the bottom-right corner). I visualized my template image (z_patch_arr) from the first frame and it is not correct.
This is what I am currently doing:
Sequence(name=sequence_name,
         frames=frames_files,                  # paths to N .jpg frames
         dataset='ds-name',
         ground_truth_rect=ground_truth_rect)  # (N, 4) np.array; every row is [x_min, y_min, x_max, y_max]
         # object_ids=track_ids,               # not using multiple object ids yet
         # multiobj_mode=False)                # is multiobj mode supported? None of the sample datasets use this
In other words, more documentation for every parameter of Sequence would be very helpful, in particular how to structure ground_truth_rect. Thank you!
ground_truth_rect may be an np.array of shape (N, 4), but every row is [x_min, y_min, w, h] or [center_x, center_y, w, h]. You can try these; I am not sure which of the two it is, but I am sure the box is not [x_min, y_min, x_max, y_max].
That fixed it! The results look reasonable now. I was using [x_min, y_min, x_max, y_max] since that format is mentioned in Section 3.1 of the paper, but [top-left x, top-left y, w, h] worked. Thank you.
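For anyone hitting the same issue, the conversion I needed is a one-liner per row; a small helper (my own, not part of the ARTrack codebase) looks like this:

```python
import numpy as np

def xyxy_to_xywh(boxes):
    """Convert (N, 4) rows of [x_min, y_min, x_max, y_max] to
    [top-left x, top-left y, width, height], the format that
    worked for ground_truth_rect."""
    boxes = np.asarray(boxes, dtype=float)
    out = boxes.copy()
    out[:, 2] = boxes[:, 2] - boxes[:, 0]  # width  = x_max - x_min
    out[:, 3] = boxes[:, 3] - boxes[:, 1]  # height = y_max - y_min
    return out
```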
One more question: how does multiobj_mode change the way ground_truth_rect should be structured? My video has some object ids that only appear many frames in and later disappear. In the code it seems the type should be (dict, OrderedDict), but the code doesn't explain what it represents.
I am sorry, I don't know about that, but I think it is useful to reference GMOT (https://arxiv.org/abs/2212.11920); that paper presents a dataset like the one you describe, and the code is available at https://github.com/visionml/pytracking.
I'll check out pytracking. Thanks!
Is there a checkpoint I can use off-the-shelf to evaluate on a new car-tracking test set I have?
In other words, do you have a checkpoint you expect to work well on new datasets? Or is it recommended to train my own model on car tracking training data?
Thanks for your help. ARTrack is an interesting approach!