Open zeynytu opened 6 months ago
Hi @zeynytu, our method is meant to track all pixels in a frame together. If you want to track only a few points, you have two options: either (1) use point tracking directly with --model pt, or (2) use our method to track densely and then deduce the tracks for the points you are interested in with --model dot. The inference mode in both cases is "tracks_for_queries", as is done here:
https://github.com/16lemoing/dot/blob/cdee971fb0615fe3bf7b6fd19d856ea572327ec1/test_tap.py#L35
Please provide more information on your GPU setup, video length, and spatial resolution if you need further assistance with the OOM errors.
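For option (2), deducing the tracks for a few points from a dense result amounts to sampling the dense field at the query coordinates. The sketch below shows that idea with bilinear interpolation in plain Python. The function name sample_dense_tracks, the per-pixel (dx, dy) displacement layout, and the (x, y) point order are assumptions made for illustration; they are not taken from the dot codebase.

```python
import math


def sample_dense_tracks(dense, points):
    """Bilinearly sample a dense displacement field at sub-pixel points.

    dense:  H x W grid (list of rows) of (dx, dy) displacements, one per pixel.
    points: list of (x, y) query coordinates in pixels.
    Returns the displaced position (x + dx, y + dy) for each query point.
    """
    height, width = len(dense), len(dense[0])
    tracked = []
    for x, y in points:
        # Clamp to the valid pixel range so the four neighbours always exist.
        x = min(max(x, 0.0), width - 1.0)
        y = min(max(y, 0.0), height - 1.0)
        x0, y0 = int(math.floor(x)), int(math.floor(y))
        x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
        wx, wy = x - x0, y - y0
        disp = []
        for c in range(2):  # c = 0 is dx, c = 1 is dy
            top = (1 - wx) * dense[y0][x0][c] + wx * dense[y0][x1][c]
            bottom = (1 - wx) * dense[y1][x0][c] + wx * dense[y1][x1][c]
            disp.append((1 - wy) * top + wy * bottom)
        tracked.append((x + disp[0], y + disp[1]))
    return tracked


# Toy 2 x 2 field (illustrative values): dx equals the column index, dy is zero.
field = [[(0.0, 0.0), (1.0, 0.0)],
         [(0.0, 0.0), (1.0, 0.0)]]
print(sample_dense_tracks(field, [(0.5, 0.5)]))  # [(1.0, 0.5)]
```

With real model output the same sampling would be applied once per target frame, and a tensor library would normally replace the Python loops.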
Actually, I have a long video, and I want to track around 50 points at specific coordinates. The length of the video is not a big deal; I can trim the video into separate parts. The GPU is an NVIDIA 3060 Ti with 8 GB of VRAM.
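Since the video can be trimmed, one possible way to keep memory bounded on an 8 GB card is to process it in overlapping chunks and carry the point positions across chunk boundaries. The helper below is a hypothetical sketch and not part of dot; chunk_len and overlap are illustrative parameters, and the values that actually fit in memory would have to be found empirically.

```python
def split_into_chunks(num_frames, chunk_len, overlap):
    """Return (start, end) frame ranges, end exclusive, that cover the video.

    Consecutive chunks share `overlap` frames, so positions found at the end
    of one chunk can be reused as queries at the start of the next.
    """
    if overlap < 0 or chunk_len <= overlap:
        raise ValueError("need 0 <= overlap < chunk_len")
    if num_frames <= 0:
        return []
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_len, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        # Step back by `overlap` frames so adjacent chunks share context.
        start = end - overlap
    return chunks


print(split_into_chunks(100, 40, 8))  # [(0, 40), (32, 72), (64, 100)]
```

Each chunk would then be passed to the tracker on its own, with the last tracked positions of one chunk used as the query points for the next.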
So, in the function 'interpolate', S=H*W?
Hi @Billy-ZTB, S is the number of initial tracks (which are then densified), so in general S << H*W.
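To give a sense of scale for S << H*W, the short calculation below compares a pixel count with a sparse track count. The resolution and the value of S are hypothetical numbers chosen only for illustration; they are not stated in this thread.

```python
def track_ratio(height, width, num_initial_tracks):
    """Fraction of pixels that are covered by the initial sparse tracks."""
    return num_initial_tracks / (height * width)


# Hypothetical values, for illustration only.
H, W, S = 480, 856, 2048
print(H * W)  # 410880 pixels in one frame
print(track_ratio(H, W, S) < 0.01)  # True: fewer than 1% of pixels get an initial track
```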
Thanks!
Hello! You have done great work and I really appreciate it. I have been trying to run the model to track some specific points in videos, but I could not figure out exactly how to do that. I tried the format
model({"video": video[None], "query_points": torch.Tensor([[[1, 15, 51]]]).cuda()},
but the GPU ran out of memory. Am I doing it right, or is there another way to do this?