16lemoing / dot

Dense Optical Tracking: Connecting the Dots
https://16lemoing.github.io/dot
MIT License

Inference for low gpu and less number of points #13

Open zeynytu opened 6 months ago

zeynytu commented 6 months ago

Hello, you have done great work and I really appreciate it! I have been trying to run the model to track some specific points in videos, but I could not figure out exactly how to do that. I tried the format

model({"video": video[None], "query_points": torch.Tensor([[[1, 15, 51]]]).cuda()}, but the GPU ran out of memory. Am I doing it right, or is there another way to do this?

16lemoing commented 6 months ago

Hi @zeynytu, our method is meant to track all pixels in a frame together. If you want to track only a few points, you have two options: either (1) use point tracking directly with --model pt, or (2) use our method to track densely and then deduce the tracks for the points you are interested in with --model dot. The inference mode in both cases is "tracks_for_queries", as is done here:

https://github.com/16lemoing/dot/blob/cdee971fb0615fe3bf7b6fd19d856ea572327ec1/test_tap.py#L35

Please provide more information on your GPU setup, video length and spatial resolution if you need further assistance on the OOM errors.
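For reference, a minimal sketch of how a batch of query points could be assembled, in the same nested [1, N, 3] layout as the snippet in the question. The helper name `build_query_points` is hypothetical, and the column order (frame index followed by two spatial coordinates) is an assumption that should be checked against test_tap.py before use.

```python
def build_query_points(points, frame_index=0):
    """Return a nested list of shape [1, N, 3] for N query points.

    Assumption: each row is (frame_index, coord_a, coord_b). The column
    order actually expected by the model is not confirmed here.
    """
    # One row per point, all queried on the same frame.
    rows = [[float(frame_index), float(a), float(b)] for a, b in points]
    # Leading batch dimension of size 1, as in the original snippet.
    return [rows]


queries = build_query_points([(15, 51), (120, 64), (200, 310)], frame_index=1)
# The nested list can then be converted with torch.tensor(queries).cuda()
# before being passed as "query_points".
```

Building the batch in plain Python keeps the point list easy to inspect before anything is moved to the GPU.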

zeynytu commented 6 months ago

Actually, I have a long video, and I want to track around 50 points at specific coordinates. The length of the video is not a big deal; I can trim the video into separate parts. The GPU is an NVIDIA 3060 Ti with 8 GB of VRAM.
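Trimming the video into parts could be sketched as below. This is only an illustration of computing frame ranges; the function name `split_into_chunks` and the chunk length are hypothetical, and the right chunk size for an 8 GB GPU would have to be found by experiment.

```python
def split_into_chunks(num_frames, chunk_len, overlap=0):
    """Return (start, end) frame ranges, end exclusive, covering the video.

    Consecutive chunks share `overlap` frames.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")

    step = chunk_len - overlap
    chunks = []
    start = 0
    while start < num_frames:
        end = min(start + chunk_len, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break
        start += step
    return chunks


# Example: a 10-frame video in chunks of 4 frames, sharing 1 frame.
ranges = split_into_chunks(10, 4, overlap=1)
```

With an overlap of one frame, the last frame of each chunk is also the first frame of the next, so the final tracked positions in one chunk could presumably be reused as the query points for the following chunk. That stitching step is a suggestion, not something the repository is confirmed to provide.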

Billy-ZTB commented 4 weeks ago

So, in the function 'interpolate', is S = H*W?

16lemoing commented 3 weeks ago

Hi @Billy-ZTB, S is the number of initial tracks (which are then densified) so in general S<<H*W.
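To give a sense of scale, here is a small arithmetic sketch. The resolution and the value of S below are hypothetical numbers chosen for illustration, not settings taken from the repository.

```python
# Hypothetical values, for illustration only.
H, W = 512, 512   # assumed frame height and width in pixels
S = 2048          # assumed number of initial (sparse) tracks

dense_tracks = H * W        # one track per pixel after densification
ratio = dense_tracks / S    # dense tracks per initial track

print(f"H*W = {dense_tracks}, S = {S}, ratio = {ratio:.0f}")
```

Under these assumed numbers, the dense output has 128 times as many tracks as the initial set.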

Billy-ZTB commented 3 weeks ago

Thanks!