justachetan closed this 8 months ago
Hi, the shape for queries is (B, N, 3), with B the batch size and N the number of points. The three channels are in the format (t, y, x): t is the time step of each query, and (x, y) is the position, given in pixels with the following orientation:
+----------> X
|
|
v
Y
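As a minimal sketch of building queries in this format (the frame index and pixel values below are arbitrary examples, not from the thread), each row is (t, y, x) and a batch dimension is added in front:

```python
import torch

# Each query is (t, y, x): frame index, then pixel position,
# with y pointing down and x pointing right as in the diagram above.
queries = torch.tensor([
    [10.0, 120.0, 240.0],  # frame 10, y=120 px, x=240 px
    [0.0,   50.0,  75.0],  # frame 0,  y=50 px,  x=75 px
])                          # shape (N, 3)
queries = queries[None]     # add batch dim -> (B, N, 3) = (1, 2, 3)
print(queries.shape)        # torch.Size([1, 2, 3])
```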
Thanks for the quick response! I tried running this by editing the function call in demo.py as follows:
pred = model({"video": video[None], "query_points": torch.Tensor([[[t, y, x]]]).cuda()}, mode="tracks_for_queries", **vars(args))
However I got the following error:
Traceback (most recent call last):
File "work/dot/demo2.py", line 312, in <module> main(args)
File "work/dot/demo2.py", line 304, in main
data["tracks"] = data["tracks"].permute(0, 2, 1, 3)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 3 is not equal to len(dims) = 4
Could you kindly advise how to fix this? Thanks!
Hi! You get this error because sparse tracks and dense tracks do not have the same shape: dense tracks are [B T H W 3] (with B: batch size, T: time steps, H: height, W: width), while sparse tracks are [B T N 3] (with N the number of tracks). The demo was written to handle dense tracks. You may try to hack it by adding another dimension -> [B T N 1 3].
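A minimal sketch of that hack (the shapes below are illustrative, not from the thread): insert a singleton dimension so the sparse tracks look like a dense grid of height N and width 1.

```python
import torch

B, T, N = 1, 8, 5
tracks = torch.zeros(B, T, N, 3)         # sparse tracks: [B, T, N, 3]
tracks_dense_like = tracks.unsqueeze(3)  # -> [B, T, N, 1, 3]
print(tracks_dense_like.shape)           # torch.Size([1, 8, 5, 1, 3])

# The permute in demo.py now sees a tensor with enough dimensions;
# squeeze the extra axis back out before plotting if needed.
```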
The plotting functions in the demo can now handle tracks in both the [B T H W 3] and [B T N 3] formats, so there is no need for the hack anymore.
Hi! Thank you for releasing the code and models publicly! I am trying to use the model to perform inference on my own videos. For visualization, I want to focus on a few selected query points.
The tracks_for_queries mode here seems to be what I need. However, I cannot figure out the required format of the query_points. Could you kindly provide some information about the same? Thanks!