facebookresearch / co-tracker

CoTracker is a model for tracking any point (pixel) on a video.
https://co-tracker.github.io/

Input shapes in forward_batch() #47

Closed gorkaydemir closed 5 months ago

gorkaydemir commented 7 months ago

Hi, thank you for your great work. I am a little confused about the indexing operations in the forward_batch() function in co-tracker/train.py.

I think the vis_g variable has shape (B, T, N), according to the CoTrackerData class.

So, with this operation, you get first_positive_inds of shape (B, N): __, first_positive_inds = torch.max(vis_g, dim=1)
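To make the shape claim concrete, here is a minimal sketch with a toy visibility tensor. The values and sizes are made up for illustration, and vis_g is assumed to be a 0/1 float tensor of shape (B, T, N). Reducing over dim=1 with torch.max returns, for each batch item and point, the index of the first frame that attains the maximum, which is the first visible frame whenever the point is visible at all.

```python
import torch

# Toy visibility tensor, assumed shape (B, T, N) = (2, 3, 2); values are illustrative only.
vis_g = torch.tensor(
    [
        [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]],  # batch item 0: frames x points
        [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # batch item 1: frames x points
    ]
)

# Reduce over the time dimension; the second return value holds the argmax indices.
_, first_positive_inds = torch.max(vis_g, dim=1)

print(first_positive_inds.shape)  # one index per (batch item, point)
print(first_positive_inds)
```

Note that a point that is never visible would get index 0 here, since all of its values tie at zero.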

Then, this one follows:

# inds of visible points in the 1st frame
nonzero_inds = [torch.nonzero(vis_g[0, :, i]) for i in range(N)]

Doesn't vis_g[0, :, i] correspond to the visibility of point i across all frames of the first batch item, rather than to the first frame of each batch item?

After that step, rand_vis_inds is calculated and has the shape (1, N). Isn't this a problem when concatenating [rand_vis_inds[:, :N_rand], first_positive_inds[:, N_rand:]], since the two tensors differ in dimension 0 (1 vs. B)? What am I missing about the shapes?
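The mismatch can be reproduced with a small sketch. The sizes are arbitrary, and the way rand_vis_inds is built below is only an assumed construction that yields shape (1, N) by sampling one visible frame per point from batch item 0; it is not necessarily the exact code in train.py.

```python
import torch

B, T, N = 2, 4, 8  # arbitrary sizes with B > 1
vis_g = torch.ones(B, T, N)  # every point visible in every frame, for simplicity

_, first_positive_inds = torch.max(vis_g, dim=1)  # shape (B, N)

N_rand = N // 4
# Only batch item 0 is inspected here; each entry has shape (num_visible_frames, 1).
nonzero_inds = [torch.nonzero(vis_g[0, :, i]) for i in range(N)]
# Assumed construction: pick one random visible frame per point, giving shape (1, N).
rand_vis_inds = torch.cat(
    [row[torch.randint(len(row), size=(1,))] for row in nonzero_inds], dim=1
)

print(rand_vis_inds.shape, first_positive_inds.shape)

try:
    torch.cat([rand_vis_inds[:, :N_rand], first_positive_inds[:, N_rand:]], dim=1)
    cat_failed = False
except RuntimeError:
    # Concatenating along dim=1 requires equal sizes in dimension 0: 1 vs. B.
    cat_failed = True

print(cat_failed)
```

With B = 1 the same concatenation goes through, so the mismatch only shows up for larger batches.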

Thank you

nikitakaraevv commented 6 months ago

Hi @gorkaydemir, yes, you are right. Initially, the model only worked with a batch size of 1 for various reasons, but yesterday we released a new version that fixes this problem. Now you can run and train the model with different batch sizes. Please let me know if you have any other questions.
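For readers who want to see what a batch-aware version of this step could look like, here is a minimal sketch. It is not the change that was released; sample_visible_frame is a hypothetical helper, and it assumes vis_g is a 0/1 tensor of shape (B, T, N) in which every point is visible in at least one frame.

```python
import torch


def sample_visible_frame(vis_g: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: pick one random visible frame per (batch item, point).

    vis_g: 0/1 tensor of shape (B, T, N); returns a long tensor of shape (B, N).
    Assumes each point is visible in at least one frame of its batch item.
    """
    B, T, N = vis_g.shape
    # torch.multinomial samples along the last dimension, so move T last
    # and treat each (batch item, point) pair as one row of weights.
    weights = vis_g.permute(0, 2, 1).reshape(B * N, T).float()
    return torch.multinomial(weights, num_samples=1).reshape(B, N)


B, T, N = 3, 5, 8  # arbitrary sizes
vis_g = (torch.rand(B, T, N) > 0.5).float()
vis_g[:, -1] = 1.0  # guarantee that every point is visible somewhere

_, first_positive_inds = torch.max(vis_g, dim=1)  # shape (B, N)
rand_vis_inds = sample_visible_frame(vis_g)       # shape (B, N)

N_rand = N // 4
mixed = torch.cat(
    [rand_vis_inds[:, :N_rand], first_positive_inds[:, N_rand:]], dim=1
)
print(mixed.shape)
```

Because both index tensors now have shape (B, N), the concatenation along dim=1 works for any batch size.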