Input shapes in forward_batch()

Hi, thank you for your great work. I am a little confused about the indexing operations in the forward_batch() function in co-tracker/train.py.

I think the vis_g variable has shape (B, T, N), based on the CoTrackerData class.

So with this operation you get first_positive_inds of shape (B, N): __, first_positive_inds = torch.max(vis_g, dim=1)

Then, this one follows:

# inds of visible points in the 1st frame
nonzero_inds = [torch.nonzero(vis_g[0, :, i]) for i in range(N)]

Doesn't vis_g[0, :, i] select the visibility of point i in the first batch item (across all T frames), rather than the first frame of each batch item?
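
To make the indexing concrete, here is a minimal shape check. The sizes and the random visibility mask are my own assumptions for illustration, not values taken from train.py.

```python
import torch

# Small illustrative sizes (assumed, not from train.py)
B, T, N = 2, 5, 4
torch.manual_seed(0)
vis_g = (torch.rand(B, T, N) > 0.5).float()  # stand-in visibility mask, shape (B, T, N)

# torch.max over the time axis returns (values, indices), each of shape (B, N)
_, first_positive_inds = torch.max(vis_g, dim=1)

# vis_g[0, :, i]: batch item 0, all T frames, point i -> shape (T,)
per_point_over_time = vis_g[0, :, 0]

# vis_g[:, 0, i]: all batch items, frame 0, point i -> shape (B,)
first_frame_over_batch = vis_g[:, 0, 0]

print(first_positive_inds.shape)
print(per_point_over_time.shape)
print(first_frame_over_batch.shape)
```

So the leading 0 in the index picks a batch item, and the slice runs over frames.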

After that step, rand_vis_inds is calculated and has shape (1, N). Isn't this a problem when concatenating [rand_vis_inds[:, :N_rand], first_positive_inds[:, N_rand:]], since the two tensors differ in dimension 0 (1 vs B)? What am I missing about the shapes?
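
A rough sketch of the concatenation I am worried about, using placeholder tensors with the shapes described above. The helper try_concat, the zero-filled tensors, and the value of N_rand are hypothetical and only there to show the shapes; this is not the code from train.py.

```python
import torch

def try_concat(B, N=8):
    """Concatenate along dim=1 and return the result shape, or None on a shape error."""
    N_rand = N // 4  # hypothetical split point, chosen only for illustration
    first_positive_inds = torch.zeros(B, N, dtype=torch.long)  # shape (B, N)
    rand_vis_inds = torch.zeros(1, N, dtype=torch.long)        # shape (1, N) placeholder
    try:
        out = torch.cat(
            [rand_vis_inds[:, :N_rand], first_positive_inds[:, N_rand:]], dim=1
        )
        return tuple(out.shape)
    except RuntimeError:
        # torch.cat requires all dimensions except the concatenation one to match
        return None

result_b1 = try_concat(B=1)  # dimension 0 matches (1 and 1)
result_b2 = try_concat(B=2)  # dimension 0 differs (1 and 2)
print(result_b1, result_b2)
```

With these placeholder shapes the concatenation only goes through when B is 1, which is why I am asking whether the function assumes a batch size of 1.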

Thank you
