phongnhhn92 opened 1 year ago
Yes, the released model weights are for S=8. For longer tracking, you need to chain the model over time. There is code for this in chain_demo.py
Hi @aharley, thanks for your reply! I have another question: how can I obtain a dense optical flow output similar to RAFT? In your method, in both examples, we need to choose a small number of points to track. In my case, I need to obtain the trajectory of every pixel in the starting frame.
In the part where you specify the start locations: https://github.com/aharley/pips/blob/main/demo.py#L31-L37
change it to a dense grid, like this:
grid_y, grid_x = utils.basic.meshgrid2d(B, H, W, stack=False, norm=False, device='cuda')
xy = torch.stack([grid_x, grid_y], dim=-1).reshape(B, -1, 2) # B, H*W, 2
If you run out of memory when trying to run the model for this many particles, split the list into batches of a good size for your GPU, like this: https://github.com/aharley/pips/blob/main/test_on_davis.py#L111-L125
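The batching idea can be sketched like this. This is a minimal, self-contained sketch, not the repo's exact API: `model(xy, rgbs, iters=...)` is assumed to return a list of per-iteration trajectory predictions (similar in spirit to the repo's forward pass), and `chunk` is whatever fits on your GPU:

```python
import torch

def track_dense(model, rgbs, chunk=4096):
    # rgbs: B, S, C, H, W video tensor; returns trajectories for every pixel.
    B, S, C, H, W = rgbs.shape
    # build a dense grid of start coordinates, one per pixel
    grid_y, grid_x = torch.meshgrid(
        torch.arange(H).float(), torch.arange(W).float(), indexing='ij')
    xy = torch.stack([grid_x, grid_y], dim=-1).reshape(1, -1, 2)  # 1, H*W, 2
    trajs = []
    for i in range(0, xy.shape[1], chunk):
        with torch.no_grad():
            # hypothetical call signature; adapt to the repo's forward()
            preds, *_ = model(xy[:, i:i+chunk], rgbs, iters=6)
        trajs.append(preds[-1])  # final-iteration estimate: B, S, chunk, 2
    return torch.cat(trajs, dim=2)  # B, S, H*W, 2
```

Smaller `chunk` values trade speed for memory, so tune it to your GPU.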
Thanks for the pointer! I will close the issue now.
Hi @aharley, I have a further question: in the DAVIS code, I see that you predict flow for the entire image using a sequence of 8 frames. However, in chain_demo.py, you show an example of a single tracked pixel.
I wonder if you have tried extending chain_demo.py to dense predictions? I assume the confidence-score thresholding there is important for making sure the trajectories stay correct.
Tracking all pixels is generally very memory-heavy, and pairing that with the chaining is tricky but doable. The tricky part is: the chaining technique allows each target to choose a variable-length step size for re-initializing the tracker (within the S=8 window, by looking at the confidence/visibility score like you said), so you need some clever bookkeeping to parallelize as much as possible. I think you can aim for a model that runs at most K forward passes, where K is the number of frames times the amount you need to serialize (e.g., on an 80G GPU maybe no serialization will be necessary, but on a 12G GPU maybe you do 4 forward passes to get all the pixels' trajectories).
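For a single target, the variable-step chaining described above can be sketched as follows. `run_model` here is a hypothetical wrapper (not the repo's exact API) that, given a window of up to S frames and a start coordinate, returns per-frame positions and visibility scores; the loop re-initializes at the farthest confidently-visible frame:

```python
import torch

def chain_one_target(run_model, rgbs, xy0, S=8, vis_thresh=0.9):
    # rgbs: 1, T, C, H, W; xy0: 1, 1, 2 start coordinate.
    # run_model(rgbs_window, xy) -> (trajs: 1, S_cur, 1, 2, vis: 1, S_cur, 1)
    B, T, C, H, W = rgbs.shape
    traj = [xy0]  # one (1, 1, 2) position per frame
    t, xy = 0, xy0
    while t < T - 1:
        S_cur = min(S, T - t)
        trajs, vis = run_model(rgbs[:, t:t + S_cur], xy)
        # pick the farthest frame where the target is still confidently visible
        conf = vis[0, :, 0] > vis_thresh
        ok = int(conf.nonzero().max().item()) if conf.any() else 1
        ok = max(ok, 1)  # always advance at least one frame
        traj.extend(trajs[:, s] for s in range(1, ok + 1))
        xy = trajs[:, ok]  # re-initialize from the chosen frame
        t += ok
    return torch.stack(traj, dim=1)  # 1, T, 1, 2
```

Parallelizing this over all pixels is exactly the bookkeeping problem above: each target may choose a different `ok`, so the windows fall out of sync across targets.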
Hi, I also want to figure out how to track multiple points in a frame for a longer duration. I've run demo.py and chain_demo.py, and as I understand it, demo.py takes a grid of points while chain_demo.py takes only one point. I would like to run them on longer sequences with different data to make sense of the outputs. Can either of these files be changed to do that?
Hi @phongnhhn92, did you have any luck running chain_demo.py with multiple points simultaneously and creating a single GIF?
Hi,
In the demo.py file, when I change S = 8 to S = 10, the model no longer works. Is the model hard-coded to only work with exactly 8 input frames at a time?