aharley / pips

Particle Video Revisited

Tracking more than 8 frames per sequence #10

Open phongnhhn92 opened 1 year ago

phongnhhn92 commented 1 year ago

Hi,

In the demo.py file, when I change S = 8 to S = 10, the model doesn't work. Did you hard-code the model to work only with 8 input frames at a time?

aharley commented 1 year ago

Yes, the released model weights are for S=8. For longer tracking, you need to chain the model over time; there is code for this in chain_demo.py.
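
Roughly, the chaining works like this, for a single point (a simplified sketch, not the exact chain_demo.py code; the model call signature and the 0.9 visibility threshold are assumptions):

    # chain S=8 windows: track until confidence drops, then re-initialize
    S = 8                                # the model's temporal window
    cur = 0
    xy = xy0                             # B,1,2: the query point in frame 0
    chunks = [xy0.unsqueeze(1)]          # collected B,steps,1,2 pieces
    while cur + S <= T:
        rgb_seq = rgbs[:, cur:cur+S]     # B,S,3,H,W
        preds, _, vis_e, _ = model(xy, rgb_seq, iters=6)
        traj = preds[-1]                 # B,S,1,2
        vis = torch.sigmoid(vis_e)       # B,S,1 visibility confidence
        # count the consecutive frames that stay confidently visible,
        # and re-initialize the tracker at the last such frame
        alive = (vis[0, :, 0] > 0.9).float().cumprod(dim=0)
        step = max(1, int(alive.sum().item()) - 1)
        chunks.append(traj[:, 1:step+1])
        xy = traj[:, step]
        cur += step
    full_traj = torch.cat(chunks, dim=1) # B,T',1,2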

phongnhhn92 commented 1 year ago

Hi @aharley, thanks for your reply! I have another question: how can I obtain dense optical flow output similar to RAFT? In your method, in both examples, we need to choose a small number of points to track. In my case, I need the trajectories of every pixel in the starting frame.

aharley commented 1 year ago

In the part where you specify the start locations: https://github.com/aharley/pips/blob/main/demo.py#L31-L37

change it to a dense grid, like this:

    grid_y, grid_x = utils.basic.meshgrid2d(B, H, W, stack=False, norm=False, device='cuda') # each B, H, W
    xy = torch.stack([grid_x, grid_y], dim=-1).reshape(B, -1, 2) # B, H*W, 2

If you run out of memory when running the model on this many particles, split the list into batches sized for your GPU, like this: https://github.com/aharley/pips/blob/main/test_on_davis.py#L111-L125
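
Concretely, that batching loop looks something like this (a sketch following the test_on_davis.py pattern; the chunk size and the model call signature are assumptions to adjust for your setup):

    # run the model on GPU-sized chunks of points, then stitch the results
    N = xy.shape[1]                       # number of points (H*W)
    chunk = 64*64                         # points per forward pass
    trajs = []
    for b in range(0, N, chunk):
        xy_b = xy[:, b:b+chunk]           # B,chunk,2
        preds, _, _, _ = model(xy_b, rgbs, iters=6)
        trajs.append(preds[-1])           # B,S,chunk,2
    full_trajs = torch.cat(trajs, dim=2)  # B,S,H*W,2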

phongnhhn92 commented 1 year ago

Thanks for the pointer! I will close the issue now.

phongnhhn92 commented 1 year ago

Hi @aharley, I have a further question: in the DAVIS code, I see that you predict flow for the entire image only over a sequence of 8 frames. However, in chain_demo.py, you show an example of a single tracked pixel.

I wonder if you have tried extending chain_demo.py to dense predictions? I assume the confidence-score thresholding there is important for keeping the trajectories correct.

aharley commented 1 year ago

Tracking all pixels is generally very memory-heavy, and pairing that with the chaining is tricky but doable. The tricky part is that the chaining technique allows each target to choose a variable-length step size for re-initializing the tracker (within the S=8 window, by looking at the confidence/visibility score like you said), so you need some clever bookkeeping to parallelize as much as possible. I think you can aim for a model that runs at most K forward passes, where K is the number of frames times the amount you need to serialize (e.g., on an 80 GB GPU maybe no serialization is necessary, but on a 12 GB GPU maybe you do 4 forward passes to cover all the pixels' trajectories).
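
One possible bookkeeping scheme for this (a sketch of the idea above, not code from this repo; the model call, the 0.9 threshold, and the chunk size are assumptions): keep a per-point frame pointer, and on each pass batch together all points whose pointers currently agree, so every forward pass is as full as your GPU allows.

    cur = torch.zeros(N, dtype=torch.long)   # per-point frame pointer
    xy_cur = xy0.clone()                     # 1,N,2 current positions
    done = torch.zeros(N, dtype=torch.bool)
    while not done.all():
        t = int(cur[~done].min())            # earliest active pointer
        idx = ((cur == t) & ~done).nonzero(as_tuple=True)[0]
        for b in range(0, len(idx), chunk):  # serialize if memory demands it
            sel = idx[b:b+chunk]
            preds, _, vis_e, _ = model(xy_cur[:, sel], rgbs[:, t:t+S], iters=6)
            traj = preds[-1]                 # 1,S,n,2
            vis = torch.sigmoid(vis_e)[0]    # S,n
            # per-point step: index of the last consecutively-visible frame
            steps = ((vis > 0.9).float().cumprod(dim=0).sum(dim=0) - 1).long().clamp(min=1)
            for j, p in enumerate(sel.tolist()):
                s = int(steps[j])
                xy_cur[:, p] = traj[:, s, j]    # re-initialize this point
                cur[p] = t + s
                done[p] = bool(cur[p] + S > T)  # too few frames left to chain
            # (per-point trajectory storage omitted for brevity)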

dkhanna511 commented 1 year ago

Hi, I also want to figure out how to track multiple points in a frame over a longer duration. I've run demo.py and chain_demo.py, and as I understand it, demo.py takes a grid of points while chain_demo.py takes only one point. I would like to run them on longer sequences with different data to make sense of the outputs. Can either of these files be changed to do that?

dkhanna511 commented 1 year ago

Hi @phongnhhn92, did you have any luck running chain_demo.py with multiple points simultaneously and creating a single GIF?