google-deepmind / tapnet

Tracking Any Point (TAP)
https://deepmind-tapir.github.io/blogpost.html
Apache License 2.0

Online BootsTAPIR Weights and Config File #101

Open gorkaydemir opened 2 months ago

gorkaydemir commented 2 months ago

Hi, thanks for your great work on BootsTAP. It seems that the shared PyTorch model weights for Online BootsTAPIR are not compatible with the TAPIR model in terms of the extra_convs hidden sizes. Could you also share the config file for Online BootsTAPIR? At the moment the Online and Offline BootsTAPIR config files are identical.
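
For reference, a minimal diagnostic sketch of the mismatch I'm seeing (the checkpoint filename and the way `model` is built are placeholders following the demo notebook, not an official API):

  # Hypothetical diagnostic: compare shapes in the released checkpoint against
  # the instantiated model's state_dict to locate the extra_convs mismatch.
  # Assumes the checkpoint is a flat state_dict of tensors.
  import torch

  checkpoint = torch.load("causal_bootstapir_checkpoint.pt", map_location="cpu")
  model_state = model.state_dict()  # `model` constructed as in the torch demo notebook

  for name, ckpt_param in checkpoint.items():
      if "extra_convs" not in name:
          continue
      model_param = model_state.get(name)
      if model_param is None:
          print(f"{name}: present in checkpoint but missing from the model")
      elif tuple(model_param.shape) != tuple(ckpt_param.shape):
          print(f"{name}: checkpoint {tuple(ckpt_param.shape)} vs model {tuple(model_param.shape)}")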

Thanks in advance.

cdoersch commented 2 months ago

We now have a colab which demonstrates how to use it:

https://github.com/google-deepmind/tapnet/blob/main/colabs/torch_causal_tapir_demo.ipynb
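
In case it helps readers of this thread, here is a condensed outline of the per-frame loop in that notebook (function names follow the notebook; exact signatures and shapes are illustrative, so please defer to the colab itself):

  # Sketch only: online_model_init, construct_initial_causal_state and
  # online_model_predict are the notebook's helpers; num_points, num_resolutions
  # and num_frames are placeholders.
  query_features = online_model_init(frames[:, 0:1], query_points)
  causal_state = model.construct_initial_causal_state(
      num_points, num_resolutions - 1)

  all_tracks, all_visibles = [], []
  for t in range(num_frames):
      tracks, visibles, causal_state = online_model_predict(
          frames[:, t:t + 1], query_features, causal_state)
      all_tracks.append(tracks)
      all_visibles.append(visibles)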

bhack commented 2 months ago

@cdoersch What is the main rationale behind this comment? Did you mean integrating over the pyramid levels? Do you have an example?

  # Take only the predictions for the final resolution.
  # For running on higher resolution, it's typically better to average across
  # resolutions.
  tracks = trajectories['tracks'][-1]
  occlusions = trajectories['occlusion'][-1]
  uncertainty = trajectories['expected_dist'][-1]

bhack commented 2 months ago

I also have another question. In the fairly common case where we want to start from an arbitrary frame, and therefore need to process the sequence both forward and backward to cover it, do we need to call both online_model_init and model.construct_initial_causal_state every time we change direction?

gorkaydemir commented 2 months ago

Hi, thank you for the demonstration and for sharing the notebook! I have a couple of questions. Did you use this approach when evaluating the online models on the DAVIS or Kinetics datasets, as referenced in Table 7 of the paper? Unfortunately, I wasn't able to reproduce those results using the provided torch model and checkpoint. Could you offer any guidance on this?

Thank you

cdoersch commented 2 months ago

@bhack I don't see what this comment has to do with the current thread, but I'll answer anyway. During training, we apply the loss to the prediction at every layer. Therefore the model returns the final prediction as the output, and "unrefined" predictions for every iteration at every resolution. At test time, however, we find the best accuracy by taking the final refinement prediction averaged across resolutions, so that's what we return by default.
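
As a rough sketch of what the averaging could look like (assuming the last entries of each output list hold the final refinement pass at each resolution; check the model code for the actual list layout before relying on this):

  # Illustrative only: num_resolutions and the [-num_resolutions:] slicing are
  # assumptions about how the per-resolution predictions are laid out.
  import torch

  num_resolutions = 2  # placeholder; depends on the pyramid configuration
  tracks = torch.stack(trajectories['tracks'][-num_resolutions:]).mean(dim=0)
  occlusions = torch.stack(trajectories['occlusion'][-num_resolutions:]).mean(dim=0)
  uncertainty = torch.stack(trajectories['expected_dist'][-num_resolutions:]).mean(dim=0)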

@bhack you can extract query features from later frames and then track them starting from the beginning of the video. However, it may be slightly more accurate to do it forward and backward in time, as you suggest, in which case you would need to call model.construct_initial_causal_state twice. In the current model, online_model_init can be re-used across both forward/backward runs since it only depends on the query frame.
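
Schematically, the forward/backward pattern could look like this (names follow the demo notebook and the discussion above; signatures and placeholder variables such as q, num_points, num_resolutions and num_frames are assumptions):

  # Query features are extracted once from the query frame q and reused;
  # a fresh causal state is constructed for each direction.
  query_features = online_model_init(frames[:, q:q + 1], query_points)

  def run(frame_indices):
      causal_state = model.construct_initial_causal_state(
          num_points, num_resolutions - 1)
      outputs = {}
      for t in frame_indices:
          tracks, visibles, causal_state = online_model_predict(
              frames[:, t:t + 1], query_features, causal_state)
          outputs[t] = (tracks, visibles)
      return outputs

  forward = run(range(q, num_frames))    # query frame to the end
  backward = run(range(q, -1, -1))       # query frame back to the start
  predictions = {**backward, **forward}  # forward result kept for the query frame itself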

@gorkaydemir could you provide some more information on how you're running the model, preferably the code that you're using? We use JAX internally, so the PyTorch port is not as well tested. You'll have to provide a way to reproduce the issue.

cdoersch commented 2 months ago

@sgjheywa FYI

bhack commented 2 months ago

@cdoersch I posted here because that comment is in the just-released PyTorch online notebook, and there has already been more than one ticket asking for a PyTorch online version and checkpoint. I simply wanted to avoid opening a new issue about the new online notebook.

gorkaydemir commented 2 months ago

Hi @cdoersch, using the evaluator and data from CoTracker, which incorporates much of your code for the TAPVid classes, I evaluated both Causal BootsTAPIR and Default BootsTAPIR using the queried-first approach. You can view the code here: Colab Notebook.

I minimized custom code and primarily utilized the scripts you provided in the notebooks. The results are reported in the last cell as a comment. While the offline BootsTAPIR performance is close to the reported values, there is a notable discrepancy between the expected and reproduced online model performances.
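
For context, the "queried first" convention samples each query at the point's first visible frame and evaluates predictions from there onward. A rough sketch of the query construction (array names and shapes are illustrative, not the evaluator's actual API; the (t, y, x) ordering follows the demos):

  import numpy as np

  def first_visible_queries(gt_tracks, gt_occluded):
      # gt_tracks: [num_points, num_frames, 2] as (x, y);
      # gt_occluded: [num_points, num_frames] bool
      queries = []
      for p in range(gt_tracks.shape[0]):
          t0 = int(np.nonzero(~gt_occluded[p])[0][0])  # first visible frame
          x, y = gt_tracks[p, t0]
          queries.append((t0, y, x))
      return np.array(queries, dtype=np.float32)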

Thank you in advance for your assistance.