google-deepmind / tapnet

Tracking Any Point (TAP)
https://deepmind-tapir.github.io/blogpost.html
Apache License 2.0

Label points on multiple frames #58

Open enates499 opened 12 months ago

enates499 commented 12 months ago

I have been using the colab demos, which have been great! (Specifically, I adapted tapir_demo.ipynb for my own videos.) However, some of my results are not as accurate as I hoped when labeling the desired points on only one frame. Can I select these points on multiple frames and thus train the model on more information? Thanks!

cdoersch commented 11 months ago

Not sure what you mean by 'train'. Are you asking how to fine-tune the model, or how to improve the test-time performance?

We haven't explored improving performance by selecting multiple query points along the same trajectory. In principle, you could average the feature vectors for both queries and use the average as a query (TAPIR exposes an API for getting query features and then applying them to a video). However, I don't know how much this will improve things.
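To make the averaging idea concrete, here is a minimal NumPy sketch. It does not use TAPIR's actual API: `average_query_features` and `cost_volume` are hypothetical helpers, and the arrays stand in for whatever query features and per-frame feature maps you extract from the model. The assumption is that each labeled frame gives one feature vector for the same physical point, and that matching is a dot product against the feature map.

```python
import numpy as np


def average_query_features(query_feats):
    """Average the feature vectors of several labels of the same point.

    query_feats: array-like of shape [num_labels, channels], one row per
    labeled frame (hypothetical input, not a TAPIR data structure).
    Returns a single vector of shape [channels].
    """
    query_feats = np.asarray(query_feats, dtype=np.float32)
    return query_feats.mean(axis=0)


def cost_volume(query_feat, frame_feats):
    """Dot-product similarity between one query vector and every location.

    query_feat: [channels]
    frame_feats: [num_frames, height, width, channels]
    Returns similarities of shape [num_frames, height, width].
    """
    return np.einsum("c,thwc->thw", query_feat, frame_feats)
```

The averaged vector would then replace the single-frame query wherever the model consumes query features. Whether this helps in practice is untested, as noted above.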

Another approach is to run tracking multiple times (for example, once from each labeled frame) and average the resulting tracks. Failures are often caused by large rotations and scale changes, and in many cases you can write an algorithm that dynamically switches between tracks. In RoboTAP, our clustering algorithm helped with recovering points that were lost due to large scale changes.
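As a rough illustration of averaging several tracking runs, the sketch below combines them per frame, counting only the runs that report the point as visible. It is a simple post-processing step written in plain NumPy, not something from the tapnet codebase: `average_tracks` is a hypothetical helper, and the `[num_runs, num_frames, 2]` layout is an assumption about how you stack your own outputs.

```python
import numpy as np


def average_tracks(tracks, visibles):
    """Visibility-weighted average of several tracks of the same point.

    tracks: [num_runs, num_frames, 2] predicted (x, y) positions.
    visibles: [num_runs, num_frames] booleans, True where a run
        predicts the point as visible.
    Returns (mean, any_visible): mean is [num_frames, 2], with NaN on
    frames where no run sees the point; any_visible is [num_frames].
    """
    tracks = np.asarray(tracks, dtype=np.float64)
    weights = np.asarray(visibles, dtype=np.float64)[..., None]
    total = weights.sum(axis=0)  # [num_frames, 1]
    # Guard the division; frames with no visible run are masked below.
    mean = (tracks * weights).sum(axis=0) / np.maximum(total, 1.0)
    any_visible = total[..., 0] > 0
    mean[~any_visible] = np.nan
    return mean, any_visible
```

A switching scheme would instead pick one run per frame (for example, the one with the highest confidence) rather than averaging, which may behave better when one run has drifted badly.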

If you have a lot of data, you can also fine-tune the model.

enates499 commented 11 months ago

Thanks @cdoersch! I believe I mean improving test-time performance. Good to know there currently isn't a way to select multiple query points along the same trajectory. I will look into averaging, although it is more post-processing than I was looking for. DeepLabCut has ways to select points on multiple frames and I have been getting better results with it, but I found the TAPIR demos very easy to use. What is fine-tuning? Is it a function of TAPIR or separate software?

cdoersch commented 11 months ago

DeepLabCut is intended for pose estimation of animal joint locations, whereas TAPIR is designed for surface tracking.

I'm suggesting fine-tuning the same way they do: if you have some ground truth for target locations, you can train the model on that new data using the weights we've released as a warm start. Whether it will be worthwhile for you to train TAPIR in this way depends on the nature of your data. If the keypoints are more like animal skeletons, then I expect fine-tuning DeepLabCut will be more effective than fine-tuning TAPIR. If your points are on the surfaces of objects that don't have a typical "body structure" like what DeepLabCut assumes, then you may get better results with TAPIR, although it will probably be somewhat more work since we don't currently have guides on how to do it.