facebookresearch / co-tracker

CoTracker is a model for tracking any point (pixel) on a video.
https://co-tracker.github.io/

About reproducing the paper #86

Open ngoductuanlhp opened 5 months ago

ngoductuanlhp commented 5 months ago

Hi @nikitakaraevv,

Thank you for your excellent work.

I have a question regarding the training pipeline. I'm currently trying to reproduce the results in Table 3 of your paper. When I trained the model from scratch on the Kubric dataset, the best evaluation result on the TAP-Vid DAVIS dataset was as follows:

"occlusion_accuracy": 0.8503666396802487
"average_jaccard": 0.5575681919643163
"average_pts_within_thresh": 0.7087581437592014

These results are significantly lower than those obtained with your provided checkpoint. I'm using Torch 2.1.0 with CUDA 12.3, and I trained the model on 8 A100 GPUs for 200,000 iterations with gradient accumulation over 4 steps to mimic your setting.

Do you think the issue could be due to mismatched library versions, or might I be missing something else? I appreciate any guidance you can provide.
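For reference, the idea behind the gradient accumulation mentioned above can be sketched in plain Python. This is not CoTracker's training code: the toy 1-D least-squares model, the data, and the names `grad_mse` and `accum_steps` are illustrative assumptions. The sketch only shows why averaging gradients over 4 equal micro-batches is expected to match the gradient of one batch that is 4 times larger.

```python
import math


def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of 0.5 * mean((w * x - y) ** 2) for a scalar weight w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Toy data (hypothetical): 8 samples, split into 4 micro-batches of 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5
accum_steps = 4
micro_size = len(xs) // accum_steps

# Gradient computed on the full batch in one pass.
full_grad = grad_mse(w, xs, ys)

# Gradient accumulated over micro-batches; each contribution is scaled
# by 1 / accum_steps so the sum is an average, not a 4x larger gradient.
accum_grad = 0.0
for i in range(accum_steps):
    chunk = slice(i * micro_size, (i + 1) * micro_size)
    accum_grad += grad_mse(w, xs[chunk], ys[chunk]) / accum_steps

matches = math.isclose(full_grad, accum_grad, rel_tol=1e-9)
print(f"full batch: {full_grad:.6f}, accumulated: {accum_grad:.6f}, match: {matches}")
```

The equivalence holds only when the micro-batches have equal size and each loss is divided by the number of accumulation steps. It says nothing about learning-rate schedulers or iteration counters, which may advance per micro-batch or per optimizer update depending on the training script.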

Thank you.

nikitakaraevv commented 5 months ago

Hi @ngoductuanlhp, I don't think there could be such a big gap due to mismatched library versions.

We train it either on 32 GPUs for 50k iterations or on 8 GPUs for 200k iterations. I obtained similar performance with both settings, but 32 GPUs is slightly better. So, have you tried training the model on 8 GPUs for 200k iterations without gradient accumulation?

Also, how do you evaluate the model?
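The schedules mentioned in this thread can be compared by the number of training samples they process. The sketch below is a back-of-the-envelope calculation under stated assumptions: a per-GPU batch size of 1 and "iterations" counted as optimizer steps, neither of which is confirmed in this thread. The `Schedule` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    name: str
    gpus: int
    optimizer_steps: int
    accum_steps: int = 1      # gradient accumulation steps per optimizer update
    per_gpu_batch: int = 1    # assumption: not stated in the thread

    @property
    def effective_batch(self) -> int:
        # Samples contributing to a single optimizer update.
        return self.gpus * self.per_gpu_batch * self.accum_steps

    @property
    def samples_seen(self) -> int:
        # Total samples processed over the whole run.
        return self.effective_batch * self.optimizer_steps


schedules = [
    Schedule("32 GPUs, 50k steps", gpus=32, optimizer_steps=50_000),
    Schedule("8 GPUs, 200k steps", gpus=8, optimizer_steps=200_000),
    Schedule("8 GPUs, 50k steps", gpus=8, optimizer_steps=50_000),
    Schedule("8 GPUs, accum 4, 50k steps", gpus=8, optimizer_steps=50_000, accum_steps=4),
]

for s in schedules:
    print(f"{s.name}: effective batch {s.effective_batch}, samples seen {s.samples_seen:,}")
```

Under these assumptions, the two recommended schedules process the same number of samples with different effective batch sizes, while 8 GPUs for 50k steps without accumulation processes a quarter as many.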

ngoductuanlhp commented 5 months ago

I haven't tried training the model for 200k iterations without gradient accumulation. But I did train the model for 50k iterations on 8 GPUs with the same learning rate of 0.0005, and the results were not good.

I use your evaluation script to evaluate on TAP-Vid DAVIS (first/strided) and on the Dynamic Replica validation set.