google-deepmind / tapnet

Tracking Any Point (TAP)
https://deepmind-tapir.github.io/blogpost.html
Apache License 2.0

Issue in ground-truth trajectories in TAP-Vid Kinetics pickle files #65

Closed: AssafSinger94 closed this issue 2 months ago

AssafSinger94 commented 1 year ago

Hi,

When trying to evaluate TAPIR on TAP-Vid Kinetics, the model accuracy I measure is much lower than reported in the paper. I tried debugging this by visualizing predictions vs. ground truth (manually setting write_viz = True in supervised_point_prediction.SupervisedPointPrediction._eval_epoch, which runs viz_utils.write_visualization), and noticed that the ground-truth trajectories behave very strangely: they wander around the frame instead of sticking to a specific point, while the TAPIR trajectories make much more sense and properly track objects. This is the case for many videos. Evaluating on TAP-Vid DAVIS works fine.

Do you have any idea what might be causing this? I have no clue what the issue could be.

I ran the following code (I set query_mode="first" to make the visualization clearer).

```shell
python ./tapnet/experiment.py \
  --config=./tapnet/configs/tapir_config.py \
  --jaxline_mode=eval_tapvid_kinetics_q_first \
  --config.checkpoint_dir=./tapnet/checkpoints_tapir/ \
  --config.experiment_kwargs.config.kinetics_points_path=<path_to_tapvid_kinetics_pkl_files_dir>
```

The directory contains files [0000_of_0010.pkl, ..., 0009_of_0010.pkl]. I generated the TAP-Vid Kinetics files by following the instructions on https://github.com/google-deepmind/tapnet/tree/main/data#downloading-and-processing-tap-vid-kinetics

(Average after 27 videos) {'average_jaccard': 0.242, 'average_pts_within_thresh': 0.334, 'jaccard_1': 0.081, 'jaccard_16': 0.402, 'jaccard_2': 0.163, 'jaccard_4': 0.246, 'jaccard_8': 0.316, 'loss_occ': 0.829, 'occlusion_accuracy': 0.745, 'position_loss': 52.873, 'pts_within_1': 0.127, 'pts_within_16': 0.560, 'pts_within_2': 0.229, 'pts_within_4': 0.329, 'pts_within_8': 0.428}
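One way to inspect the ground truth independently of the eval pipeline is to load a pickle shard and check basic trajectory statistics directly. The sketch below is mine, not from the repo: it assumes each shard unpickles to a list of per-video dicts with `points` (shape `[num_points, num_frames, 2]`) and `occluded` (shape `[num_points, num_frames]`) arrays, which should be checked against the repo's dataset-loading code. Trajectories that wander around the frame show up as a large mean per-frame step.

```python
import pickle

import numpy as np


def summarize_shard(examples):
    """Return basic stats for each example in a TAP-Vid pickle shard."""
    stats = []
    for ex in examples:
        points = np.asarray(ex['points'])      # assumed [num_points, num_frames, 2]
        occluded = np.asarray(ex['occluded'])  # assumed [num_points, num_frames]
        # Per-frame displacement of each point between consecutive frames.
        step = np.linalg.norm(np.diff(points, axis=1), axis=-1)
        # Only count steps where the point is visible in both frames.
        visible = ~occluded
        vis_steps = step[visible[:, 1:] & visible[:, :-1]]
        mean_step = float(vis_steps.mean()) if vis_steps.size else float('nan')
        stats.append({'shape': points.shape, 'mean_step': mean_step})
    return stats


if __name__ == '__main__':
    with open('0000_of_0010.pkl', 'rb') as f:
        for i, s in enumerate(summarize_shard(pickle.load(f))):
            print(f'example {i}: points {s["shape"]}, mean visible step {s["mean_step"]:.4f}')
```

If the mean step is a large fraction of the frame size for most videos, the annotations and decoded frames are probably misaligned rather than the model being wrong.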

AssafSinger94 commented 1 year ago

Here are two example visualizations of predicted vs. ground-truth trajectories for videos 2 and 3 in TAP-Vid Kinetics (I changed the ground-truth markers in plt.scatter from diamonds to plus signs to make the visualization clearer). This phenomenon happens in most videos.

https://github.com/google-deepmind/tapnet/assets/43016459/dbf6567d-7ba4-4bbb-9fac-d81c4f278903

https://github.com/google-deepmind/tapnet/assets/43016459/f4dd9474-43bb-448e-8613-d4b508e6d04d

cdoersch commented 1 year ago

Unfortunately I can't see the videos you've posted. Maybe try some external hosting like Google Drive?

cdoersch commented 1 year ago

Nevermind, I reloaded and somehow they're working now. This definitely looks like a framerate issue. Our annotations assume the tensor of video frames is extracted at 25fps, so you need to make sure the ffmpeg command you used to decode the videos into frames is set up correctly. Our script should do it, but it might not work if the framerate for the source videos is set incorrectly. It's hard to say more without knowing how you downloaded the original videos.
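For what it's worth, a quick sanity check is to compare the number of frames actually decoded per clip against the length of the annotation arrays. The helper names below are hypothetical, not part of the repo; the 25 fps assumption is from the comment above, and Kinetics clips are roughly 10 seconds, so the annotations expect about 250 frames per video.

```python
def expected_frames(duration_seconds, fps=25.0):
    """Frame count the annotations assume for a clip of this length at the given rate."""
    return round(duration_seconds * fps)


def framerate_mismatch(num_decoded_frames, num_annotated_frames, tol=2):
    """True if decoded and annotated frame counts disagree beyond a small tolerance."""
    return abs(num_decoded_frames - num_annotated_frames) > tol


# Example: a 10 s clip decoded at 30 fps yields ~300 frames, while
# 25 fps annotations cover only 250 -- a clear mismatch.
print(framerate_mismatch(300, expected_frames(10.0)))
```

If the counts disagree, forcing the output rate in the ffmpeg extraction step (e.g. with `-r 25`) and re-running the decode is one way to rule out the framerate hypothesis.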

If you got DAVIS working, would you mind closing your other bug?

cdoersch commented 2 months ago

Others haven't been reporting the same issue, so I'm closing due to inactivity.