facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Video Classification off-the-shelf model #37

Open JoakimHaurum opened 12 months ago

JoakimHaurum commented 12 months ago

Hi,

Would it be possible to share the baseline ViT-L model used for the Kinetics-400 experiments? The official MAE-ST repo does not provide any fine-tuned checkpoints, and running the fine-tuning myself is not feasible for me at the moment.

Best, Joakim