facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Training Speedup for ToMe with Video #35

Closed · dfan closed this issue 11 months ago

dfan commented 11 months ago

Is the finetune training time reported in Table 6 for the same number of epochs, or the total wall-clock time to convergence? I don't observe a noticeable reduction in per-iteration training time; however, I can replicate the 2x inference speedup when r=65.
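For intuition on why r=65 yields roughly a 2x speedup: ToMe merges r tokens at every block, so the token count shrinks linearly with depth. The sketch below is a rough back-of-the-envelope estimate under illustrative assumptions (about 1568 space-time tokens and 24 blocks for a ViT-L video model; these numbers are assumptions for the sketch, not measurements, and the cost model ignores the quadratic attention term):

```python
def tokens_per_layer(n_tokens: int, r: int, n_layers: int) -> list:
    """Token count entering each transformer block when ToMe merges
    r tokens per block. Clamped so the count never drops below 1."""
    counts = []
    n = n_tokens
    for _ in range(n_layers):
        counts.append(n)
        n = max(n - r, 1)
    return counts

def relative_cost(n_tokens: int, r: int, n_layers: int) -> float:
    """Crude relative FLOP estimate: per-block cost modeled as O(n) for
    the dominant linear layers (the O(n^2) attention term is ignored)."""
    merged = sum(tokens_per_layer(n_tokens, r, n_layers))
    baseline = n_tokens * n_layers
    return merged / baseline

# Illustrative numbers (assumed, not measured): ~1568 tokens, 24 blocks.
print(tokens_per_layer(1568, 65, 24)[-1])       # tokens entering the last block -> 73
print(round(relative_cost(1568, 65, 24), 2))    # ~0.52, i.e. roughly 2x cheaper
```

This also illustrates the asker's observation: the model itself does roughly half the work, but per-iteration training time only improves if the model is actually the bottleneck.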

dbolya commented 11 months ago

That is the total wall-clock time for the same number of epochs. Note that your speedup will depend heavily on the rest of your setup, especially during training.

Training is only as fast as your weakest link, so if you see no speedup from ToMe, that means your throughput isn't dictated by the model but by some other bottleneck (dataloading, inter-node gradient sync, etc.).

For video, we use repeated augmentation (i.e., each data sample is augmented 4 times instead of pulling 4 different video clips; this is the default for MAE-ST), which reduces dataloading cost. The benchmark was also performed on a single node (8 GPUs) to minimize gradient sync time.
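Repeated augmentation can be sketched as a dataset wrapper that decodes each clip once and emits several independently augmented copies, amortizing the expensive video decode across multiple training samples. The wrapper and augment function below are hypothetical illustrations of the idea, not the MAE-ST implementation:

```python
import random

def augment(clip, rng):
    """Stand-in augmentation (hypothetical): a random scalar jitter."""
    return [x * rng.uniform(0.9, 1.1) for x in clip]

class RepeatedAugmentation:
    """Yield `repeats` independently augmented copies per decoded sample.

    Decoding a video clip is expensive; augmenting an already-decoded clip
    is cheap, so repeating augmentations cuts dataloader cost per sample.
    """
    def __init__(self, load_fn, num_samples, repeats=4, seed=0):
        self.load_fn = load_fn          # expensive: reads + decodes a clip
        self.num_samples = num_samples
        self.repeats = repeats
        self.rng = random.Random(seed)

    def __iter__(self):
        for i in range(self.num_samples):
            clip = self.load_fn(i)      # one decode ...
            for _ in range(self.repeats):
                yield augment(clip, self.rng)   # ... many training samples

# Usage: 2 decodes produce 8 training samples.
decode_calls = []
def fake_load(i):
    decode_calls.append(i)
    return [float(i)] * 3

samples = list(RepeatedAugmentation(fake_load, num_samples=2, repeats=4))
print(len(decode_calls), len(samples))  # 2 8
```

The trade-off is reduced sample diversity per epoch in exchange for higher dataloader throughput, which is exactly what keeps the model (rather than I/O) as the bottleneck in the benchmark described above.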

dfan commented 11 months ago

Thank you for those details! That's very useful to know.

dbolya commented 11 months ago

I'll close this issue for now. Feel free to reopen if you need more clarification.