facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

Training Speedup for ToMe with Video #35

Closed · dfan closed this issue 11 months ago

dfan commented 11 months ago

Is the finetune training time reported in Table 6 for the same number of epochs, or the total wall-clock time to convergence? I don't observe a noticeable reduction in per-iteration training time; however, I can replicate the 2x inference speedup when r=65.
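For intuition on why r=65 yields roughly a 2x speedup: ToMe merges r tokens at every block, so the token count shrinks linearly with depth. The sketch below is a rough back-of-the-envelope estimate under illustrative assumptions (about 1568 space-time tokens and 24 blocks for a ViT-L video model; these numbers are assumptions for the sketch, not measurements, and the cost model ignores the quadratic attention term):

```python
def tokens_per_layer(n_tokens: int, r: int, n_layers: int) -> list:
    """Token count entering each transformer block when ToMe merges
    r tokens per block. Clamped so the count never drops below 1."""
    counts = []
    n = n_tokens
    for _ in range(n_layers):
        counts.append(n)
        n = max(n - r, 1)
    return counts

def relative_cost(n_tokens: int, r: int, n_layers: int) -> float:
    """Crude relative FLOP estimate: per-block cost modeled as O(n) for
    the dominant linear layers (the O(n^2) attention term is ignored)."""
    merged = sum(tokens_per_layer(n_tokens, r, n_layers))
    baseline = n_tokens * n_layers
    return merged / baseline

# Illustrative numbers (assumed, not measured): ~1568 tokens, 24 blocks.
print(tokens_per_layer(1568, 65, 24)[-1])       # tokens entering the last block -> 73
print(round(relative_cost(1568, 65, 24), 2))    # ~0.52, i.e. roughly 2x cheaper
```

This also illustrates the asker's observation: the model itself does roughly half the work, but per-iteration training time only improves if the model is actually the bottleneck.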

dbolya commented 11 months ago

That is the total wall-clock time for the same number of epochs. Note that your speedup will depend heavily on the rest of your setup, especially during training.

Training is only as fast as your weakest link, so if you see no speedup from ToMe, that means your throughput isn't dictated by the model but by some other bottleneck (dataloading, inter-node gradient sync, etc.).

For video, we use repeated augmentation (i.e., each data sample is augmented 4 times instead of pulling 4 different video clips; this is the default for MAE-ST), which reduces dataloading cost. The benchmark was also performed on a single node (8 GPUs) to minimize gradient sync time.
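Repeated augmentation can be sketched as a dataset wrapper that decodes each clip once and emits several independently augmented copies, amortizing the expensive video decode across multiple training samples. The wrapper and augment function below are hypothetical illustrations of the idea, not the MAE-ST implementation:

```python
import random

def augment(clip, rng):
    """Stand-in augmentation (hypothetical): a random scalar jitter."""
    return [x * rng.uniform(0.9, 1.1) for x in clip]

class RepeatedAugmentation:
    """Yield `repeats` independently augmented copies per decoded sample.

    Decoding a video clip is expensive; augmenting an already-decoded clip
    is cheap, so repeating augmentations cuts dataloader cost per sample.
    """
    def __init__(self, load_fn, num_samples, repeats=4, seed=0):
        self.load_fn = load_fn          # expensive: reads + decodes a clip
        self.num_samples = num_samples
        self.repeats = repeats
        self.rng = random.Random(seed)

    def __iter__(self):
        for i in range(self.num_samples):
            clip = self.load_fn(i)      # one decode ...
            for _ in range(self.repeats):
                yield augment(clip, self.rng)   # ... many training samples

# Usage: 2 decodes produce 8 training samples.
decode_calls = []
def fake_load(i):
    decode_calls.append(i)
    return [float(i)] * 3

samples = list(RepeatedAugmentation(fake_load, num_samples=2, repeats=4))
print(len(decode_calls), len(samples))  # 2 8
```

The trade-off is reduced sample diversity per epoch in exchange for higher dataloader throughput, which is exactly what keeps the model (rather than I/O) as the bottleneck in the benchmark described above.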

dfan commented 11 months ago

Thank you for those details! That's very useful to know.

dbolya commented 11 months ago

I'll close this issue for now. Feel free to reopen if you need more clarification.