facebookresearch / VMZ

VMZ: Model Zoo for Video Modeling
Apache License 2.0

R(2+1)D training time on Kinetics #118

Closed elv-zhiyun closed 4 years ago

elv-zhiyun commented 4 years ago

Hi, thank you for your work and pre-trained models! I'm trying to train R(2+1)D on a custom dataset and it appears that training takes a long time. In c2/tutorials/kinetics_train.md it says "Training this model may take a few days with 8 P100 GPUs" -- is this model the R(2+1)D-18 8-frame one specified in c2/scripts/train_r2plus1d_kinetics.sh? Could you share more detailed information about this? How much time and how many computing resources does it take to train one epoch on Kinetics, and what about deeper models like R(2+1)D-34, R(2+1)D-152, or 32-frame models?

bjuncek commented 4 years ago

Hi @elv-zhiyun, the training time depends on many factors, such as the compute you have available, the type of interconnects, the data reading system, etc.

I found that training a model on the Kinetics dataset with 64 V100 GPUs (connected via InfiniBand) and NFS-attached storage takes about 24 hours. That is for 45 epochs with an epoch multiplier of 5, using PyTorch DDP.

I was able to train r2+1d-18 on an 8-GPU machine with slow hard-drive access and basic interconnects in about 6 days, but your mileage may vary.
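
For a rough comparison of the two setups above, the figures can be converted into GPU-hours and time per epoch. This is only a back-of-the-envelope sketch: the helper names `gpu_hours` and `minutes_per_epoch` are made up for illustration, the two runs used different GPUs, storage, and interconnects, and the 8-GPU run's epoch count was not stated, so the numbers are not directly comparable.

```python
# Back-of-the-envelope cost estimate from the figures quoted in this thread.
# Helper names are hypothetical; they are not part of VMZ or PyTorch.

def gpu_hours(num_gpus: int, wall_clock_hours: int) -> int:
    """Total GPU-hours consumed by a run."""
    return num_gpus * wall_clock_hours


def minutes_per_epoch(wall_clock_minutes: float, epochs: int) -> float:
    """Average wall-clock minutes per nominal epoch."""
    return wall_clock_minutes / epochs


# 64 V100 GPUs for about 24 hours, 45 epochs (each with epoch multiplier 5).
cluster_cost = gpu_hours(64, 24)
cluster_epoch_minutes = minutes_per_epoch(24 * 60, 45)

# 8 GPUs for about 6 days; epoch count not stated, so only GPU-hours here.
single_node_cost = gpu_hours(8, 6 * 24)

print(f"64-GPU run: {cluster_cost} GPU-hours, "
      f"about {cluster_epoch_minutes:.0f} min per nominal epoch")
print(f"8-GPU run:  {single_node_cost} GPU-hours")
```

Under these figures both runs land in the same range, roughly 1,000 to 1,500 GPU-hours. That can serve as a very loose starting point for budgeting; deeper or 32-frame models are not covered by these numbers.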