facebookresearch / AVID-CMA

Audio Visual Instance Discrimination with Cross-Modal Agreement

How many GPUs are needed? #3

Closed: nashory closed this issue 3 years ago

nashory commented 3 years ago

Hi, thanks for sharing the awesome code! I wonder how many servers/GPUs/days were consumed for pretraining on the Kinetics dataset. Also, how do you decide whether the network is fully converged in the self-supervised training phase? Thanks in advance for your reply :)

imisra commented 3 years ago

I believe @pedro-morgado can answer this best.

pedro-morgado commented 3 years ago

Hi, we trained on 4 nodes, each with 8 16GB GPUs (32 GPUs total). The first stage (AVID) was trained for 200 epochs, and the second stage (CMA) for 200 more epochs. Each 200-epoch stage took about 38 hours, roughly 11 minutes per epoch.
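For anyone reproducing this setup, here is a minimal sketch of how a 4-node x 8-GPU job is typically wired up with PyTorch `DistributedDataParallel`. The launch mechanics are standard `torchrun` conventions, not the repo's actual entry points, and the model is a stand-in:

```python
# Launch once per node (node_rank 0..3), e.g.:
#   torchrun --nnodes=4 --nproc_per_node=8 --node_rank=$NODE_RANK \
#            --master_addr=$MASTER_ADDR --master_port=29500 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every process;
    # with 4 nodes x 8 GPUs each, WORLD_SIZE comes out to 32.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model; the real AVID/CMA networks live in this repo.
    model = torch.nn.Linear(512, 512).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # ... the 200-epoch training loop for each stage would go here ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```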

Re: how we decide if the model is fully converged. You can either track downstream performance or the self-supervised loss itself. In the early stages of our research, I tracked the downstream performance of the model on UCF and found that, after roughly 200 epochs, models learned by AVID would still improve the self-supervised loss, but downstream generalization would neither improve nor degrade; it just stayed the same. So 200 epochs is a safe number for this dataset. A small plateau check along those lines is sketched below.
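As a concrete illustration of that stopping criterion (this helper is hypothetical, not part of the repo): log the downstream accuracy, e.g. a UCF probe, every few epochs, and stop pretraining once it stops improving even if the self-supervised loss is still dropping.

```python
def has_plateaued(acc_history, patience=3, min_delta=1e-3):
    """Return True if the downstream accuracy has not improved by at
    least `min_delta` over the last `patience` evaluations.

    `acc_history` is the list of accuracies logged so far, oldest first.
    """
    if len(acc_history) <= patience:
        return False
    best_before = max(acc_history[:-patience])
    # SSL loss may keep decreasing past this point, but if no recent
    # evaluation beats the earlier best, downstream transfer has plateaued.
    return max(acc_history[-patience:]) < best_before + min_delta
```

Calling this after each periodic UCF evaluation reproduces the behavior described above: around 200 epochs the accuracy history flattens and the check fires.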

carlprilo commented 3 years ago

@pedro-morgado Hi, I recently ran your code with 32 GPUs (4 nodes x 8 V100s), but it takes 70 minutes to train one epoch. Did you use the same config file that ships with the code, or did you run a modified version?
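(A gap this large often comes from data loading rather than compute, which is common with video pipelines. One quick, generic diagnostic, not specific to this repo, is to time the dataloader by itself:)

```python
import time
import itertools

def time_loader(loader, num_batches=50):
    """Rough data-loading throughput check: iterate the loader with no
    GPU work at all. If this alone is slow, the bottleneck is I/O or
    video decoding, not the model."""
    start = time.time()
    for _ in itertools.islice(iter(loader), num_batches):
        pass
    elapsed = time.time() - start
    print(f"{num_batches} batches in {elapsed:.1f}s "
          f"({elapsed / num_batches:.2f}s/batch)")
```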