Philip-Bachman / amdim-public

Public repo for Augmented Multiscale Deep InfoMax representation learning
MIT License

time for training self-supervised and pretrained models #14

Open · georgecloudservices opened this issue 4 years ago

georgecloudservices commented 4 years ago

First of all, congratulations on your work! It's really interesting.

How much time does it take to train the self-supervised and pretrained models?

Thanks in advance!

Philip-Bachman commented 2 years ago

With the smaller models you can get decent results in two days on 4 V100 GPUs. From Table 1 in the paper: "When we train the small model using a shorter 50 epoch schedule, it achieves 62.7% accuracy in 2 days on 4 GPUs." This is far from SoTA, but decent enough if you just want to poke at the model to see what it's doing (or to get a sense of whether it will work on your new task).

My choices when tuning the architecture were all focused on minimizing turnaround time for experiment results (due to limited compute), which led me away from a standard ResNet50-type architecture. That ended up being pretty unfortunate with respect to how the paper was received.
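If you want a rough sense of whether your own hardware is in the same ballpark before committing to a full run, one approach is to time a handful of training steps and extrapolate to the 50-epoch schedule. Below is a minimal PyTorch sketch of that idea; the model, batch size, and loss here are hypothetical stand-ins (a real measurement would use the AMDIM encoder, its NCE-style objective, and your actual ImageNet DataLoader).

```python
import time

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the AMDIM encoder; swap in the real model.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
).to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # spread each batch across available GPUs

opt = torch.optim.Adam(model.parameters(), lr=2e-4)
# Fake batch at AMDIM's 128x128 ImageNet input size; batch size is a guess.
batch = torch.randn(256, 3, 128, 128, device=device)

def step():
    opt.zero_grad()
    out = model(batch)
    loss = out.pow(2).mean()  # placeholder; a real run uses AMDIM's NCE-style loss
    loss.backward()
    opt.step()

for _ in range(5):  # warm-up: CUDA context init, cuDNN autotuning
    step()
if device.type == "cuda":
    torch.cuda.synchronize()

t0 = time.time()
n_timed = 20
for _ in range(n_timed):
    step()
if device.type == "cuda":
    torch.cuda.synchronize()
per_batch = (time.time() - t0) / n_timed

batches_per_epoch = 1_281_167 // 256  # ImageNet-1k train images / batch size
hours = per_batch * batches_per_epoch * 50 / 3600
print(f"~{per_batch:.3f}s per step -> ~{hours:.1f} wall-clock hours for 50 epochs")
```

Note that this ignores data loading and any per-epoch evaluation, so treat the result as a lower bound on actual wall-clock time.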