Philip-Bachman / amdim-public

Public repo for Augmented Multiscale Deep InfoMax representation learning
MIT License

time for training self-supervised and pretrained models #14

Open · georgecloudservices opened this issue 4 years ago

georgecloudservices commented 4 years ago

First of all, congratulations on your work! It's really interesting.

How much time does it take to train the self-supervised and pretrained models?

Thanks in advance!

Philip-Bachman commented 2 years ago

With the smaller models you can get decent results in two days on 4 V100 GPUs. From Table 1 in the paper: "When we train the small model using a shorter 50 epoch schedule, it achieves 62.7% accuracy in 2 days on 4 GPUs." This is far from SoTA, but decent enough if you just want to poke at the model to see what it's doing (or to get a sense of whether it will work on your new task).

My choices when tuning the architecture were all focused on minimizing turnaround time for experiment results (due to limited compute), which led me away from a standard ResNet50-type architecture. That ended up being pretty unfortunate with respect to how the paper was received.
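If you want a rough sense of whether your own hardware is in the same ballpark before committing to a full run, one approach is to time a handful of training steps and extrapolate to the 50-epoch schedule. Below is a minimal PyTorch sketch of that idea; the model, batch size, and loss here are hypothetical stand-ins (a real measurement would use the AMDIM encoder, its NCE-style objective, and your actual ImageNet DataLoader).

```python
import time

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the AMDIM encoder; swap in the real model.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
).to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # spread each batch across available GPUs

opt = torch.optim.Adam(model.parameters(), lr=2e-4)
# Fake batch at AMDIM's 128x128 ImageNet input size; batch size is a guess.
batch = torch.randn(256, 3, 128, 128, device=device)

def step():
    opt.zero_grad()
    out = model(batch)
    loss = out.pow(2).mean()  # placeholder; a real run uses AMDIM's NCE-style loss
    loss.backward()
    opt.step()

for _ in range(5):  # warm-up: CUDA context init, cuDNN autotuning
    step()
if device.type == "cuda":
    torch.cuda.synchronize()

t0 = time.time()
n_timed = 20
for _ in range(n_timed):
    step()
if device.type == "cuda":
    torch.cuda.synchronize()
per_batch = (time.time() - t0) / n_timed

batches_per_epoch = 1_281_167 // 256  # ImageNet-1k train images / batch size
hours = per_batch * batches_per_epoch * 50 / 3600
print(f"~{per_batch:.3f}s per step -> ~{hours:.1f} wall-clock hours for 50 epochs")
```

Note that this ignores data loading and any per-epoch evaluation, so treat the result as a lower bound on actual wall-clock time.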