Open
georgecloudservices opened this issue 4 years ago
First of all, congratulations on your work! It's really interesting.
I would be grateful if you could share the training time for the self-supervised case with your settings (4 Tesla V100s, or whatever hardware you used) on CIFAR-10 or ImageNet.
In addition, I think it would be very useful to have pre-trained models for the self-supervised case, so that they could be used as representations of a dataset for further experimentation.
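To illustrate what I mean by using the models as representations: something like the sketch below, where a frozen pretrained encoder is run over a dataset and the pooled features are cached for downstream probes. I'm using a torchvision ResNet-50 and CIFAR-10 purely as placeholders, since I don't know your repo's actual API or checkpoint format:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Stand-in encoder: a torchvision ResNet-50 with its classifier head removed.
# I'd swap this for your released self-supervised checkpoint.
encoder = models.resnet50(weights=None)
encoder.fc = nn.Identity()  # expose the 2048-d pooled features
encoder.eval()

transform = transforms.Compose([
    transforms.Resize(224),  # CIFAR-10 images are 32x32; upsample for the encoder
    transforms.ToTensor(),
])
loader = DataLoader(
    datasets.CIFAR10(root="data", train=False, download=True, transform=transform),
    batch_size=256,
)

# Cache frozen representations for downstream use (e.g., a linear probe).
features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        features.append(encoder(images))
        labels.append(targets)
features = torch.cat(features)  # shape (N, 2048)
labels = torch.cat(labels)
```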
Thanks in advance!
With the smaller models you can get decent results in two days on 4 V100 GPUs. From Table 1 in the paper: "When we train the small model using a shorter 50 epoch schedule, it achieves 62.7% accuracy in 2 days on 4 GPUs." This is far from SoTA, but decent enough if you just want to poke at the model to see what it's doing (or to get a sense of whether it will work on your new task). My choices when hyperopting the architecture were all focused on minimizing turnaround time for experiment results (due to limited compute), which led me away from a standard ResNet50-type architecture; that ended up being pretty unfortunate with respect to the reception of this paper.