google-research / mixmatch


Running on google colab / cifar 10 dataset - long training time #9

Closed Natibus closed 5 years ago

Natibus commented 5 years ago

Hello, I'm trying to run your project on Google Colab. I ran your example line after the setup:

`CUDA_VISIBLE_DEVICES=0 python mixmatch.py --filters=32 --dataset=cifar10.3@250-5000 --w_match=75 --beta=0.75`

I can see on stdout that there are 1024 epochs, each taking about 5 minutes, at a speed of about 180 images/sec. That leads to a total experiment time of about 85 hours. Is it normal for the experiment to take that long on a Colab (K80) GPU?
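The 85-hour figure above follows directly from the numbers printed to stdout. A quick sketch of that arithmetic (the 1024-epoch count and ~5 min/epoch are taken from the report; everything else is straightforward math):

```python
def estimate_total_hours(num_epochs: int, seconds_per_epoch: float) -> float:
    """Estimate total training time in hours from per-epoch wall time."""
    return num_epochs * seconds_per_epoch / 3600


# Values observed on the Colab K80: 1024 epochs at ~5 minutes each.
hours = estimate_total_hours(1024, 5 * 60)
print(f"Estimated total training time: {hours:.1f} h")  # ~85.3 h

# Implied images per epoch at the observed ~180 images/sec:
print(f"Implied images/epoch: {180 * 5 * 60}")  # 54000
```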

david-berthelot commented 5 years ago

Not sure, I didn't run on a K80. I typically run on a V100, which, if I understand correctly, is roughly the same speed as an RTX 2080 Ti. Full training takes roughly 20 hours on such a GPU.

Note that for the purpose of quick experimentation you don't really have to train that long; even after a quarter of the training has elapsed, the network starts to make good predictions.

Natibus commented 5 years ago

Thank you, I'll stop the training earlier then 😊 This page shows the K80 is roughly 4× slower than the V100 in TFLOPS, so I guess the numbers are consistent.
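A rough sanity check of that scaling argument, using approximate spec-sheet FP32 throughput (~15.7 TFLOPS for a V100, ~4.1 TFLOPS for the single GK210 die Colab exposes on a K80; both numbers are assumptions, and real speed also depends on memory bandwidth and input pipeline):

```python
def predicted_hours(baseline_hours: float,
                    baseline_tflops: float,
                    target_tflops: float) -> float:
    """Naively scale a known runtime by the ratio of peak FP32 throughput."""
    return baseline_hours * baseline_tflops / target_tflops


# ~20 h on a V100 (~15.7 TFLOPS) scaled to a K80 die (~4.1 TFLOPS).
print(f"Predicted K80 time: {predicted_hours(20, 15.7, 4.1):.1f} h")
```

The prediction lands in the high-70s of hours, in the same ballpark as the ~85 h observed, so the slowdown is plausibly just the hardware gap.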