For MixViT-B (ViT-B or ConvMAE-B), we train on 8 2080Ti or Tesla V100 GPUs, and training takes about 50+ hours. For MixViT-L (ViT-L or ConvMAE-L), we use 8 Tesla V100 or RTX 8000 GPUs, and training takes around 4+ days. In fact, 300 epochs instead of the 500 epochs reported in our paper may be enough, which would save training time.
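If you want to try the shorter schedule, a minimal sketch of how you might lower the epoch count in the training YAML before launching is below. The config path and the key names (`TRAIN` / `EPOCH`) are assumptions for illustration, not confirmed from this repository, so adapt them to the experiment file you actually use:

```python
# Hypothetical sketch: reduce the training epochs from 500 to 300.
# CONFIG_PATH and the TRAIN.EPOCH keys are assumed names, not the
# verified layout of this repo's experiment configs.
import yaml

CONFIG_PATH = "experiments/mixformer_vit/baseline.yaml"  # hypothetical path

with open(CONFIG_PATH) as f:
    cfg = yaml.safe_load(f)

# Shorter schedule suggested above: ~300 epochs may be sufficient.
cfg["TRAIN"]["EPOCH"] = 300

with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(cfg, f)
```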
Thanks,
Hi,