HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
BSD 2-Clause "Simplified" License

teacher model is too big to run with batch_size 64 #30

Open tiancity-NJU opened 3 years ago

tiancity-NJU commented 3 years ago

When I try to train a teacher model on CUB-200 (200 classes) using ResNet-50 with batch size 64, it runs out of memory on a 16 GB GPU. It only runs when I set the batch size to 8. Why does ResNet-50 need so much memory?
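
A common workaround when the target batch size does not fit in GPU memory is gradient accumulation: run several small micro-batches, sum their gradients, and only then take an optimizer step, so the effective batch size stays 64 while the peak memory corresponds to the micro-batch. The sketch below is a minimal, hedged example and not RepDistiller's actual training script; it assumes a standard torchvision ResNet-50 and a hypothetical `train_loader` that yields CUB-200 image/label batches of size 8.

```python
# Minimal gradient-accumulation sketch (assumptions: torchvision ResNet-50,
# a hypothetical train_loader with batch_size=8; not the repo's train script).
import torch
import torch.nn as nn
import torchvision

device = torch.device("cuda")
model = torchvision.models.resnet50(num_classes=200).to(device)  # CUB-200 has 200 classes
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)

accum_steps = 8        # 8 micro-batches of 8 images = effective batch size 64
                       # (micro-batch of 8 is what fit on the 16 GB GPU above)

def train_one_epoch(train_loader):
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)
        # Divide by accum_steps so the accumulated gradient equals the
        # average gradient of one full batch of 64.
        loss = criterion(model(images), labels) / accum_steps
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Scaling the loss by `accum_steps` keeps the gradient magnitude (and hence the learning rate behaviour) close to what a true batch of 64 would give; note that BatchNorm statistics are still computed per micro-batch of 8, which is the main difference from genuinely training with batch size 64.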