HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
BSD 2-Clause "Simplified" License
2.17k stars 395 forks

How to train teacher model #31

Open tiancity-NJU opened 4 years ago

tiancity-NJU commented 4 years ago

When I try to train a teacher model (ResNet-50) on a new dataset (CUB-200), the backbone differs from the original ResNet-50. On the one hand, the model is too big to use a large batch size (8 is the most that fits on a 16 GB GPU); on the other hand, top-1 accuracy is only about 10% after training for 300 epochs. Why?