HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
BSD 2-Clause "Simplified" License

hyperparameters for other methods #25

Open wukailu opened 4 years ago

wukailu commented 4 years ago

Hi, I want to know the hyperparameters that were used to train the other methods. Are these methods well trained?

HobbitLong commented 4 years ago

For most of the methods, I used the hyperparameters reported by the original authors. While I assume the authors of each paper optimized those parameters, it is still possible that some methods are underestimated.
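
For context, the benchmark combines each method's distillation term with cross-entropy and standard KD, so the main per-method hyperparameters are the loss weights. Below is a minimal sketch of that combination, assuming the `-r/--gamma`, `-a/--alpha`, `-b/--beta` weighting described in the repo README; it is an illustration under those assumptions, not the repo's exact code.

```python
# Sketch (not the repo's exact code): RepDistiller-style total student loss,
# a weighted sum of cross-entropy, Hinton KD, and the method-specific term.
# gamma/alpha/beta correspond to the script's -r/-a/-b flags (assumed from
# the README); per-method values of beta come from each original paper.
import torch
import torch.nn.functional as F

def kd_loss(logits_s, logits_t, T=4.0):
    """Hinton KD: KL divergence between temperature-softened distributions."""
    p_t = F.softmax(logits_t / T, dim=1)
    log_p_s = F.log_softmax(logits_s / T, dim=1)
    # T*T rescales gradients to be comparable with the hard-label loss.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def total_loss(logits_s, logits_t, labels, loss_distill,
               gamma=1.0, alpha=0.9, beta=1.0):
    """Weighted sum; loss_distill is the method-specific term (e.g. CRD)."""
    return (gamma * F.cross_entropy(logits_s, labels)
            + alpha * kd_loss(logits_s, logits_t)
            + beta * loss_distill)
```

With this structure, "using the original authors' hyperparameters" mostly amounts to setting `beta` (and any method-internal parameters) to the values from each paper, so methods whose published settings were tuned for a different architecture or dataset may indeed be underestimated.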