HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
BSD 2-Clause "Simplified" License

Ensemble Task Implementation #50

Open sdsawtelle opened 2 years ago

sdsawtelle commented 2 years ago

@HobbitLong Thank you very much for making the effort to clean and post your code for these benchmarks! I understand you may not have time to post code for the ensemble distillation task, but I am going to try reproducing that benchmark. If there are any tricks or different hyperparameter settings you remember for that particular task off the top of your head, perhaps we can document them in this issue.
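For anyone else attempting this, here is a minimal sketch (not from this repo, and not necessarily what the paper did) of one plausible way to distill from an ensemble: average the teachers' softened probabilities and apply the standard Hinton-style KL-divergence KD loss. The names `student_logits`, `teacher_logit_list`, and the temperature `T` are illustrative assumptions.

```python
# Hypothetical ensemble-KD loss: soft targets are the mean of the teachers'
# temperature-softened distributions. This is a sketch, not the authors' code.
import torch
import torch.nn.functional as F

def ensemble_kd_loss(student_logits, teacher_logit_list, T=4.0):
    """KL divergence between the student and the averaged teacher soft targets,
    scaled by T^2 as in standard knowledge distillation."""
    # Average the teachers' probability distributions at temperature T.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logit_list], dim=0
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_student, teacher_probs, reduction='batchmean') * (T * T)

# Quick check with random tensors standing in for model outputs.
if __name__ == "__main__":
    student_logits = torch.randn(8, 100)                        # batch of 8, 100 classes
    teacher_logits = [torch.randn(8, 100) for _ in range(4)]    # 4 teachers
    print(ensemble_kd_loss(student_logits, teacher_logits))
```

Whether the paper averages logits, averages probabilities, or combines the teachers at the feature level (as CRD would suggest) is exactly the kind of detail worth documenting here.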

sdsawtelle commented 2 years ago

For Figure 4 in the paper, I'm wondering exactly how a single point in those plots is generated. For example, for the point corresponding to ResNet distillation from four teachers, is that an average over multiple trials? If so, are four new teachers trained from scratch for each trial? Or was there a fixed pool of, e.g., 8 teachers, with each 4-teacher trial randomly selecting four from among those 8, each 6-teacher trial randomly selecting six, and so on?
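If the second reading is right, a simple sketch of the trial setup might look like the snippet below. The pool size, subset size `k`, number of trials, and checkpoint names are all assumptions for illustration, not details from the paper.

```python
# Hypothetical protocol: keep a fixed pool of pre-trained teacher checkpoints
# and, for each trial, sample k of them without replacement.
import random

def sample_teacher_subsets(teacher_ckpts, k, n_trials, seed=0):
    """Return n_trials random k-subsets drawn from the teacher checkpoint pool."""
    rng = random.Random(seed)
    return [rng.sample(teacher_ckpts, k) for _ in range(n_trials)]

pool = [f"teacher_{i}.pth" for i in range(8)]   # assumed pool of 8 teachers
for trial, subset in enumerate(sample_teacher_subsets(pool, k=4, n_trials=3)):
    print(f"trial {trial}: distill from {subset}")
```

Knowing which of the two protocols was used (fresh teachers per trial vs. subsets of a fixed pool) would change both the compute budget and the variance of the reported points.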

ShristiDasBiswas commented 3 months ago

Hi, were you able to reproduce the ensemble distillation task?