HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
BSD 2-Clause "Simplified" License

Teacher/Student Parameter ratio #7

Closed: iiSeymour closed this issue 4 years ago

iiSeymour commented 4 years ago

Hey @HobbitLong, nice paper. Do you have any guidelines on how effective the approach is as the number of parameters in the student model decreases relative to the teacher (same architecture)? And what teacher-to-student parameter ratio would be a sensible starting point?

HobbitLong commented 4 years ago

Hi @iiSeymour,

This is an interesting question! I haven't had a chance to look into it, but my guess is that a teacher-to-student parameter ratio somewhere between 2 and 10 is reasonable.
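As a rough illustration of how one might check a candidate student against that guessed range, here is a minimal sketch. It is not from the paper or the RepDistiller code: the helper names (`mlp_param_count`, `param_ratio`, `in_suggested_range`) and the layer widths are hypothetical, and a plain fully connected network stands in for whatever architecture is actually used.

```python
def mlp_param_count(widths):
    """Count weights and biases of a fully connected net with the given layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


def param_ratio(teacher_params, student_params):
    """Teacher-to-student parameter ratio."""
    return teacher_params / student_params


def in_suggested_range(ratio, low=2.0, high=10.0):
    """Check the ratio against the 2 to 10 range guessed in this thread (not a tested result)."""
    return low <= ratio <= high


# Hypothetical example: same architecture, student hidden layers half as wide.
teacher_params = mlp_param_count([784, 512, 512, 10])
student_params = mlp_param_count([784, 256, 256, 10])
ratio = param_ratio(teacher_params, student_params)

print(f"teacher={teacher_params}, student={student_params}, ratio={ratio:.2f}")
print("within guessed range:", in_suggested_range(ratio))
```

For a real PyTorch model, the parameter counts could be obtained with `sum(p.numel() for p in model.parameters())` and passed to `param_ratio` in the same way.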

iiSeymour commented 4 years ago

Great, thanks.