Closed: iiSeymour closed this issue 4 years ago
Hey @HobbitLong, nice paper. Do you have any guidelines on how the effectiveness of the approach changes as the number of parameters in the student model decreases relative to the teacher (same architecture), and what ratio would be a sensible starting point?
Hi @iiSeymour,
This is an interesting question! I haven't had a chance to look into it, but my guess is a teacher-to-student parameter ratio somewhere between 2 and 10.
Great, thanks.