dvlab-research / ReviewKD

Distilling Knowledge via Knowledge Review, CVPR 2021

Have you tested the size of the student (with ABF) model? Can the student model still be smaller than the original teacher model? #19

Open PGCJ opened 1 year ago

akuxcw commented 1 year ago

No. But the other modules used during distillation are not used at inference time, so the student model is unchanged when it is actually deployed.
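A minimal sketch of what this could look like in practice (not from the repo; `model.student` and `StudentNet` are assumed names for illustration):

```python
import torch

# Training-time wrapper: the student plus the ABF modules used for the
# review-style distillation losses (construction line quoted in the thread).
model = ReviewKD(student, in_channels, out_channels, mid_channel)
# ... run the distillation training loop on `model` ...

# For deployment, only the student backbone is kept; the ABF modules and
# the teacher are discarded. `model.student` is an assumed attribute name.
torch.save(model.student.state_dict(), "student_only.pth")

# Re-instantiate the plain student architecture (hypothetical `StudentNet`)
# and load the distilled weights; its size equals the original student's.
deployed = StudentNet()
deployed.load_state_dict(torch.load("student_only.pth"))
deployed.eval()
```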

PGCJ commented 1 year ago

> No. But the other modules used during distillation are not used at inference time, so the student model is unchanged when it is actually deployed.

Sorry, maybe I misunderstand your code, but you create a new model in which the ABF modules are added to the student model. When I print this model, it includes the added ABF structure, which could increase memory usage.

In your code, `model` is newly created and `student` is passed into it. So your approach is a combination of the student model and the ABF structures, on which knowledge distillation is then performed? Here is your code: `model = ReviewKD(student, in_channels, out_channels, mid_channel)`
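For reference, a hedged sanity check of the point in question, reusing `student` and `model` from the quoted line (plain PyTorch parameter counting, not code from the repo):

```python
# The wrapped model does hold extra ABF parameters, but they live only in
# the training-time wrapper, not in the deployed student.
def count_params(m):
    return sum(p.numel() for p in m.parameters())

print("plain student :", count_params(student))   # what is deployed
print("student + ABF :", count_params(model))     # what is trained
# The second number is larger, yet only the first matters at inference,
# because the ABF modules are dropped after distillation.
```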