dvlab-research / ReviewKD

Distilling Knowledge via Knowledge Review, CVPR 2021

Have you tested the size of the student (with ABF) model? Can the student model still be smaller than the original teacher model? #19

Open PGCJ opened 1 year ago

akuxcw commented 1 year ago

No. But the other modules used during distillation are not used at inference time, so the student model is unchanged when it is actually deployed.
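A minimal sketch of what this could look like in practice (not from the repo; `model.student` and `StudentNet` are assumed names for illustration):

```python
import torch

# Training-time wrapper: the student plus the ABF modules used for the
# review-style distillation losses (construction line quoted in the thread).
model = ReviewKD(student, in_channels, out_channels, mid_channel)
# ... run the distillation training loop on `model` ...

# For deployment, only the student backbone is kept; the ABF modules and
# the teacher are discarded. `model.student` is an assumed attribute name.
torch.save(model.student.state_dict(), "student_only.pth")

# Re-instantiate the plain student architecture (hypothetical `StudentNet`)
# and load the distilled weights; its size equals the original student's.
deployed = StudentNet()
deployed.load_state_dict(torch.load("student_only.pth"))
deployed.eval()
```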

PGCJ commented 1 year ago

> No. But the other modules used during distillation are not used at inference time, so the student model is unchanged when it is actually deployed.

Sorry, maybe I misunderstand your code, but you create a new model in which the ABF modules are added to the student model. When I print this model, it includes the added ABF structure, which could increase memory usage.

In your code, `model` is newly created and `student` is passed into it. So your approach is a combination of the student model and the ABF structures, on which knowledge distillation is then performed? Here is your code: `model = ReviewKD(student, in_channels, out_channels, mid_channel)`
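For reference, a hedged sanity check of the point in question, reusing `student` and `model` from the quoted line (plain PyTorch parameter counting, not code from the repo):

```python
# The wrapped model does hold extra ABF parameters, but they live only in
# the training-time wrapper, not in the deployed student.
def count_params(m):
    return sum(p.numel() for p in m.parameters())

print("plain student :", count_params(student))   # what is deployed
print("student + ABF :", count_params(model))     # what is trained
# The second number is larger, yet only the first matters at inference,
# because the ABF modules are dropped after distillation.
```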