Open PGCJ opened 1 year ago
Sorry, maybe I misunderstand your code, but you create a new model that wraps the student model with ABF modules. When I print this model, it includes the added ABF structures, which could increase memory usage.
In your code, `model` is new and `student` is wrapped inside it. So your approach is a mixture of the student model and ABF structures, on which knowledge distillation is then applied? Here is your code: `model = ReviewKD(student, in_channels, out_channels, mid_channel)`
No. The other modules used during distillation are not used at inference time, so the student model is unchanged when it is actually deployed.
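To illustrate the point above, here is a minimal pure-Python sketch of the wrapper pattern (the `Student` and `ABF` classes are hypothetical stand-ins, not the repository's actual PyTorch implementation): the wrapper holds the student plus extra ABF modules during training, but only the inner student is kept at inference.

```python
# Hypothetical stand-ins for the networks discussed above; the real
# repository uses PyTorch nn.Module subclasses.

class Student:
    """Stand-in for the student network."""
    def forward(self, x):
        return x * 2  # placeholder computation

class ABF:
    """Stand-in for an Attention-Based Fusion module used only in training."""
    def forward(self, feat):
        return feat + 1  # placeholder feature fusion

class ReviewKD:
    """Training-time wrapper: holds the student plus extra ABF modules.

    The ABF modules only produce refined features for the distillation
    loss; the student's own parameters are what gets trained and saved.
    """
    def __init__(self, student, num_abfs):
        self.student = student
        self.abfs = [ABF() for _ in range(num_abfs)]

    def forward(self, x):
        feat = self.student.forward(x)
        # ABF outputs feed the distillation loss during training only.
        refined = [abf.forward(feat) for abf in self.abfs]
        return feat, refined

# Training uses the wrapper...
wrapper = ReviewKD(Student(), num_abfs=2)
feat, refined = wrapper.forward(3)

# ...but at inference only the inner student is kept: the ABF modules
# are simply dropped, so deployed memory is unchanged.
deployed = wrapper.student
print(deployed.forward(3))  # → 6
```

So the printed `ReviewKD` model does contain ABF structures, but they exist only during distillation; the deployed student is extracted from the wrapper and carries none of them.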