Closed. linjie-yang closed this issue 4 months ago.
I have found the code for the point-based KD implementation.
Hi @linjie-yang,
I see that you've closed the issue. The code can be found by searching for keywords such as kd_loss. The implementations live in the source code of the individual detectors.
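For readers searching for kd_loss: the actual loss in the repo is detector-specific, but as a point of reference, here is a minimal sketch of the classic soft-label KD loss (Hinton-style KL divergence on temperature-softened logits) in NumPy. The function name and signature are illustrative, not the repo's API:

```python
import numpy as np

def softmax(z, T):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Sketch of a soft-label KD loss: KL(teacher || student) on
    temperature-softened logits, scaled by T^2. The repo's kd_loss
    for point/voxel features will differ from this classification form."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    log_p_t = np.log(p_t + 1e-12)
    return (T ** 2) * np.mean(np.sum(p_t * (log_p_t - log_p_s), axis=-1))
```

When student and teacher logits agree, the loss is zero; otherwise it is positive and pulls the student's softened distribution toward the teacher's.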
Thank you for your response. I noticed that you mentioned in your article, "Note that these layers are trained with the student simultaneously and can be discarded during inference." Does this mean that the student model, the teacher model, and the dynamic graph convolutional network all undergo backpropagation? Or does it mean freezing the teacher model's original weights and updating only the dynamic graph convolutional network's weights on the teacher side, while updating all network weights for the student?
Hi @linjie-yang,
In general KD settings, the teacher model is frozen; only the student and the projection networks are updated. If you unfreeze the teacher, that is called online KD, which is a different setting from this paper.
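A minimal numerical sketch of that setting, using NumPy with single linear layers and invented dimensions (not the paper's actual networks): the teacher is frozen, and one SGD step on a feature-matching loss updates only the student and the projection layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
n, d_in, d_s, d_t = 32, 6, 4, 8

x = rng.standard_normal((n, d_in))
W_teacher = rng.standard_normal((d_in, d_t))  # frozen: never updated
W_student = rng.standard_normal((d_in, d_s))  # trainable
P = rng.standard_normal((d_s, d_t))           # trainable projection layer

def feat_kd_loss(W_s, P):
    """MSE between projected student features and frozen teacher features."""
    return np.mean((x @ W_s @ P - x @ W_teacher) ** 2)

teacher_before = W_teacher.copy()
loss_before = feat_kd_loss(W_student, P)

# One manual SGD step: gradients are taken only w.r.t. W_student and P.
lr = 1e-2
F_s = x @ W_student
g_pred = 2.0 * (F_s @ P - x @ W_teacher) / (n * d_t)  # dL/d(prediction)
grad_P = F_s.T @ g_pred
grad_W_s = x.T @ (g_pred @ P.T)
P -= lr * grad_P
W_student -= lr * grad_W_s

loss_after = feat_kd_loss(W_student, P)
# W_teacher is untouched; the loss drops because student + projection moved.
```

The projection layer (like the paper's graph-convolution layers) is trained jointly with the student purely to align feature spaces, so it can be discarded at inference time.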
Got it, thanks!
Your work is impressive. I have reviewed your code, and it seems that only voxel-based knowledge distillation is implemented. Have you also implemented point-based knowledge distillation? If I missed it, could you please point me to the file that contains it?