RunpeiDong / PointDistiller

[CVPR 2023] PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection
https://arxiv.org/abs/2205.11098
MIT License

point-based knowledge distillation implementation #9

Closed · linjie-yang closed this 4 months ago

linjie-yang commented 4 months ago

Your excellent work is impressive. I have reviewed your code, and it seems that only voxel-based knowledge distillation is implemented. Have you also implemented point-based knowledge distillation? If I missed it, could you please let me know which file contains the implementation of point-based knowledge distillation?

linjie-yang commented 4 months ago

I have found the code for the point-based KD implementation.

RunpeiDong commented 4 months ago

Hi @linjie-yang,

I see that you've closed the issue. You can find the code by searching for keywords such as kd_loss; the implementations are spread across the different detectors' source files.

linjie-yang commented 4 months ago

> Hi @linjie-yang,
>
> I see that you've closed the issue. You can find the code by searching for keywords such as kd_loss; the implementations are spread across the different detectors' source files.

Thank you for your response. In the paper you note, "these layers are trained with the student simultaneously and can be discarded during inference." Does this mean that the teacher model, the student model, and the dynamic graph convolutional network all undergo backpropagation? Or are the teacher's original weights frozen, with only the dynamic graph convolutional layers attached to the teacher and all of the student's weights being updated?

RunpeiDong commented 4 months ago

Hi @linjie-yang,

In general KD settings, the teacher model is frozen. Only the student and the projection networks need to be updated. If you unfreeze the teacher, it is called online KD, which is different from this paper.
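
For clarity, here is a minimal PyTorch sketch of this standard (offline) KD setup. It is not the repo's actual code: the nn.Linear modules stand in for the teacher/student detectors and the dynamic graph convolutional projection layers, and the mimic loss and kd_weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stand-ins for the real networks; the actual models come from the
# repo's detector configs. Dimensions here are arbitrary.
teacher = nn.Linear(64, 128)    # frozen teacher backbone
student = nn.Linear(64, 32)     # smaller student backbone
projection = nn.Linear(32, 128) # maps student features to the teacher's dim;
                                # trained jointly, discarded at inference

# Offline KD: the teacher is frozen and never updated.
teacher.eval()
for p in teacher.parameters():
    p.requires_grad = False

# Only the student and the projection layers receive gradients.
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(projection.parameters()), lr=1e-2
)

mimic_loss = nn.SmoothL1Loss()  # assumed feature-mimicking loss
kd_weight = 1.0                 # assumed weighting term

x = torch.randn(8, 64)          # dummy point/voxel features
with torch.no_grad():           # no graph is built through the teacher
    t_feat = teacher(x)
s_feat = student(x)
kd_loss = kd_weight * mimic_loss(projection(s_feat), t_feat)
# In practice, kd_loss is added to the student's detection loss
# before calling backward().
optimizer.zero_grad()
kd_loss.backward()
optimizer.step()
```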

linjie-yang commented 4 months ago

> Hi @linjie-yang,
>
> In general KD settings, the teacher model is frozen. Only the student and the projection networks need to be updated. If you unfreeze the teacher, it is called online KD, which is different from this paper.

Got it, thanks!