Improving Knowledge Distillation via Regularizing Feature Norm and Direction

The official PyTorch implementation of the paper: Improving Knowledge Distillation via Regularizing Feature Norm and Direction.

0. Framework

The ND loss, which regularizes the Norm and Direction of the student features, is applied to the embedding features, defined as the output of the penultimate layer (just before the logits).
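For intuition, here is a minimal PyTorch sketch of an ND-style regularizer on the embedding features. The function name `nd_regularizer`, the hinge form of the norm term, and the use of teacher class means are illustrative assumptions pieced together from the description above, not the repository's official implementation (see Section 2 for the actual training code).

```python
import torch.nn.functional as F


def nd_regularizer(feat_s, feat_t, class_means_t, labels):
    """Illustrative ND-style loss (an assumption, not the official code):
    align each student embedding with its teacher class-mean direction
    and push its L2 norm upward.

    feat_s, feat_t : (B, D) penultimate-layer embeddings of student/teacher.
    class_means_t  : (C, D) per-class means of the teacher embeddings.
    labels         : (B,)   ground-truth class indices.
    """
    # Direction term: penalize misalignment between the student embedding
    # and the (unit-length) teacher class mean of its ground-truth class.
    centers = F.normalize(class_means_t[labels], dim=1)           # (B, D)
    dir_loss = 1.0 - F.cosine_similarity(feat_s, centers, dim=1)  # (B,)

    # Norm term: hinge that pushes the student embedding norm up toward
    # the teacher's norm (zero penalty once it matches or exceeds it).
    norm_loss = F.relu(feat_t.norm(dim=1) - feat_s.norm(dim=1))   # (B,)

    return (dir_loss + norm_loss).mean()
```

If the student and teacher embedding dimensions differ (e.g., ResNet-50 $\rightarrow$ MobileNet-V2), a linear projector on the student embedding would be needed before comparing norms and directions; that detail is omitted here.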

1. Main Results

1.1 CIFAR-100

Top-1 accuracy (%) on CIFAR-100. Parentheses give the gain of each "++" method over its base distiller.

| Method | ResNet-56 $\rightarrow$ ResNet-20 | WRN-40-2 $\rightarrow$ WRN-40-1 | ResNet-32x4 $\rightarrow$ ResNet-8x4 | ResNet-50 $\rightarrow$ MobileNet-V2 | ResNet-32x4 $\rightarrow$ ShuffleNet-V1 | ResNet-32x4 $\rightarrow$ ShuffleNet-V2 |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| Teacher | 72.34 | 75.61 | 79.42 | 79.34 | 79.42 | 79.42 |
| Student | 69.06 | 71.98 | 72.50 | 64.60 | 70.50 | 71.82 |
| KD | 70.66 | 73.54 | 73.33 | 67.65 | 74.07 | 74.45 |
| DIST | 71.78 | 74.42 | 75.79 | 69.17 | 75.23 | 76.08 |
| DKD | 71.97 | 74.81 | 75.44\* | 70.35 | 76.45 | 77.07 |
| ReviewKD | 71.89 | 75.09 | 75.63 | 69.89 | 77.45 | 77.78 |
| KD++ | 72.53 (+1.87) | 74.59 (+1.05) | 75.54 (+2.21) | 70.10 (+2.45) | 75.45 (+1.38) | 76.42 (+1.97) |
| DIST++ | 72.52 (+0.74) | 75.00 (+0.58) | 76.13 (+0.34) | 69.80 (+0.63) | 75.60 (+0.37) | 76.64 (+0.56) |
| DKD++ | 72.16 (+0.19) | 75.02 (+0.21) | 76.28 (+0.84) | 70.82 (+0.47) | 77.11 (+0.66) | 77.49 (+0.42) |
| ReviewKD++ | 72.05 (+0.16) | 75.66 (+0.57) | 76.07 (+0.44) | 70.45 (+0.56) | 77.68 (+0.23) | 77.93 (+0.15) |

\* Our result reproduced with the official DKD code.
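Each "++" row augments the corresponding base distiller with the ND loss on the embedding features. As a hedged illustration for plain KD, a combined objective might look like the sketch below; the weights `alpha` and `beta`, the temperature `T`, and all function names are assumptions for illustration, not the repository's actual hyperparameters.

```python
import torch.nn.functional as F


def kd_plus_plus_loss(logits_s, logits_t, feat_s, feat_t,
                      class_means_t, labels, T=4.0, alpha=1.0, beta=1.0):
    """Illustrative '++' objective: cross-entropy + vanilla KD + ND term."""
    # Supervised cross-entropy on the student's own predictions.
    ce = F.cross_entropy(logits_s, labels)

    # Vanilla KD: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(
        F.log_softmax(logits_s / T, dim=1),
        F.softmax(logits_t / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

    # ND regularizer from the sketch in Section 0 (illustrative).
    nd = nd_regularizer(feat_s, feat_t, class_means_t, labels)

    return ce + alpha * kd + beta * nd
```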

1.2 ImageNet-1k

Top-1 accuracy (%) on ImageNet-1k. T (S) denotes the accuracy of the teacher (student).

| T $\rightarrow$ S | T (S) | CRD | SRRL | ReviewKD | KD | DKD | KD++ | ReviewKD++ | DKD++ |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| R34 $\rightarrow$ R18 | 73.31 (69.76) | 71.17 | 71.73 | 71.62 | 70.66 | 71.70 | 71.98 | 71.64 | 72.07 |
| R50 $\rightarrow$ MV1 | 76.16 (68.87) | 71.37 | 72.49 | 72.56 | 70.50 | 72.05 | 72.77 | 72.96 | 72.63 |

2. Training and Evaluation

2.1 CIFAR-100 Classification

Please refer to CIFAR for more details.

2.2 ImageNet Classification

Please refer to ImageNet for more details.

2.3 COCO Detection

Please refer to Detection for more details.

3. Citation

If you use the ND loss in your research, please consider citing:

@misc{wang2023improving,
      title={Improving Knowledge Distillation via Regularizing Feature Norm and Direction}, 
      author={Yuzhu Wang and Lechao Cheng and Manni Duan and Yongheng Wang and Zunlei Feng and Shu Kong},
      year={2023},
      eprint={2305.17007},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}