dvlab-research / ReviewKD

Distilling Knowledge via Knowledge Review, CVPR 2021

A new re-implementation for KnowledgeReview #13

Open zjykzj opened 2 years ago

zjykzj commented 2 years ago

@akuxcw @littletomatodonkey Nice work!!!

Based on this repo, I tried a new implementation, ZJCV/KnowledgeReview. Judging from the CIFAR-100 training results, KR works remarkably well; with ResNet50 as the student, the distilled results even exceed the teacher network:

| arch_s | Student Top-1 | Student Top-5 | arch_t | Teacher Top-1 | Teacher Top-5 | Dataset | lambda | KD Top-1 | KD Top-5 |
|--------|---------------|---------------|--------|---------------|---------------|---------|--------|----------|----------|
| MobileNetv2 | 80.620 | 95.820 | ResNet50 | 83.540 | 96.820 | CIFAR100 | 7.0 | 83.370 | 96.810 |
| MobileNetv2 | 80.620 | 95.820 | ResNet152 | 85.490 | 97.590 | CIFAR100 | 8.0 | 84.530 | 97.470 |
| MobileNetv2 | 80.620 | 95.820 | ResNeXt_32x8d | 85.720 | 97.650 | CIFAR100 | 6.0 | 84.520 | 97.470 |
| ResNet18 | 80.540 | 96.040 | ResNet50 | 83.540 | 96.820 | CIFAR100 | 10.0 | 83.130 | 96.350 |
| ResNet50 | 83.540 | 96.820 | ResNet152 | 85.490 | 97.590 | CIFAR100 | 6.0 | 86.240 | 97.610 |
| ResNet50 | 83.540 | 96.820 | ResNeXt_32x8d | 85.720 | 97.650 | CIFAR100 | 6.0 | 86.220 | 97.490 |
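
For context, the `lambda` column above weights the distillation term against the usual cross-entropy loss. A minimal sketch of that combination follows; plain per-stage MSE stands in for the paper's HCL loss, and the function and argument names are illustrative, not the exact API of either repo:

```python
import torch.nn.functional as F

def distill_loss(logits, targets, student_feats, teacher_feats, lam):
    """Cross-entropy on labels plus a lambda-weighted feature-review term."""
    ce = F.cross_entropy(logits, targets)
    # Match each (fused) student feature map to the corresponding teacher
    # feature map; MSE here is a stand-in for the paper's HCL loss.
    kd = sum(F.mse_loss(fs, ft) for fs, ft in zip(student_feats, teacher_feats))
    return ce + lam * kd
```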
zjykzj commented 2 years ago

By the way, the accuracy reported for CIFAR-100 in this paper's implementation seems too low. Did you refer to other repositories' implementations for comparison? :relaxed:

akuxcw commented 2 years ago

Thanks for your reimplementation!

> By the way, the accuracy reported for CIFAR-100 in this paper's implementation seems too low. Did you refer to other repositories' implementations for comparison? ☺️

Do you mean the baseline results are low? I know that if we changed the CIFAR-100 training policies we could achieve higher results, but many previous works on KD use the same policies as this implementation, and we follow them for a fair comparison.
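
For reference, that shared policy is roughly a long SGD run with late step decays; a sketch under assumed hyperparameters (the values below follow the recipe common in prior KD repos and may not match this implementation exactly):

```python
import torch
import torch.nn as nn

model = nn.Linear(3072, 100)  # placeholder for the student network

# SGD with momentum plus a step-decayed learning rate over 240 epochs;
# every value here is an assumption, not this repo's verified config.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

for epoch in range(240):
    # ... one training epoch over CIFAR-100 goes here ...
    scheduler.step()
```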

zjykzj commented 2 years ago

> Thanks for your reimplementation!
>
> Do you mean the baseline results are low? I know that if we changed the CIFAR-100 training policies we could achieve higher results, but many previous works on KD use the same policies as this implementation, and we follow them for a fair comparison.

Got it.