Closed ksachdeva closed 3 years ago
Hi, @ksachdeva,
The evaluation in some previous papers is a bit messy. Some of them combine KD with their own objective and then claim to outperform KD, while others compare their single objective against KD directly.
The table you see here compares each single distillation objective without combining it with KD. For the results of combining with KD, please refer to the appendix.
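To make the distinction concrete, here is a minimal sketch of what "combining KD" means, assuming the standard Hinton et al. formulation (temperature-softened KL divergence). The weights `alpha`, `beta`, `gamma` are illustrative placeholders, not the values used in this repo:

```python
import math

def softmax(logits, T=1.0):
    # temperature-scaled softmax over a list of logits
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # Hinton-style KD: KL(teacher_T || student_T), scaled by T^2
    # so gradients keep a comparable magnitude across temperatures
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def combined_loss(ce, kd, method_loss, alpha=0.1, beta=0.9, gamma=1.0):
    # "KD + own objective" setting: cross-entropy + KD term + the
    # paper's own distillation term (weights here are hypothetical)
    return alpha * ce + beta * kd + gamma * method_loss
```

A "single objective" comparison drops the `kd` term (`beta=0`) and keeps only cross-entropy plus the method's own loss, which is the setting reported in the main table.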
Hi,
Maybe I am not understanding the tables below correctly (copied from the README at the root of the repo), but it seems that for almost all configurations KD works better than every method except the one you propose.
The original purpose of all these methods was to improve upon KD (Hinton et al.), so I am very surprised by this table.
Please guide.

Regards & thanks,
Kapil
| Student      | wrn-16-2 | wrn-40-1 | resnet20 | resnet20 | resnet32 | resnet8x4 | vgg8  |
| ------------ | -------- | -------- | -------- | -------- | -------- | --------- | ----- |
| Student acc. | 73.26    | 71.98    | 69.06    | 69.06    | 71.14    | 72.50     | 70.36 |
| Student      | MobileNetV2 | MobileNetV2 | vgg8  | ShuffleNetV1 | ShuffleNetV2 | ShuffleNetV1 |
| ------------ | ----------- | ----------- | ----- | ------------ | ------------ | ------------ |
| Student acc. | 64.60       | 64.60       | 70.36 | 70.50        | 71.82        | 70.50        |