Closed · xuguodong03 closed this 4 years ago
I have also been struggling a bit to get KD to work as well on ImageNet with ResNet-18 as the student network.
I used `--distill kd -r 0.5 -a 0.9` and trained for 100 epochs in total, with the learning rate decayed at epochs 30, 60, and 90.
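For context, the `-r`/`-a` flags in that command weight the cross-entropy and distillation terms of the standard KD objective (Hinton et al.). A minimal sketch of how such a loss is typically combined is below; the function name is illustrative, and the temperature `T=4` is my assumption of the repo's default, not something stated in this thread:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, r=0.5, a=0.9, T=4.0):
    # -r weights the cross-entropy with the ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)
    # -a weights the KL divergence between temperature-softened
    # distributions, scaled by T^2 as in Hinton et al.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return r * ce + a * kd
```

When the student and teacher logits are identical, the KL term vanishes and only the weighted cross-entropy remains.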
See below for the training and testing curves, which I cropped from earlier manuscripts where CRD was historically named CKD:
Thanks for your great work.
When I run experiments on ImageNet, I use the training hyper-parameters from the official PyTorch examples: the initial learning rate is 0.1, and it is decayed at epochs 30 and 60. I find that in the first two stages (epochs 1-30 and 31-60), standard KD has higher accuracy than a student trained from scratch, but in the third stage (epochs 61-90), KD's accuracy drops below that of the student trained from scratch. This phenomenon matches Figure 3 of this paper exactly.
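For reference, the schedule I am using (the step decay from the PyTorch ImageNet example) can be sketched as follows; the function name and keyword defaults are illustrative:

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60), gamma=0.1):
    # Step decay as in the PyTorch ImageNet example: multiply the
    # learning rate by gamma each time a milestone epoch is passed.
    return base_lr * gamma ** sum(epoch >= m for m in milestones)
```

This is equivalent to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[30, 60]` and `gamma=0.1`.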
In your work, KD's top-1 accuracy is 0.9 points higher than that of the student trained from scratch. I wonder whether you use any special training scheme or hyper-parameters that differ from the official PyTorch examples. It would be very helpful if you could provide your ImageNet code.
Thanks in advance!