Closed · DeepLearningHB closed this 2 years ago
Hi!
We believe it is more convincing to reproduce the results on a large-scale dataset such as ImageNet, so we chose to organize and upload the ImageNet training code.
If you had problems running on CIFAR-100 and kept the training logs (ideally containing all training hyper-params and the training loss/accuracy), you can send them to me and we can check what went wrong.
Thanks for your attention!
Thanks for sharing. But I meet the same problem on CIFAR-100. I used the code and the hyper-parameters you shared. For instance, resnet32x4-resnet8x4 reaches 75.07%, but the result in your paper is 76.05%. I don't know how to address this issue. Can you give me some suggestions about the hyper-parameters or code for CIFAR-100?
Thanks!
@songshucode Hi, thanks for your attention! Is that averaged over 5 runs? For CIFAR-100, alpha is set to 2.25 and the temperature to 4. Note there is a default 4^2 loss weight, so the total loss weight for WSL is 2.25 * 4^2 = 36.
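The effective soft-loss weight quoted above is just the product of alpha and the temperature-squared factor; a one-line check:

```python
# Effective WSL soft-loss weight on CIFAR-100: alpha times the default T^2 factor.
alpha = 2.25
temperature = 4.0
effective_weight = alpha * temperature ** 2
print(effective_weight)  # 36.0
```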
@woshichase Thanks for your reply. The results of the 5 runs are as follows.
| test 1 | test 2 | test 3 | test 4 | test 5 |
|--------|--------|--------|--------|--------|
| 75.15  | 75.15  | 75.03  | 75.18  | 74.85  |
The loss-function code is copied from your shared code, as follows:
```python
fc_t = logits              # teacher logits
out = self.student(x)      # student logits

# Temperature-scaled KD cross-entropy between teacher and student.
s_input_for_softmax = out / self.temperature
t_input_for_softmax = fc_t / self.temperature
t_soft_label = self.softmax(t_input_for_softmax)
softmax_loss = - torch.sum(t_soft_label * self.logsoftmax(s_input_for_softmax), 1, keepdim=True)

# Per-sample weight from the ratio of student/teacher CE on the hard label
# (detached so the weight receives no gradient).
out_auto = out.detach()
fc_t_auto = fc_t.detach()
log_softmax_s = self.logsoftmax(out_auto)
log_softmax_t = self.logsoftmax(fc_t_auto)
one_hot_label = F.one_hot(y, num_classes=100).float()
softmax_loss_s = - torch.sum(one_hot_label * log_softmax_s, 1, keepdim=True)
softmax_loss_t = - torch.sum(one_hot_label * log_softmax_t, 1, keepdim=True)
focal_weight = softmax_loss_s / (softmax_loss_t)
ratio_lower = torch.zeros(1).cuda()
focal_weight = torch.max(focal_weight, ratio_lower)  # clamp at 0
focal_weight = 1 - torch.exp(- focal_weight)         # squash into [0, 1)
softmax_loss = focal_weight * softmax_loss

# Total loss: hard CE plus alpha-weighted, T^2-scaled soft loss.
soft_loss = (self.temperature ** 2) * torch.mean(softmax_loss)
hard_loss = self.hard_loss(out, y)
loss = hard_loss + self.alpha * soft_loss
```
where alpha = 2.25 and temperature = 4.0. Thanks for your attention!
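For reference, here is a minimal pure-Python, single-sample sketch of the same loss (no batching, no autograd), useful for sanity-checking the arithmetic. The function name and the toy logits are made up for illustration; the weighting scheme and the alpha/T defaults follow the snippet above.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a plain list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def wsl_loss_sample(s_logits, t_logits, label, T=4.0, alpha=2.25):
    # Temperature-scaled KD cross-entropy between teacher and student.
    ls_s = log_softmax([z / T for z in s_logits])
    p_t = [math.exp(v) for v in log_softmax([z / T for z in t_logits])]
    kd = -sum(p * l for p, l in zip(p_t, ls_s))
    # Per-sample weight: ratio of student/teacher CE on the hard label,
    # clamped at 0 and squashed into [0, 1) via 1 - exp(-x).
    ce_s = -log_softmax(s_logits)[label]
    ce_t = -log_softmax(t_logits)[label]
    w = 1 - math.exp(-max(ce_s / ce_t, 0.0))
    soft = (T ** 2) * w * kd
    hard = ce_s
    return hard + alpha * soft

loss = wsl_loss_sample([2.0, 0.5, -1.0], [3.0, 0.2, -0.5], label=0)
print(round(loss, 4))
```

Since both cross-entropy terms are non-negative, the weight always lands in [0, 1), and the total loss stays positive.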
@songshucode Hi, thanks for sharing. I currently find no flaws in your loss-function code or hyper-params. I suggest you look at the choice of training repo. To keep consistency with the ImageNet experiment, we also ran CIFAR-100 on the OverHaul repo instead of the CRD repo, moving the CRD CIFAR-100 training settings over to OverHaul. Also, the pretrained teachers were re-trained on OverHaul with the same settings used to train the student.
Any further comments?
Hi there, I'm trying to use your method on CIFAR-100. However, I cannot reproduce your performance even though I followed your script and hyper-parameter settings. For instance, the ResNet110-ResNet32 pair reached 74.12% in your paper, but in my implementation it reached only 72.91%. I was able to reproduce your performance only for the resnet56-resnet20 pair (72.01 / 72.15). I think that's quite a high performance gap between your results and mine. In addition, your repository only contains the ImageNet training script. If you don't mind uploading a CIFAR-100 training script, I could train your method with it.
Thanks!