Open Leun9 opened 1 year ago
could you please provide the training log?
Also, the version of packages may matter:
torchvision 0.8.2
timm 0.5.4
python 3.8.5
pytorch 1.7.1
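The pinned environment above can be recreated roughly like this (a sketch: the exact index URL / CUDA build of pytorch 1.7.1 depends on your machine, and python 3.8.5 is assumed to come from your base interpreter or conda env):

```shell
# Pin the package versions listed above (adjust for your CUDA version).
pip install torch==1.7.1 torchvision==0.8.2 timm==0.5.4
```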
Try replacing the random seed and assigning a smaller weight to the GKD loss; the model seems to have converged to a suboptimal solution.
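For changing the seed, something like the following helper covers the usual sources of randomness in a PyTorch run (a minimal sketch; `set_seed` is a hypothetical name, not a function from this repo):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Fix the common sources of randomness for a PyTorch training run."""
    random.seed(seed)                  # Python stdlib RNG
    np.random.seed(seed)               # NumPy RNG (data augmentation, etc.)
    torch.manual_seed(seed)            # CPU RNG
    torch.cuda.manual_seed_all(seed)   # all GPU RNGs (no-op without CUDA)
    # Trade speed for determinism in cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Note that even with all of this fixed, some CUDA ops remain nondeterministic, so run-to-run variation may not disappear entirely.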
Also, try to keep the package versions identical; even a slight difference can be fatal.
Thanks for your suggestions. I will halve the weight, i.e. 1.0 for the CE loss and 0.5 for the GKD loss. BTW, I have disabled all deterministic options (including the fixed seed).
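The 1.0/0.5 weighting discussed here amounts to a simple weighted sum of the two terms. Since the thread does not show the repo's actual GKD implementation, the sketch below uses a plain KL-divergence distillation term as a hypothetical stand-in (the real loss in `train_gkd.py` may differ):

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            T: float = 4.0) -> torch.Tensor:
    """Hypothetical stand-in for the GKD term: temperature-scaled KL distillation."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # T*T rescales gradients back to the magnitude of the unscaled logits.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def total_loss(student_logits: torch.Tensor,
               teacher_logits: torch.Tensor,
               labels: torch.Tensor,
               kd_weight: float = 0.5) -> torch.Tensor:
    """1.0 * CE + kd_weight * distillation, as discussed in the thread."""
    ce = F.cross_entropy(student_logits, labels)
    return ce + kd_weight * kd_loss(student_logits, teacher_logits)
```

With `kd_weight=0.5` this matches the halved weighting mentioned above; setting it to 0.0 recovers the distillation-free baseline.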
We changed the weight of GKD Loss but failed to obtain similar results.
We tried our best to reproduce it, but there was still a significant difference from the results in the paper.
Could you provide the logs mentioned in the paper or generated by this repo?
We attempted to reproduce the experimental results by directly using the code from that repository but failed.
First, we trained a teacher model using the modified code, and its performance was similar to what was reported in the paper. Then we trained a student model using
train_gkd.py
but got poor results. To verify whether there were any other issues, we removed the distillation loss and obtained a good MBF student. Here are the evaluation results.