Closed by o-laurent 4 days ago
As discussed in the paper, direct training with NLS can be unstable, mainly because NLS relies on a relatively well-pre-trained model. In practice, we recommend warming up the training with the hard-label setting and only introducing negative labels in a later training stage; see here.
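If it helps, here is a minimal sketch of that schedule, assuming the generalized label smoothing (GLS) loss from the paper, in which the smooth rate r can be negative; `gls_loss`, the toy model, and the 60/100 epoch split are my own placeholders, not code from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gls_loss(logits, targets, smooth_rate):
    """Generalized label smoothing: soft target = (1 - r) * one_hot + r / K.
    r = 0 recovers plain cross-entropy; r < 0 is negative label smoothing (NLS)."""
    num_classes = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes).float()
    soft_targets = (1.0 - smooth_rate) * one_hot + smooth_rate / num_classes
    return -(soft_targets * log_probs).sum(dim=1).mean()

# Toy stand-ins so the schedule below runs; swap in your ResNet and CIFAR-10 loader.
model = nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))

warmup_epochs, total_epochs = 60, 100  # placeholder split, tune for your setup
for epoch in range(total_epochs):
    # Hard labels (r = 0) during the warm-up, then a negative smooth rate.
    smooth_rate = 0.0 if epoch < warmup_epochs else -1.0
    loss = gls_loss(model(x), y, smooth_rate)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```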
Hello @weijiaheng,
Thank you for your answer! I have most likely misunderstood what was written in Appendix D.2. I'll try to use negative label-smoothing when fine-tuning from a pre-trained model, and I'll let you know by re-opening this issue if I still encounter some problems.
Hello!
Thanks for your work! Here is what happens when I train a ResNet on CIFAR-10 using a negative `smooth-rate` with the following command:

I have used -1.0, and, unless mistaken, your paper mentions `smooth-rate`s down to -6.0 (see Table 3). Is this the intended behavior? If not, what command should I run to train a model with a negative smoothing rate? Ideally, I would like to use your method in the "noise-free" setting.
I have looked at Appendix D.2 of your paper, which you mentioned in #1, but I can't find information that would help reproduce Table 3. I would be completely fine with an accuracy between 88% and 92% (mentioned in D.2), but it seems that, currently, the training itself diverges. Do you think it could be due to different PyTorch versions?
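In case it helps with the diagnosis, here is how I understand a negative `smooth-rate` to act on the targets (my own sketch of the generalized label smoothing target, not code from your repo):

```python
import torch
import torch.nn.functional as F

# Generalized label smoothing target: (1 - r) * one_hot + r / K.
# With r = -1.0 and K = 10 classes, the true class gets weight 1.9 and every
# other class gets -0.1, so the cross-entropy is no longer bounded below by 0,
# which might be related to the divergence I observe.
r, K = -1.0, 10
one_hot = F.one_hot(torch.tensor([3]), K).float()
soft_target = (1.0 - r) * one_hot + r / K
print(soft_target)
# tensor([[-0.1000, -0.1000, -0.1000,  1.9000, -0.1000, -0.1000, -0.1000,
#          -0.1000, -0.1000, -0.1000]])
```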
Many thanks, and have a great day!