UCSC-REAL / negative-label-smoothing

[ICML2022 Long Talk] Official Pytorch implementation of "To Smooth or Not? When Label Smoothing Meets Noisy Labels"

:bug: Instability of training under negative smoothing values #3

Closed: o-laurent closed this issue 4 days ago

o-laurent commented 2 weeks ago

Hello!

Thanks for your work! Here is what happens when I train a ResNet on CIFAR-10 using a negative smooth-rate with the following command:

python3 main_GLS_direct_train.py --smooth_rate -1.0

I used -1.0, and, unless I am mistaken, your paper mentions smooth rates down to -6.0 (see Table 3).

Actual noise 0.20
over all noise rate is  0.20076
building model...
building model done
Epoch [1/200], Iter [50/390], Loss: 4.4735
Epoch [1/200], Iter [100/390], Loss: -2.9571
Epoch [1/200], Iter [150/390], Loss: -7.3548
Epoch [1/200], Iter [200/390], Loss: 10.6610
Epoch [1/200], Iter [250/390], Loss: 29.8863
Epoch [1/200], Iter [300/390], Loss: 24.9580
Epoch [1/200], Iter [350/390], Loss: -24.6937
previous_best 0.0
test acc on test images is  12.7
Epoch [2/200], Iter [50/390], Loss: -5.3613
Epoch [2/200], Iter [100/390], Loss: -5.3274
Epoch [2/200], Iter [150/390], Loss: -77.6080
Epoch [2/200], Iter [200/390], Loss: -1549.1436
Epoch [2/200], Iter [250/390], Loss: -199.6069
Epoch [2/200], Iter [300/390], Loss: -16850.7461
Epoch [2/200], Iter [350/390], Loss: -2025.1484
previous_best 12.7
test acc on test images is  13.66
Epoch [3/200], Iter [50/390], Loss: -4315.8242
Epoch [3/200], Iter [100/390], Loss: 11547.2734
Epoch [3/200], Iter [150/390], Loss: 249404.2500
Epoch [3/200], Iter [200/390], Loss: -78784.7734
Epoch [3/200], Iter [250/390], Loss: -60295.2031
Epoch [3/200], Iter [300/390], Loss: 679887.3750
Epoch [3/200], Iter [350/390], Loss: 3432846.2500
previous_best 13.66

Is this the intended behavior? If not, what command should I run to train a model with a negative smoothing rate? Ideally, I would like to use your method in the "noise-free" setting.

I have looked at Appendix D.2 of your paper, which you mentioned in #1, but I can't find information that would help reproduce Table 3. I would be completely fine with an accuracy between 88% and 92% (as mentioned in D.2), but at the moment the training itself diverges. Do you think this could be due to a difference in PyTorch versions?
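For completeness, my reading of the generalized label smoothing loss in the paper is roughly the sketch below (my own code, not this repository's implementation), which would also explain why the printed loss values can themselves be negative when the smooth rate is negative:

```python
import torch
import torch.nn.functional as F

def gls_cross_entropy(logits: torch.Tensor, targets: torch.Tensor, smooth_rate: float) -> torch.Tensor:
    """Cross-entropy against GLS soft targets: (1 - r) * one_hot + r / K (my own sketch)."""
    num_classes = logits.size(-1)
    soft_targets = (1.0 - smooth_rate) * F.one_hot(targets, num_classes).float() + smooth_rate / num_classes
    # With smooth_rate < 0 (NLS), the non-target entries of the soft targets
    # are negative, so the loss value itself can go below zero.
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```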

Many thanks, and have a great day!

weijiaheng commented 4 days ago

As discussed in the paper, direct training with NLS can be unstable, mainly because NLS relies on a reasonably well-pre-trained model. As a practical recommendation, warm up the training with hard labels and then switch to negative label smoothing in the later training stage. See here.
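A minimal sketch of this schedule, assuming a standard PyTorch training loop (the function name, the warm-up length, and the smooth rate below are placeholders, not the exact settings used in this repository):

```python
import torch.nn.functional as F

def train_with_nls_warmup(model, train_loader, optimizer,
                          num_epochs=200, warmup_epochs=100, nls_rate=-1.0):
    # Hard-label cross-entropy during the warm-up epochs, then switch to a
    # negative smooth rate for the remaining epochs.
    for epoch in range(num_epochs):
        for images, labels in train_loader:
            logits = model(images)
            if epoch < warmup_epochs:
                loss = F.cross_entropy(logits, labels)  # hard labels (warm-up)
            else:
                k = logits.size(-1)
                soft = (1.0 - nls_rate) * F.one_hot(labels, k).float() + nls_rate / k
                loss = -(soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```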

o-laurent commented 4 days ago

Hello @weijiaheng,

Thank you for your answer! I most likely misunderstood what was written in Appendix D.2. I'll try using negative label smoothing when fine-tuning from a pre-trained model, and I'll reopen this issue if I still run into problems.