LiJunnan1992 / DivideMix

Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
MIT License

Question about overfitting #23

Closed MrChenFeng closed 3 years ago

MrChenFeng commented 3 years ago

Hi,

Thanks so much for sharing your code and work! I wonder, have you tried asymmetric noise at a low ratio? I tried some different noise modes, such as mixing asymmetric and symmetric noise together, and sometimes the network seems to overfit quickly in the initial warmup epochs. Do you have any suggestions for modifying the loss and regularization tricks in this situation? Actually, I'm curious and confused about the relation between the noise mode and the loss distribution. Any suggestions would be highly appreciated!

Best, Chen

LiJunnan1992 commented 3 years ago

Hi, have you tried activating the confidence penalty that is used for asym noise? Usually asym noise is easier to overfit to because the noise has structure.
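
For reference, the warmup step in the repo's training script adds a negative-entropy confidence penalty to the cross-entropy loss when the noise is asymmetric. A minimal sketch of that idea (the function name `warmup_loss` is illustrative):

```python
import torch
import torch.nn.functional as F

def warmup_loss(outputs, targets, noise_mode='asym'):
    """Warmup cross-entropy; for asym noise, add a negative-entropy
    confidence penalty to discourage over-confident predictions."""
    loss = F.cross_entropy(outputs, targets)
    if noise_mode == 'asym':
        log_probs = F.log_softmax(outputs, dim=1)
        probs = log_probs.exp()
        # Negative entropy = mean of sum_c p_c * log p_c (always <= 0);
        # adding it penalizes peaked (over-confident) predictions.
        neg_entropy = torch.mean(torch.sum(probs * log_probs, dim=1))
        loss = loss + neg_entropy
    return loss
```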

MrChenFeng commented 3 years ago

Hi, actually I added a weight hyperparameter for the confidence-regularization term. It seems the loss distribution moved to the right as the weight got bigger, but it remained a single-peaked distribution. Sadly, it didn't work.
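
For concreteness, a sketch of the weighted variant described above (`weighted_warmup_loss` and `lam` are illustrative names, not from the repo):

```python
import torch
import torch.nn.functional as F

def weighted_warmup_loss(outputs, targets, lam=1.0):
    """Warmup CE plus a weighted negative-entropy penalty; `lam` is the
    hypothetical weight hyperparameter from the comment above."""
    log_probs = F.log_softmax(outputs, dim=1)
    neg_entropy = torch.mean(torch.sum(log_probs.exp() * log_probs, dim=1))
    return F.cross_entropy(outputs, targets) + lam * neg_entropy
```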

LiJunnan1992 commented 3 years ago

Can I know what kind of noise distribution you use? You may also want to try different numbers of warm-up epochs and see which epoch results in more separation in the loss distribution. Moreover, a larger learning rate may also help.
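
One way to check that separation is to collect per-sample losses after warmup and fit a two-component GMM, as DivideMix's co-divide step does. A minimal sketch (the helper name `loss_separation` and the exact GMM settings are assumptions):

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture

@torch.no_grad()
def loss_separation(model, loader, device='cuda'):
    """Fit a 2-component GMM to normalized per-sample CE losses; a larger
    gap between the two means indicates better clean/noisy separation."""
    model.eval()
    losses = []
    for inputs, targets in loader:
        outputs = model(inputs.to(device))
        losses.append(F.cross_entropy(outputs, targets.to(device),
                                      reduction='none').cpu())
    losses = torch.cat(losses).numpy().reshape(-1, 1)
    # Normalize losses to [0, 1] before fitting, as in the paper.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=10, reg_covar=5e-4)
    gmm.fit(losses)
    means = np.sort(gmm.means_.ravel())
    return means[1] - means[0]  # distance between the two loss modes
```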

MrChenFeng commented 3 years ago

I would say the noise modes I tried tend to be noisier, e.g., one real class may be blended with noisy samples from two or three other classes.
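
For concreteness, a hypothetical sketch of such a noise mode, where each class flips a fraction of its labels into three other classes (all names here are illustrative, not from the repo):

```python
import numpy as np

def multi_source_asym_noise(labels, num_classes, noise_rate, seed=0):
    """Hypothetical noise mode: each class keeps (1 - noise_rate) of its
    labels and spreads the rest over 3 other classes, so one observed
    class ends up blended with samples from several true classes."""
    rng = np.random.default_rng(seed)
    T = np.eye(num_classes) * (1 - noise_rate)  # row-stochastic transition matrix
    for c in range(num_classes):
        others = rng.choice([k for k in range(num_classes) if k != c],
                            size=3, replace=False)
        T[c, others] = noise_rate / 3
    noisy = np.array([rng.choice(num_classes, p=T[y]) for y in labels])
    return noisy, T
```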

LiJunnan1992 commented 3 years ago

Could it be that there is simply too much noise? From my experience, the model needs to be able to learn something during warmup in order to start noise cleaning.