Yikai-Wang / SPR-LNL

This is the official repo for our CVPR22 paper: Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels.

Scalable Penalized Regression for Noise Detection in Learning With Noisy Labels

[paper] [intro]

Overview


SPR is a theoretically guaranteed framework for detecting and removing noisy data in learning with noisy labels. It formulates a penalized regression that models the linear relation between network features and one-hot labels; the noisy data are identified by the non-zero mean-shift parameters obtained when solving the regression. A non-asymptotic probabilistic condition is provided under which SPR correctly identifies the noisy data. SPR can also be combined with a semi-supervised algorithm to further exploit the support of the noisy data as unlabeled data.
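To make the mean-shift idea concrete, here is a minimal sketch on synthetic data. It fits Y = XB + G + noise with a row-wise group penalty on G by plain alternating minimization and flags the rows where G is non-zero. This is not the scalable algorithm from the paper or the code in this repo; the function name `mean_shift_detect`, the penalty weight `lam`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def mean_shift_detect(X, Y, lam, n_iter=100):
    """Illustrative solver (an assumption, not the repo's method) for
    min_{B,G} 0.5 * ||Y - X B - G||_F^2 + lam * sum_i ||G_i||_2."""
    G = np.zeros(Y.shape, dtype=float)
    X_pinv = np.linalg.pinv(X)
    for _ in range(n_iter):
        # Least-squares fit of the linear map on the shift-corrected labels.
        B = X_pinv @ (Y - G)
        # Row-wise group soft-thresholding of the residuals gives the new shifts.
        R = Y - X @ B
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        G = shrink * R
    # Rows with a non-zero mean shift are the ones flagged as noisy.
    flagged = np.flatnonzero(np.linalg.norm(G, axis=1) > 0)
    return flagged, G

# Toy data: features are near one-hot encodings of the true class.
rng = np.random.default_rng(0)
n, c = 200, 3
true_labels = rng.integers(0, c, size=n)
X = np.eye(c)[true_labels] + 0.01 * rng.standard_normal((n, c))

# Corrupt 10 labels by shifting them to the next class.
noisy_idx = rng.choice(n, size=10, replace=False)
given = true_labels.copy()
given[noisy_idx] = (given[noisy_idx] + 1) % c
Y = np.eye(c)[given]

flagged, _ = mean_shift_detect(X, Y, lam=0.5)
print(sorted(flagged.tolist()) == sorted(noisy_idx.tolist()))
```

In this toy setting the clean rows have small residuals and are thresholded to exactly zero, while the flipped rows keep a non-zero shift. Real network features are far less separable, which is where the paper's selection procedure and theoretical condition matter.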

Requirements

python==3.7.6
numpy==1.19.1
scipy==1.6.0
scikit-learn==0.23.2
torch==1.5.1
torchvision==0.6.0a0+35d732a
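A quick way to compare an environment against the pinned versions above is sketched below. The helper `check_versions` is hypothetical and not part of this repo; it assumes each package exposes a `__version__` attribute and that scikit-learn is imported as `sklearn`.

```python
import importlib

# Versions copied from the list above, keyed by import name.
PINNED = {
    "numpy": "1.19.1",
    "scipy": "1.6.0",
    "sklearn": "0.23.2",
    "torch": "1.5.1",
    "torchvision": "0.6.0a0+35d732a",
}

def check_versions(pinned):
    """Return {module: (wanted, found)} for every mismatch; found is None if not importable."""
    report = {}
    for module_name, wanted in pinned.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = (wanted, None)
            continue
        found = getattr(module, "__version__", None)
        if found != wanted:
            report[module_name] = (wanted, found)
    return report

if __name__ == "__main__":
    for name, (wanted, found) in check_versions(PINNED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

A mismatch does not necessarily mean the code will fail; the list records the versions the authors used.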

Data Preparation

MNIST and CIFAR-10 can be downloaded using torchvision. The other two datasets can be downloaded from the official links: ANIMAL10, WebVision.

The datasets are expected to be stored in the folder ../data or specified by the root parameter, and arranged as follows:

│data/
├── MNIST/
│   ├── ......
├── CIFAR10/
│   ├── ......
├── animal10/
│   ├── training/
│   │   ├── ......
│   ├── testing/
│   │   ├── ......
├── webvision/
│   ├── info/
│   │   ├── ......
│   ├── google/
│   │   ├── ......
│   ├── val_images_256/
│   │   ├── ......
(Optional)
├── imagenet/
│   ├── meta.mat
│   ├── ILSVRC2012_validation_ground_truth.txt
│   ├── val/
│   │   ├── ......
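Before training, it may help to confirm that the folders match the layout above. The sketch below is a hypothetical helper, not part of this repo; it only checks that the directories named in the tree exist and does not validate their contents.

```python
from pathlib import Path

# Folder names taken from the layout above.
REQUIRED = {
    "MNIST": [],
    "CIFAR10": [],
    "animal10": ["training", "testing"],
    "webvision": ["info", "google", "val_images_256"],
}
# imagenet is marked optional in the layout.
OPTIONAL = {"imagenet": ["val"]}

def missing_dirs(root, layout):
    """Return the relative paths from `layout` that are not directories under `root`."""
    root = Path(root)
    missing = []
    for dataset, subdirs in layout.items():
        for rel in [dataset] + [f"{dataset}/{s}" for s in subdirs]:
            if not (root / rel).is_dir():
                missing.append(rel)
    return missing

if __name__ == "__main__":
    # "../data" is the default location mentioned above; pass your own root if it differs.
    for rel in missing_dirs("../data", REQUIRED):
        print("missing:", rel)
    for rel in missing_dirs("../data", OPTIONAL):
        print("missing (optional):", rel)
```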

Pretrained Model

The pretrained models can be downloaded from here and should be put in the folder ckpt.

Training

Example training commands are listed in the folder scripts. You could try the following commands as a start.

Note: To train with SPR but without using CutMix, you should set --cutmix 1 and --cutmix_prob 0.

Train SPR on MNIST with different noise settings:

python scripts/train_mnist.py

Train SPR on CIFAR10 with different noise settings:

python scripts/train_cifar.py

Train SPR on Animal10:

python scripts/train_animal.py

Train SPR on WebVision:

python scripts/train_webvision.py

Evaluation

Example evaluation commands are listed in the folder scripts. You could try the following commands as a start.

Test SPR on MNIST with different noise settings:

python scripts/eval_mnist.py

Test SPR on CIFAR10 with different noise settings:

python scripts/eval_cifar.py

Test SPR on Animal10:

python scripts/eval_animal.py

Test SPR on WebVision:

python scripts/eval_webvision.py

Acknowledgements

Thanks to everyone who makes their code and models available. In particular,

Contact Information

For issues using SPR, please submit a GitHub issue.

Citation

If you find the provided code useful, please cite our work.

@inproceedings{wang2022scalable,
  title={Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels},
  author={Wang, Yikai and Sun, Xinwei and Fu, Yanwei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2022}
}