Summary
The paper proposes confident learning (CL), a framework for estimating uncertainty in noisy labels and finding label errors in datasets. CL focuses on characterizing and cleaning label noise rather than modifying the model architecture. CL is based on three principles: pruning noisy examples, counting examples using probabilistic thresholds, and ranking examples during training. With these, CL can estimate the joint distribution between noisy (observed) labels and true (latent) labels.
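To make the counting and pruning principles concrete, here is a minimal sketch, not the authors' implementation: the function name, the tie-breaking rule, and the simple off-diagonal pruning are illustrative assumptions. It shows how per-class probability thresholds can be used to build a confident-joint count matrix from out-of-sample predicted probabilities and to flag likely label errors.

```python
import numpy as np

def confident_learning_sketch(pred_probs, noisy_labels):
    """Illustrative sketch of confident learning's counting step.

    pred_probs:   (n, K) out-of-sample predicted probabilities
    noisy_labels: (n,) observed (possibly noisy) integer labels
    Returns the confident-joint count matrix and indices of likely label errors.
    """
    n, K = pred_probs.shape

    # Per-class threshold: mean predicted probability of class j
    # over examples whose observed label is j.
    thresholds = np.array([
        pred_probs[noisy_labels == j, j].mean() for j in range(K)
    ])

    confident_joint = np.zeros((K, K), dtype=int)
    likely_errors = []
    for idx in range(n):
        # Classes for which this example's probability clears the threshold.
        above = np.where(pred_probs[idx] >= thresholds)[0]
        if len(above) == 0:
            continue  # no confident guess for this example
        # Break ties by the largest predicted probability among confident classes.
        j = above[np.argmax(pred_probs[idx, above])]
        i = noisy_labels[idx]
        confident_joint[i, j] += 1
        if i != j:
            likely_errors.append(idx)  # off-diagonal -> likely label error

    return confident_joint, likely_errors
```

In the full method, the confident joint is additionally calibrated and normalized to estimate the joint distribution of noisy and true labels, and flagged examples are ranked before pruning rather than removed wholesale as in this sketch.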
Contributions of The Paper
A proof giving sufficient conditions under which CL exactly finds label errors and consistently estimates the joint distribution of noisy and true labels.
Empirical results showing CL outperforms seven other recent methods on CIFAR, finds label errors on MNIST, improves sentiment classification on Amazon reviews, and increases ImageNet accuracy by cleaning the training data first.
Comments
It's an interesting technique, though it was already used for one of the baselines!
It could be extended with more architecture-, task-, or representation-specific subtleties for fault localization.
Publisher
JAIR (Journal of Artificial Intelligence Research)
Link to The Paper
https://www.jair.org/index.php/jair/article/view/12125
Name of The Authors
Curtis Northcutt, Lu Jiang, and Isaac Chuang
Year of Publication
2021