Summary
The paper proposes confident learning (CL), a framework for estimating uncertainty in noisy labels and finding label errors in datasets. CL focuses on characterizing and cleaning label noise rather than modifying the model architecture. CL is based on three principles: pruning noisy examples, counting examples using probabilistic thresholds, and ranking examples during training. With these, CL can estimate the joint distribution between noisy (observed) labels and true (latent) labels.
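To make the counting and pruning principles concrete, here is a minimal sketch, not the authors' implementation: the function name, the tie-breaking rule, and the simple off-diagonal pruning are illustrative assumptions. It shows how per-class probability thresholds can be used to build a confident-joint count matrix from out-of-sample predicted probabilities and to flag likely label errors.

```python
import numpy as np

def confident_learning_sketch(pred_probs, noisy_labels):
    """Illustrative sketch of confident learning's counting step.

    pred_probs:   (n, K) out-of-sample predicted probabilities
    noisy_labels: (n,) observed (possibly noisy) integer labels
    Returns the confident-joint count matrix and indices of likely label errors.
    """
    n, K = pred_probs.shape

    # Per-class threshold: mean predicted probability of class j
    # over examples whose observed label is j.
    thresholds = np.array([
        pred_probs[noisy_labels == j, j].mean() for j in range(K)
    ])

    confident_joint = np.zeros((K, K), dtype=int)
    likely_errors = []
    for idx in range(n):
        # Classes for which this example's probability clears the threshold.
        above = np.where(pred_probs[idx] >= thresholds)[0]
        if len(above) == 0:
            continue  # no confident guess for this example
        # Break ties by the largest predicted probability among confident classes.
        j = above[np.argmax(pred_probs[idx, above])]
        i = noisy_labels[idx]
        confident_joint[i, j] += 1
        if i != j:
            likely_errors.append(idx)  # off-diagonal -> likely label error

    return confident_joint, likely_errors
```

In the full method, the confident joint is additionally calibrated and normalized to estimate the joint distribution of noisy and true labels, and flagged examples are ranked before pruning rather than removed wholesale as in this sketch.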
Contributions of The Paper
A proof giving sufficient conditions under which CL exactly finds label errors and consistently estimates the joint distribution of noisy and true labels.
Empirical results showing CL outperforms seven other recent methods on CIFAR, finds label errors on MNIST, improves sentiment classification on Amazon reviews, and increases ImageNet accuracy by cleaning the training data first.
Comments
It's an interesting technique, though it was already used for one of the baselines!
It could be extended with more architecture-, task-, or representation-specific subtleties for fault localization.
Publisher
JAIR (Journal of Artificial Intelligence Research)
Link to The Paper
https://www.jair.org/index.php/jair/article/view/12125
Name of The Authors
Curtis Northcutt, Lu Jiang, and Isaac Chuang
Year of Publication
2021