Lightning-AI / torchmetrics

Machine learning metrics for distributed, scalable PyTorch applications.
https://lightning.ai/docs/torchmetrics/
Apache License 2.0

Add Precision-Recall-Gain curve, Area Under Precision Recall Gain curve, and FGain1 score #1775

Open siemdejong opened 1 year ago

siemdejong commented 1 year ago

🚀 Feature

Add Precision-Recall-Gain (PRG) curve as a new feature with the same interface as the Precision-Recall (PR) curve.

Along with PRG, the Area Under the Precision-Recall-Gain curve (AUPRG) can be calculated, as is done with AveragePrecision for the PR curve.

The FGain1 score (FG1) is the F1 score, transformed such that its isometric coincides with the minor diagonal in PRG space. This could be added as well.

Motivation

The PR curve has some caveats as described in [1]. PRG aims to fix these problems:

  1. baselines are non-universal
  2. interpolation is non-linear
  3. F-isometrics are non-linear
  4. Pareto-front is non-convex
  5. the area under the PR curve does not relate to the expected F-score, and there is an unachievable region

In particular, the area under the PR curve is demonstrated to sometimes favour models that result in lower F1 scores. Using the PRG curve should therefore lead to better model selection.

Pitch

A Torchmetrics implementation of the PRG curve that has the same interface as the PR curve would aid in better model selection.

>>> pred = torch.tensor([0, 0.1, 0.8, 0.4])
>>> target = torch.tensor([0, 1, 1, 0])
>>> prg_curve = PrecisionRecallGainCurve(task="binary")
>>> precision_gain, recall_gain, thresholds = prg_curve(pred, target)
>>> precision_gain
tensor([1.0000, 0.0000, 0.5000, 0.0000])
>>> recall_gain
tensor([0.0000, 0.0000, 1.0000, 1.0000])
>>> thresholds
...

Precision-Gain (PG) and Recall-Gain (RG) can be calculated as

$$ PG = 1 - \frac{tp + fn}{fp + tn} \cdot \frac{fp}{tp}, $$

and

$$ RG = 1 - \frac{tp + fn}{fp + tn} \cdot \frac{fn}{tp}. $$
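As a rough illustration, the two formulas map directly onto a few lines of PyTorch. The helper below is purely hypothetical (not an existing torchmetrics function) and assumes tp > 0 and fp + tn > 0:

import torch

def precision_recall_gain(tp, fp, fn, tn):
    # Hypothetical helper: turns confusion counts into precision-gain and
    # recall-gain using the two formulas above.
    pos_odds = (tp + fn) / (fp + tn)  # pi / (1 - pi), the positive base-rate odds
    precision_gain = 1 - pos_odds * fp / tp
    recall_gain = 1 - pos_odds * fn / tp
    return precision_gain, recall_gain

# Counts obtained by thresholding the pitch example at 0.1:
pg, rg = precision_recall_gain(
    tp=torch.tensor(2.0), fp=torch.tensor(1.0),
    fn=torch.tensor(0.0), tn=torch.tensor(1.0),
)
# pg = 0.5, rg = 1.0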

AUPRG can be calculated as done with AveragePrecision, but only accounting for the area where PG, RG $\in [0, 1]$.
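Until a dedicated metric exists, this can be sketched on top of the binary_precision_recall_curve that torchmetrics already ships. Everything below the import is an assumption about how the transformation and clipping could be done, not settled API:

import torch
from torchmetrics.functional.classification import binary_precision_recall_curve

preds = torch.tensor([0.0, 0.1, 0.8, 0.4])
target = torch.tensor([0, 1, 1, 0])

# Existing torchmetrics PR curve.
precision, recall, _ = binary_precision_recall_curve(preds, target)

pi = target.float().mean()  # prevalence of the positive class
odds = pi / (1 - pi)
eps = torch.finfo(torch.float32).eps

# Transform precision/recall into gains (guarding against division by zero).
precision_gain = 1 - odds * (1 - precision) / precision.clamp(min=eps)
recall_gain = 1 - odds * (1 - recall) / recall.clamp(min=eps)

# Keep only the part of the curve inside the unit square and integrate
# precision-gain over recall-gain with the trapezoidal rule.
pg_clipped = precision_gain.clamp(0, 1)
rg_clipped = recall_gain.clamp(0, 1)
order = torch.argsort(rg_clipped)
auprg = torch.trapezoid(pg_clipped[order], rg_clipped[order])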

FG1 can be calculated as

$$ FG_1 = \frac{1}{2} PG + \frac{1}{2} RG. $$
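Continuing the sketch above, FG1 per operating point is then just the arithmetic mean of the two gains (illustrative only, not an existing API):

# FG1 at each threshold; the operating point maximizing it is a natural choice.
fgain1 = 0.5 * precision_gain + 0.5 * recall_gain
best_point = fgain1.argmax()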

It would be even more awesome if PRG could be extended to the multiclass/multilabel case.

Alternatives

The original authors of [1] have developed a package, pyprg (whose dependencies are out of date).

pip install pyprg

Then,

from prg import prg
prg_curve = prg.create_prg_curve(labels=targets, scores=prediction)
precision_gain = prg_curve["precision_gain"]
recall_gain = prg_curve["recall_gain"]
auprg = prg.calc_auprg(prg_curve)

Additional context

[1] Flach, P. & Kull, M. (2015). Precision-Recall-Gain Curves: PR Analysis Done Right. Advances in Neural Information Processing Systems 28. http://people.cs.bris.ac.uk/~flach/PRGcurves/PRcurves.pdf

github-actions[bot] commented 1 year ago

Hi! Thanks for your contribution, great first issue!

SkafteNicki commented 1 year ago

Hi @siemdejong, thanks for raising this issue. A couple of questions maybe:

siemdejong commented 1 year ago

For another writeup about gain metrics, see https://snorkel.ai/improving-upon-precision-recall-and-f1-with-gain-metrics/

Maybe an interesting discussion on scikit-learn and gain metrics: https://github.com/scikit-learn/scikit-learn/pull/24121

arijitde92 commented 1 year ago

Hi, can I contribute to this issue?

SkafteNicki commented 1 year ago

Hi @arijitde92, feel free to make a contribution on this topic :)