Lightning-AI / torchmetrics

Machine learning metrics for distributed, scalable PyTorch applications.
https://lightning.ai/docs/torchmetrics/
Apache License 2.0

Add Precision-Recall-Gain curve, Area Under Precision Recall Gain curve, and FGain1 score #1775

Open siemdejong opened 1 year ago

siemdejong commented 1 year ago

🚀 Feature

Add Precision-Recall-Gain (PRG) curve as a new feature with the same interface as the Precision-Recall (PR) curve.

Along with PRG, the Area Under the Precision-Recall-Gain curve (AUPRG) can be calculated, as is done with AveragePrecision for the PR curve.

The FGain1 score (FG1) is the F1 score, transformed such that its isometric coincides with the minor diagonal in PRG space. This could be added as well.

Motivation

The PR curve has some caveats as described in [1]. PRG aims to fix these problems:

  1. baselines are non-universal
  2. interpolation is non-linear
  3. F-isometrics are non-linear
  4. Pareto-front is non-convex
  5. the area under the PR curve does not relate to the expected F-score, and there is an unachievable region

In particular, the area under the PR curve is demonstrated to sometimes favour models that result in lower F1 scores. Using the PRG curve should therefore lead to better model selection.

Pitch

A Torchmetrics implementation of the PRG curve that has the same interface as the PR curve would aid in better model selection.

>>> pred = torch.tensor([0, 0.1, 0.8, 0.4])
>>> target = torch.tensor([0, 1, 1, 0])
>>> prg_curve = PrecisionRecallGainCurve(task="binary")
>>> precision_gain, recall_gain, thresholds = prg_curve(pred, target)
>>> precision_gain
tensor([1.0000, 0.0000, 0.5000, 0.0000])
>>> recall_gain
tensor([0.0000, 0.0000, 1.0000, 1.0000])
>>> thresholds
...

Precision-Gain (PG) and Recall-Gain (RG) can be calculated as

$$ PG = 1 - \frac{tp + fn}{fp + tn} \cdot \frac{fp}{tp}, $$

and

$$ RG = 1 - \frac{tp + fn}{fp + tn} \cdot \frac{fn}{tp}. $$
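As a rough illustration, the two formulas map directly onto a few lines of PyTorch. The helper below is purely hypothetical (not an existing torchmetrics function) and assumes tp > 0 and fp + tn > 0:

import torch

def precision_recall_gain(tp, fp, fn, tn):
    # Hypothetical helper: turns confusion counts into precision-gain and
    # recall-gain using the two formulas above.
    pos_odds = (tp + fn) / (fp + tn)  # pi / (1 - pi), the positive base-rate odds
    precision_gain = 1 - pos_odds * fp / tp
    recall_gain = 1 - pos_odds * fn / tp
    return precision_gain, recall_gain

# Counts obtained by thresholding the pitch example at 0.1:
pg, rg = precision_recall_gain(
    tp=torch.tensor(2.0), fp=torch.tensor(1.0),
    fn=torch.tensor(0.0), tn=torch.tensor(1.0),
)
# pg = 0.5, rg = 1.0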

AUPRG can be calculated as done with AveragePrecision, but only accounting for the area where PG, RG $\in [0, 1]$.
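Until a dedicated metric exists, this can be sketched on top of the binary_precision_recall_curve that torchmetrics already ships. Everything below the import is an assumption about how the transformation and clipping could be done, not settled API:

import torch
from torchmetrics.functional.classification import binary_precision_recall_curve

preds = torch.tensor([0.0, 0.1, 0.8, 0.4])
target = torch.tensor([0, 1, 1, 0])

# Existing torchmetrics PR curve.
precision, recall, _ = binary_precision_recall_curve(preds, target)

pi = target.float().mean()  # prevalence of the positive class
odds = pi / (1 - pi)
eps = torch.finfo(torch.float32).eps

# Transform precision/recall into gains (guarding against division by zero).
precision_gain = 1 - odds * (1 - precision) / precision.clamp(min=eps)
recall_gain = 1 - odds * (1 - recall) / recall.clamp(min=eps)

# Keep only the part of the curve inside the unit square and integrate
# precision-gain over recall-gain with the trapezoidal rule.
pg_clipped = precision_gain.clamp(0, 1)
rg_clipped = recall_gain.clamp(0, 1)
order = torch.argsort(rg_clipped)
auprg = torch.trapezoid(pg_clipped[order], rg_clipped[order])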

FG1 can be calculated as

$$ FG_1 = \frac{1}{2} PG + \frac{1}{2} RG. $$
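Continuing the sketch above, FG1 per operating point is then just the arithmetic mean of the two gains (illustrative only, not an existing API):

# FG1 at each threshold; the operating point maximizing it is a natural choice.
fgain1 = 0.5 * precision_gain + 0.5 * recall_gain
best_point = fgain1.argmax()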

It would be even more awesome if PRG could be extended to the multiclass/multilabel case.

Alternatives

The original authors of [1] have developed a package, pyprg (whose dependencies are out of date).

pip install pyprg

Then,

from prg import prg
prg_curve = prg.create_prg_curve(labels=targets, scores=prediction)
precision_gain = prg_curve["precision_gain"]
recall_gain = prg_curve["recall_gain"]
auprg = prg.calc_auprg(prg_curve)

Additional context

[1] Flach, P. & Kull, M. (2015). Precision-Recall-Gain Curves: PR Analysis Done Right. Advances in Neural Information Processing Systems 28. http://people.cs.bris.ac.uk/~flach/PRGcurves/PRcurves.pdf

github-actions[bot] commented 1 year ago

Hi! Thanks for your contribution, great first issue!

SkafteNicki commented 1 year ago

Hi @siemdejong, thanks for raising this issue. A couple of questions maybe:

siemdejong commented 1 year ago

For another writeup about gain metrics, see https://snorkel.ai/improving-upon-precision-recall-and-f1-with-gain-metrics/

Maybe an interesting discussion on scikit-learn and gain metrics: https://github.com/scikit-learn/scikit-learn/pull/24121

arijitde92 commented 1 year ago

Hi, can I contribute to this issue?

SkafteNicki commented 1 year ago

Hi @arijitde92, feel free to make a contribution on this topic :)