evalclass / precrec

An R library for accurate and fast calculations of Precision-Recall and ROC curves
https://evalclass.github.io/precrec
GNU General Public License v3.0

Returning scores along with the curves? #5

Open lukauskas opened 6 years ago

lukauskas commented 6 years ago

Currently the PRC curve returns essentially a DataFrame with three columns: x, y, and a boolean column orig_points.

Is it possible to map the non-interpolated points (orig_points = 1) to the actual score thresholds behind the resulting precision/recall measurements, somewhat along the lines of how sklearn handles it? This is sometimes needed to answer questions like 'what is the minimum threshold at which precision is >= 75%?'.

I assume a sorted increasing list of unique scores should map 1:1 to the orig_points, but this seems a bit hacky. Maybe there is a way to get it out of precrec directly?

takayasaito commented 6 years ago

It is not easy to use orig_points to retrieve the corresponding scores, but you can use mode = "basic" for that purpose. For instance, the following snippet shows how to get the original scores at which precision is greater than or equal to 0.75.

library("precrec")

# Dataset with 10 positives and 10 negatives
data(P10N10)

# Calculate basic evaluation measures
sspoints <- evalmod(mode = "basic", scores = P10N10$scores, labels = P10N10$labels)

# Convert sspoints to data.frame
df <- data.frame(sspoints)

# Get normalized threshold values for precision >= 0.75
xs <- df[df$type == "precision" & df$y >= 0.75, "x"]

# Show scores and precision values corresponding to xs
df[df$x %in% xs & df$type %in% c("score", "precision"), ]

In the data frame from the example above, the x column contains the normalized threshold values in the range [0, 1], and the y column contains the value of whichever measure is named in the type column.

Unlike ROC curves, precision-recall curves are not monotonic, so the thresholds that satisfy a precision condition need not form one contiguous range. In some cases you may need to add one more condition, such as 'recall is greater than 0.5'.
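
To illustrate why the extra condition can matter, here is a minimal base R sketch that does not use precrec at all. It assumes a higher score means "more likely positive" and computes precision and recall by hand at every unique score. The toy scores and labels, and the names measures, ok, and min_threshold, are made up for this illustration.

```r
# Toy data (hypothetical): a higher score means "more likely positive"
scores <- c(0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10)
labels <- c(1, 1, 0, 1, 1, 0, 1, 0, 0, 0)

# Every unique score is a candidate threshold:
# an observation is predicted positive when its score >= threshold
thresholds <- sort(unique(scores), decreasing = TRUE)

# Precision and recall at each candidate threshold
measures <- do.call(rbind, lapply(thresholds, function(th) {
  predicted <- scores >= th
  tp <- sum(predicted & labels == 1)
  data.frame(threshold = th,
             precision = tp / sum(predicted),
             recall = tp / sum(labels == 1))
}))

# Precision alone selects a non-contiguous set of thresholds,
# because precision dips at 0.80 and then recovers
measures[measures$precision >= 0.75, ]

# Adding the recall condition narrows the selection
ok <- measures[measures$precision >= 0.75 & measures$recall > 0.5, ]
min_threshold <- min(ok$threshold)
print(ok)
print(min_threshold)
```

With this toy data, the precision condition alone keeps the thresholds 0.95, 0.90, 0.70, and 0.60 but skips 0.80. Requiring recall > 0.5 as well leaves only 0.70 and 0.60, so the minimum qualifying threshold is 0.60.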