evalclass / precrec

An R library for accurate and fast calculations of Precision-Recall and ROC curves
https://evalclass.github.io/precrec
GNU General Public License v3.0

Calculating AUC/AUPRC confidence intervals #13

Open micdonato opened 4 years ago

micdonato commented 4 years ago

Hello.

I love precrec, but every time I use it I have to go crazy with integrating it with pROC to include confidence intervals of the AUCs (I still wasn't able to do so for AUPRCs).

Since precrec computes the cb bounds for the curves, is it possible to have the confidence intervals coming out of the auc function?

takayasaito commented 4 years ago

I checked the source code of pROC for its CI calculation and found that it uses a bootstrapping approach. By default, pROC generates 2000 bootstrap samples (resampling with replacement), so 2000 AUCs are calculated. It then simply takes the 0.025 and 0.975 quantiles of the calculated AUCs when the significance level (alpha) is 0.05.
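The percentile idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not pROC's actual code: `auc_rank` and `boot_auc_ci` are hypothetical helper names, the data are simulated, and resampling within each class (stratified) is a choice made here to keep both classes present in every replicate.

```r
set.seed(42)

# Simulated example data: 50 negatives and 50 positives (illustrative only)
labels <- rep(c(0, 1), each = 50)
scores <- rnorm(length(labels), mean = labels)

# AUC computed from the Mann-Whitney rank statistic
auc_rank <- function(scores, labels) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Percentile bootstrap CI: resample with replacement, recompute the AUC,
# then take the alpha/2 and 1 - alpha/2 quantiles of the bootstrap AUCs
boot_auc_ci <- function(scores, labels, n_boot = 2000, alpha = 0.05) {
  pos <- which(labels == 1)
  neg <- which(labels == 0)
  aucs <- replicate(n_boot, {
    idx <- c(pos[sample.int(length(pos), replace = TRUE)],
             neg[sample.int(length(neg), replace = TRUE)])
    auc_rank(scores[idx], labels[idx])
  })
  quantile(aucs, probs = c(alpha / 2, 1 - alpha / 2))
}

ci <- boot_auc_ci(scores, labels)
print(ci)
```

With `alpha = 0.05` the two probabilities passed to `quantile()` are 0.025 and 0.975.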

Since precrec doesn't provide bootstrapping, we can't apply the same method to calculate CIs. Alternatively, you can still use precrec to calculate a CI when you are dealing with cross-validation data. I added a simple helper function called auc_ci that performs CI calculation on precrec objects.

library(precrec)

# Create sample datasets with 100 positives and 100 negatives
samps <- create_sim_samples(4, 100, 100, "all")
mdat <- mmdata(samps[["scores"]], samps[["labels"]],
               modnames = samps[["modnames"]],
               dsids = samps[["dsids"]])

# Generate an mscurve object that contains ROC and Precision-Recall curves
mmcurves <- evalmod(mdat)

# Calculate CI of AUCs
auc_ci(mmcurves)

# Calculate CI with alpha = 0.01
auc_ci(mmcurves, alpha = 0.01)

# Calculate CI with t-distribution
auc_ci(mmcurves, dtype = "t")
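For intuition, an interval of this kind can be sketched as mean ± quantile × standard error over the per-dataset AUCs. The helper below (`mean_ci`, a hypothetical name) is an assumption about the general approach, not a copy of the auc_ci source, and the AUC values are illustrative.

```r
# Illustrative AUCs from five cross-validation test sets (made-up values)
fold_aucs <- c(0.81, 0.84, 0.78, 0.83, 0.80)

# CI of the mean AUC using either a normal or a t quantile
mean_ci <- function(aucs, alpha = 0.05, dtype = c("normal", "t")) {
  dtype <- match.arg(dtype)
  n <- length(aucs)
  se <- sd(aucs) / sqrt(n)  # standard error of the mean
  q <- if (dtype == "normal") {
    qnorm(1 - alpha / 2)
  } else {
    qt(1 - alpha / 2, df = n - 1)
  }
  c(mean = mean(aucs),
    lower = mean(aucs) - q * se,
    upper = mean(aucs) + q * se)
}

ci_norm <- mean_ci(fold_aucs)
ci_t <- mean_ci(fold_aucs, dtype = "t")
print(rbind(normal = ci_norm, t = ci_t))
```

With only a handful of test sets the t-based interval is noticeably wider than the normal one, which is the usual reason to prefer it for small cross-validation runs.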

I have submitted precrec v0.11 to CRAN, and it is already available for several platforms. You can check the availability status here.

JanaFe commented 3 years ago

Hi, I am also trying to calculate confidence intervals for the area under the precision recall curve with R version 4.0.3.

I have a vector of scores (value range 0-100), and a vector of labels (0 or 1). Running this code:

mdat <- mmdata(scores, labels)  
mmcurves <- evalmod(mdat)  
mm_auc_ci <- auc_ci(mmcurves, alpha=0.05, dtype='t')  

Gives an error: Error: 'curves' must contain multiple datasets.

What am I doing wrong?

takayasaito commented 3 years ago

precrec doesn't calculate confidence bands or confidence intervals for a single test set, only for cross-validation results with multiple test sets. Your example looks like a single test set to me. It is of course possible to use a bootstrapping approach to simulate the result of your model with a single test set, but I don't know whether or not it's a good idea.

1. Your example

library(precrec)

# Create scores and labels
n <- 100
scores <- runif(n)*100
labels <- sample(c(0, 1), n, replace=TRUE)

# Calculate curves (single model with single dataset)
mdat <- mmdata(scores, labels)
sscurves <- evalmod(mdat)
plot(sscurves)

2. Resample scores r1 times

# Create bootstrapped scores
r1 <- 10
resampled_scores <- replicate(r1, sample(scores, replace=TRUE))

# Calculate curves (single model with multiple datasets)
smdat1 <- mmdata(resampled_scores, labels, modnames=rep("m1", r1), dsids=1:r1)
smcurves1 <- evalmod(smdat1)
plot(smcurves1)
auc_ci(smcurves1)

3. Resample labels r2 times

# Create bootstrapped labels
r2 <- 10
resampled_labels <- replicate(r2, sample(labels, replace=TRUE))

# Calculate curves (single model with multiple datasets)
smdat2 <- mmdata(replicate(r2, scores), resampled_labels, modnames=rep("m1", r2), dsids=1:r2)
smcurves2 <- evalmod(smdat2)
plot(smcurves2)
auc_ci(smcurves2)
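The two examples above resample scores and labels separately. A variant that keeps each score attached to its own label is to resample row indices instead. The sketch below is an illustration under assumptions: the data are simulated, the variable names are hypothetical, and the stratified scheme (resampling positives and negatives separately) is a choice made here. The precrec calls mirror the ones used in the examples above and only run if the package is installed.

```r
set.seed(123)

# Simulated scores and labels, as in the first example (illustrative only)
n <- 100
labels <- sample(c(0, 1), n, replace = TRUE)
scores <- runif(n) * 100

# Stratified bootstrap of row indices: each replicate keeps the original
# class counts and every score stays paired with its own label
r <- 10
pos <- which(labels == 1)
neg <- which(labels == 0)
boot_idx <- replicate(
  r,
  c(pos[sample.int(length(pos), replace = TRUE)],
    neg[sample.int(length(neg), replace = TRUE)]),
  simplify = FALSE
)
boot_scores <- lapply(boot_idx, function(i) scores[i])
boot_labels <- lapply(boot_idx, function(i) labels[i])

# Treat each replicate as one dataset of a single model
if (requireNamespace("precrec", quietly = TRUE)) {
  bdat <- precrec::mmdata(boot_scores, boot_labels,
                          modnames = rep("m1", r), dsids = seq_len(r))
  bcurves <- precrec::evalmod(bdat)
  print(precrec::auc_ci(bcurves))
}
```

Whether an interval computed this way is meaningful for bootstrap replicates of one test set is a separate question.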



To assess the performance of your model accurately, it would be much better to perform cross-validation than to bootstrap the result of your model on a single test dataset (resampling scores and labels as in the examples above). I would avoid bootstrapping approaches whenever possible.

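For the cross-validation route, the inputs to mmdata are one score vector and one label vector per test fold. The sketch below shows that layout under assumptions: the scores are simulated stand-ins for held-out predictions (in practice each fold's scores would come from a model trained on the other folds), and the stratified fold assignment is a choice made here. The precrec calls only run if the package is installed.

```r
set.seed(1)

k <- 5
labels <- rep(c(0, 1), each = 100)
# Stand-in for held-out predictions; replace with real per-fold model output
scores <- rnorm(length(labels), mean = labels)

# Stratified fold assignment: each fold gets the same number of each class
fold <- integer(length(labels))
for (cl in unique(labels)) {
  idx <- which(labels == cl)
  fold[idx] <- sample(rep_len(seq_len(k), length(idx)))
}

# One list element per test fold
fold_scores <- unname(split(scores, fold))
fold_labels <- unname(split(labels, fold))

# Single model, k datasets: this is the shape auc_ci expects
if (requireNamespace("precrec", quietly = TRUE)) {
  cvdat <- precrec::mmdata(fold_scores, fold_labels,
                           modnames = rep("m1", k), dsids = seq_len(k))
  cvcurves <- precrec::evalmod(cvdat)
  print(precrec::auc_ci(cvcurves))
}
```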
JanaFe commented 3 years ago

That helps, thanks a lot!

bblodfon commented 4 months ago

Hi @takayasaito! Happy to have found your package! I am trying to do something similar to the above (i.e., we have predictions from a single model and run a stratified bootstrap on both labels and scores to see the variability of the PR curve), and I would appreciate a bit of your help since you know the internal functions better than I do :)

So, how can I get the Precision-Recall data in a data.frame from an smcurves object (before plotting)? eg

library(precrec)

samps = create_sim_samples(4, 100, 100, "good_er")
mdat  = mmdata(samps[["scores"]], samps[["labels"]],
  modnames = samps[["modnames"]],
  dsids = samps[["dsids"]]
)
smcurves = evalmod(mdat, type = "rocpr")

# how can I get a `data.frame` with colnames `c(recall, precision, threshold)` for each dataset ID?
# ie a list of `data.frame`s with that info? My problem especially using `PRROC` doing the same 
# thing is that the multiplicity and number of thresholds is different so merging them is really a 
# pain :) - which I think you have solved since we can call `plot(smcurves)`!
smcurves
#> 
#>     === AUCs ===
#> 
#>      Model name Dataset ID Curve type       AUC
#>    1    good_er          1        ROC 0.8364000
#>    2    good_er          1        PRC 0.8593735
#>    3    good_er          2        ROC 0.7677000
#>    4    good_er          2        PRC 0.8169513
#>    5    good_er          3        ROC 0.8218000
#>    6    good_er          3        PRC 0.8520650
#>    7    good_er          4        ROC 0.8139000
#>    8    good_er          4        PRC 0.8528955
#> 
#> 
#>     === Input data ===
#> 
#>      Model name Dataset ID # of negatives # of positives
#>    1    good_er          1            100            100
#>    2    good_er          2            100            100
#>    3    good_er          3            100            100
#>    4    good_er          4            100            100

Created on 2024-04-26 with reprex v2.0.2

bblodfon commented 4 months ago

Ah, ok you have it in res = precrec::evalmod(data, raw_curves = TRUE), can extract it, nice
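One way to turn such a result into a list of per-dataset data frames is sketched below. It assumes a long data frame with columns `x`, `y`, `dsid` and `type`, which are the column names that appear in the maintainer's replies further down. `split_pr_by_dataset` is a hypothetical helper, and the toy data frame stands in for real precrec output so the snippet runs on its own. Thresholds are not part of this layout.

```r
# Split the Precision-Recall rows of a long curve data frame by dataset ID
split_pr_by_dataset <- function(df) {
  prc <- df[df$type == "PRC", , drop = FALSE]
  out <- split(prc[, c("x", "y")], prc$dsid)
  lapply(out, function(d) {
    names(d) <- c("recall", "precision")
    rownames(d) <- NULL
    d
  })
}

# Toy stand-in for a converted curves object (illustrative values only)
toy <- data.frame(
  x = c(0, 1, 0, 1, 0, 1),
  y = c(1, 0.5, 0, 1, 1, 0.6),
  dsid = c(1, 1, 1, 1, 2, 2),
  type = c("PRC", "PRC", "ROC", "ROC", "PRC", "PRC")
)

pr_list <- split_pr_by_dataset(toy)
str(pr_list)
```

With a real object created with raw_curves = TRUE, the input would be something like `as.data.frame(res)`, provided the converted data frame has the columns assumed here.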

bblodfon commented 4 months ago

So, the thresholds might not be equal as far as I can see (I thought x_bins controlled that); maybe it's a bug? I have another example where there are far more unique values. Maybe filling the shorter vectors up with the last precision value in each respective vector makes sense? (Without breaking the 1-1 correspondence with the thresholds I guess, if that makes sense...)

library(precrec)

samps = create_sim_samples(100, 20, 20, "good_er")
mdat  = mmdata(samps[["scores"]], samps[["labels"]],
  modnames = samps[["modnames"]],
  dsids = samps[["dsids"]]
)

# Generate an smcurve object that contains ROC and Precision-Recall curves
smcurves = evalmod(mdat, type = "rocpr", raw_curves = TRUE)
# extract precision vectors per dataset
precision = lapply(smcurves$prcs, function(obj) obj$y)
unique(unlist(lapply(precision, length)))
#> [1] 1024 1023

Created on 2024-04-26 with reprex v2.0.2
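If the goal is simply to compare curves of different lengths point by point, one generic option is to evaluate every curve on a shared recall grid rather than padding vectors. The sketch below uses base R's `approx()` on toy curves. `align_curves` is a hypothetical helper, and linear interpolation between Precision-Recall points is only an approximation, so treat this as an illustration rather than what precrec does internally.

```r
# Evaluate each curve on a common x grid; curves is a list of list(x, y)
align_curves <- function(curves, grid = seq(0, 1, by = 0.25)) {
  sapply(curves, function(cv) {
    # ties = mean averages y where x values repeat; rule = 2 extends the ends
    approx(cv$x, cv$y, xout = grid, ties = mean, rule = 2)$y
  })
}

# Two toy curves with different numbers of points (illustrative values)
curves <- list(
  list(x = c(0, 0.5, 1), y = c(1, 0.8, 0.5)),
  list(x = c(0, 0.25, 0.5, 1), y = c(1, 0.9, 0.7, 0.5))
)

aligned <- align_curves(curves)
print(aligned)  # one row per grid point, one column per curve
```

For real curves, each list element would hold the x and y vectors of one dataset's curve.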

takayasaito commented 3 months ago

For the first example, you can simply call data.frame as data.frame(smcurves).

data.frame(smcurves) |> head()
#      x      y      ymin      ymax modname type
#1 0.000 0.0000 0.0000000 0.0000000 good_er  ROC
#2 0.000 0.2975 0.1912348 0.4037652 good_er  ROC
#3 0.001 0.2975 0.1912348 0.4037652 good_er  ROC
#4 0.002 0.2975 0.1912348 0.4037652 good_er  ROC
#5 0.003 0.2975 0.1912348 0.4037652 good_er  ROC
#6 0.004 0.2975 0.1912348 0.4037652 good_er  ROC

Similarly, you can use data.frame to convert an AUC object to a data.frame.

auc(smcurves) |> data.frame() |> head()
#  modnames dsids curvetypes      aucs
#1  good_er     1        ROC 0.7683000
#2  good_er     1        PRC 0.8108477
#3  good_er     2        ROC 0.8287000
#4  good_er     2        PRC 0.8626605
#5  good_er     3        ROC 0.7498000
#6  good_er     3        PRC 0.7995740

takayasaito commented 3 months ago

For the second question, you can convert the object to a data frame in order to check the actual values.

library(dplyr)

precision <- data.frame(smcurves) |> 
  dplyr::filter(type == "ROC" & modname == "good_er" & dsid == 1) |>
  dplyr::select(x)

length(precision) == length(unique(precision))
# [1] TRUE
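One thing to watch when adapting a check like this: in R, `length()` on a data frame returns the number of columns, not the number of rows, so comparing `length(df)` with `length(unique(df))` on a one-column data frame compares 1 with 1. The toy example below (illustrative values, hypothetical variable names) shows the difference between checking the data frame and checking the extracted vector.

```r
# One-column data frame with a repeated value (illustrative only)
df <- data.frame(x = c(0, 0, 0.5, 1))

# length() of a data frame counts columns
n_cols <- length(df)
same_len_df <- length(df) == length(unique(df))

# Extract the column as a vector to compare actual values
x <- df[["x"]]
same_len_vec <- length(x) == length(unique(x))
n_dup <- sum(duplicated(x))

print(c(n_cols = n_cols, n_dup = n_dup))
print(c(same_len_df = same_len_df, same_len_vec = same_len_vec))
```

In a dplyr pipeline, `dplyr::pull(x)` in place of `dplyr::select(x)` yields the vector directly.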