Closed: mostafiz67 closed this issue 3 years ago.
You need scores and labels to use precrec, so that the library can prevent incorrect ROC and precision-recall curve (PRC) calculations. However, if you have already calculated TPRs and FPRs yourself, you can create ROC curves and calculate AUC scores without it. For example, you can use the ggplot2 package to create ROC plots and rollmean from the zoo package to calculate AUCs.
Although this is quite different from how precrec calculates curves and AUCs, I hope you can get the basic idea from it.
library(tibble)
library(dplyr)
library(ggplot2)
library(zoo)
# I copied your data frame to "df" first.
# df <- tibble(SP_length = c(2L, ....
# Add start points and two new columns (model & line ID)
# N.B. All ROC curves must include the origin (0, 0).
df <- bind_rows(df,
                tibble(SP_length = c(2L, 3L, 2L, 3L, 2L, 3L),
                       Test_dataset = c(1L, 1L, 2L, 2L, 3L, 3L),
                       Prediction_Threshold = c(0, 0, 0, 0, 0, 0),
                       TPR_All = c(0, 0, 0, 0, 0, 0),
                       FPR_All = c(0, 0, 0, 0, 0, 0))) %>%
  arrange(Test_dataset, SP_length, desc(TPR_All), desc(FPR_All)) %>%
  mutate(model = factor(SP_length),
         line_id = factor(SP_length * 10 + Test_dataset))
# ggplot
p1 <- ggplot(df, aes(x = FPR_All, y = TPR_All,
                     group = line_id, color = model)) +
  geom_line()
print(p1)
# ggplot with three grid cells
p2 <- ggplot(df, aes(x = FPR_All, y = TPR_All,
                     group = line_id, color = model)) +
  geom_line() +
  facet_grid(cols = vars(Test_dataset))
print(p2)
# AUC - calculate multiple trapezium areas
aucs <- df %>%
  arrange(Test_dataset, SP_length, TPR_All, FPR_All) %>%
  group_by(model, Test_dataset) %>%
  summarise(auc = sum(diff(FPR_All) * rollmean(TPR_All, 2))) %>%
  ungroup() %>%
  arrange(Test_dataset, model)
print(aucs)
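The sum(diff(FPR_All) * rollmean(TPR_All, 2)) expression above is the trapezoidal rule: each pair of neighbouring points contributes its width times the mean of its two heights. The following is a minimal base R sketch of the same arithmetic, not precrec's method. The fpr and tpr vectors are made-up values and trapezoid_auc is a hypothetical helper name.

```r
# Hypothetical ROC points, sorted by FPR and including (0, 0) and (1, 1).
fpr <- c(0, 0.1, 0.4, 1)
tpr <- c(0, 0.6, 0.9, 1)

# Trapezoidal rule: width of each interval times the mean of its two heights.
# (head(tpr, -1) + tail(tpr, -1)) / 2 gives the same values as zoo::rollmean(tpr, 2).
trapezoid_auc <- function(fpr, tpr) {
  stopifnot(length(fpr) == length(tpr), length(fpr) >= 2)
  widths  <- diff(fpr)
  heights <- (head(tpr, -1) + tail(tpr, -1)) / 2
  sum(widths * heights)
}

auc <- trapezoid_auc(fpr, tpr)
print(auc)
```

The points must be ordered along the x-axis before calling the helper, which is what the arrange() step does in the code above.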
@takayasaito Thank you very much for your kind response and suggestions. However, I was using the code below to draw the AUC plot.
ggplot(df, mapping = aes(x = FPR_All, y = TPR_All, color = method)) +
  geom_line(show.legend = FALSE) +
  facet_grid(method ~ Test_dataset,
             labeller = labeller(Test_dataset = function(x) paste0("Test Dataset ", x),
                                 method = function(x) paste0("Method ", x))) +
  ggtitle("AUC Curve for Neighbor Based (Dataset 1: Disjoint)") +
  theme(plot.title = element_text(hjust = 0.5))
Now, is there any way to show the percentage of the area under the curve in my figure? I just want to show the % AUC in my plot. Something like this.
You can simply use geom_text or geom_label with an additional data frame.
# AUC
aucs <- df %>%
  arrange(Test_dataset, SP_length, TPR_All, FPR_All) %>%
  group_by(SP_length, Test_dataset) %>%
  summarise(auc = sum(diff(FPR_All) * rollmean(TPR_All, 2))) %>%
  ungroup() %>%
  arrange(Test_dataset, SP_length) %>%
  mutate(x = 0.7,
         y = 0.25,
         label = paste("AUC:", round(auc, 2))) %>%
  rename(method = SP_length)
print(aucs)
# ggplot with 6 grid cells
p3 <- ggplot(df %>% rename(method = SP_length),
             mapping = aes(x = FPR_All, y = TPR_All, color = method)) +
  geom_line(show.legend = FALSE) +
  geom_text(data = aucs, aes(x = x, y = y, label = label), color = "black") +
  facet_grid(
    method ~ Test_dataset,
    labeller = labeller(
      Test_dataset = function(x)
        paste0("Test Dataset ", x),
      method = function(x)
        paste0("Method ", x)
    )
  ) +
  ggtitle("AUC Curve for Neighbor Based (Dataset 1: Disjoint)") +
  theme(plot.title = element_text(hjust = 0.5))
print(p3)
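The code above labels each panel with the AUC as a proportion (for example "AUC: 0.81"). If a percentage is wanted, as asked in the question, only the label string needs to change. The following is a small sketch of that formatting step. The aucs data frame here holds made-up values standing in for the one produced by summarise().

```r
# Hypothetical per-panel AUC values; in the thread these come from summarise().
aucs <- data.frame(
  method       = c(2L, 3L, 2L, 3L),
  Test_dataset = c(1L, 1L, 2L, 2L),
  auc          = c(0.8123, 0.7649, 0.9051, 0.5)
)

# Format the proportion as a percentage with one decimal place.
# "%%" produces a literal percent sign in sprintf().
aucs$label <- sprintf("AUC: %.1f%%", 100 * aucs$auc)
print(aucs)
```

The label column can then be passed to geom_text(data = aucs, aes(x = x, y = y, label = label)) in the same way as in the plot above.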
Thank you very much.
@takayasaito I am extremely sorry to bother you again. I have another dataset that does not contain SP_length, so I replaced SP_length with method in the code. I think I am getting wrong results, possibly because of my changes to the code.
Code:
aucs <- df %>%
  arrange(Test_dataset, method, TPR_All, FPR_All) %>%
  group_by(method, Test_dataset) %>%
  summarise(auc = sum(diff(FPR_All) * rollmean(TPR_All, 2))) %>%
  ungroup() %>%
  arrange(Test_dataset, method) %>%
  mutate(x = 0.7,
         y = 0.25,
         label = paste("AUC:", round(auc, 2))) %>%
  rename(method_1 = method)
print(aucs)
The output I am getting
method_1 Test_dataset auc x y label
<chr> <int> <dbl> <dbl> <dbl> <chr>
1 AA 1 0.00582 0.7 0.25 AUC: 0.01
2 CN 1 0.0108 0.7 0.25 AUC: 0.01
3 Dice 1 0.0293 0.7 0.25 AUC: 0.03
4 JAC 1 0.0241 0.7 0.25 AUC: 0.02
5 L3 1 0.000610 0.7 0.25 AUC: 0
6 RA 1 0.000140 0.7 0.25 AUC: 0
7 AA 2 0.00960 0.7 0.25 AUC: 0.01
8 CN 2 0.0104 0.7 0.25 AUC: 0.01
9 Dice 2 0.0287 0.7 0.25 AUC: 0.03
10 JAC 2 0.0242 0.7 0.25 AUC: 0.02
But my actual AUC plot shows that I should get larger AUC values for the Dice and JAC methods. The figure link.
Sample Data
structure(list(method = c("CN", "CN", "CN", "CN", "CN", "CN",
"CN", "CN", "CN", "CN", "AA", "AA", "AA", "AA", "AA", "AA", "AA",
"AA", "AA", "AA", "JAC", "JAC", "JAC", "JAC", "JAC", "JAC", "JAC",
"JAC", "JAC", "JAC", "L3", "L3", "L3", "L3", "L3", "L3", "L3",
"L3", "L3", "L3", "Dice", "Dice", "Dice", "Dice", "Dice", "Dice",
"Dice", "Dice", "Dice", "Dice", "RA", "RA", "RA", "RA", "RA",
"RA", "RA", "RA", "RA", "RA", "CN", "CN", "CN", "CN", "CN", "CN",
"CN", "CN", "CN", "CN", "AA", "AA", "AA", "AA", "AA", "AA", "AA",
"AA", "AA", "AA", "JAC", "JAC", "JAC", "JAC", "JAC", "JAC", "JAC",
"JAC", "JAC", "JAC", "L3", "L3", "L3", "L3", "L3", "L3", "L3",
"L3", "L3", "L3", "Dice", "Dice", "Dice", "Dice", "Dice", "Dice",
"Dice", "Dice", "Dice", "Dice", "RA", "RA", "RA", "RA", "RA",
"RA", "RA", "RA", "RA", "RA"), Prediction_Threshold = c(0.1,
0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4,
0.5, 0.6, 0.7, 0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1,
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0.1, 0.2, 0.3,
0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
0.7, 0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0.1, 0.2,
0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
0.9, 1), Test_dataset = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
TPR_All = c(0.878103837471783, 0.77765237020316, 0.669300225733634,
0.625282167042889, 0.58803611738149, 0.576749435665914, 0.573363431151242,
0.423250564334086, 0.0756207674943567, 0.00112866817155756,
0.866817155756208, 0.712189616252822, 0.628668171557562,
0.589164785553047, 0.553047404063205, 0.108352144469526,
0.00225733634311512, 0.00225733634311512, 0.00225733634311512,
0.00112866817155756, 0.957110609480813, 0.920993227990971,
0.851015801354402, 0.785553047404063, 0.715575620767494,
0.644469525959368, 0.534988713318284, 0.13431151241535, 0.0090293453724605,
0.00790067720090293, 0.302483069977427, 0.0733634311512415,
0.0372460496613995, 0.0293453724604966, 0.0191873589164786,
0.00790067720090293, 0.00564334085778781, 0.00564334085778781,
0.00112866817155756, 0, 0.978555304740406, 0.948081264108352,
0.930022573363431, 0.891647855530474, 0.836343115124153,
0.760722347629797, 0.68510158013544, 0.591422121896163, 0.072234762979684,
0.00790067720090293, 0.734762979683973, 0.0248306997742664,
0.00790067720090293, 0.00564334085778781, 0.00451467268623025,
0.00225733634311512, 0.00225733634311512, 0.00225733634311512,
0.00225733634311512, 0.00112866817155756, 0.889887640449438,
0.775280898876405, 0.687640449438202, 0.61685393258427, 0.57752808988764,
0.560674157303371, 0.556179775280899, 0.546067415730337,
0.18314606741573, 0.00224719101123596, 0.90561797752809,
0.80561797752809, 0.68876404494382, 0.624719101123596, 0.585393258426966,
0.569662921348315, 0.543820224719101, 0.132584269662921,
0.00112359550561798, 0.00112359550561798, 0.966292134831461,
0.931460674157303, 0.865168539325843, 0.798876404494382,
0.719101123595506, 0.637078651685393, 0.543820224719101,
0.133707865168539, 0.00561797752808989, 0.00561797752808989,
0.331460674157303, 0.0707865168539326, 0.0438202247191011,
0.0292134831460674, 0.0146067415730337, 0.00786516853932584,
0.00337078651685393, 0.00112359550561798, 0, 0, 0.979775280898876,
0.961797752808989, 0.935955056179775, 0.902247191011236,
0.857303370786517, 0.773033707865169, 0.680898876404494,
0.59438202247191, 0.0584269662921348, 0.00561797752808989,
0.9, 0.719101123595506, 0.0617977528089888, 0.0247191011235955,
0.0112359550561798, 0.00561797752808989, 0.00337078651685393,
0.00224719101123596, 0.00112359550561798, 0.00112359550561798
), FPR_All = c(0.0133403448562177, 0.00259241832959693, 0.000156836696096611,
0, 0, 0, 0, 0, 0, 0, 0.00743590453258052, 0.000424381648261419,
9.22568800568302e-06, 0, 0, 0, 0, 0, 0, 0, 0.0288395007057651,
0.0202227081084572, 0.0127037723838255, 0.00748203297260893,
0.00341350456210272, 0.00122701650475584, 0.000424381648261419,
0.000258319264159125, 0.000202965136125027, 0.000175288072107977,
0.00414233391455168, 0.0010886311846706, 0.000765732104471691,
0.000405930272250053, 0.000221416512136393, 0.000138385320085245,
7.38055040454642e-05, 2.76770640170491e-05, 1.8451376011366e-05,
9.22568800568302e-06, 0.0342457538770954, 0.0278154493371343,
0.0216434640613324, 0.0165139815301726, 0.0113199191829731,
0.00575682931554621, 0.00204810273726163, 0.000636572472392129,
0.000221416512136393, 0.000175288072107977, 0.000369027520227321,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0128430762010487, 0.00236464228016498,
0.000198557748716144, 0, 0, 0, 0, 0, 0, 0, 0.0115163494255363,
0.00228341411023565, 0.000117329578786812, 0, 0, 0, 0, 0,
0, 0, 0.0285471890540528, 0.0197384452928276, 0.0122654536593291,
0.00708490148828058, 0.00357403947689059, 0.00122744790115434,
0.000469318315147249, 0.000261735214216735, 0.000171481692073033,
0.000153430987644293, 0.00391700286103665, 0.00106499156129568,
0.000631774655005912, 0.000297836623074215, 0.000180507044287403,
0.000108304226572442, 6.31774655005912e-05, 9.02535221437017e-06,
9.02535221437017e-06, 9.02535221437017e-06, 0.0332223215010966,
0.027355842561756, 0.0212908058736992, 0.0160831776460076,
0.0107672451917436, 0.00572207330391069, 0.00208485636151951,
0.000667876063863392, 0.000198557748716144, 0.000153430987644293,
0.0040523831442522, 0.000144405635429923, 9.02535221437017e-06,
0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 120L), class = "data.frame")
I don't know how you calculated your TPRs and FPRs, but the points are insufficient: an ROC curve must run from (0, 0) to (1, 1), and your points do not cover that range. You need one more threshold value below your lowest and one more above your highest. Alternatively, you can manually add the points (FPR, TPR) = (0, 0) and (1, 1) to your data frame in R. If you need to calculate accurate model performance metrics, it is easier to use a library such as precrec.
Hope this helps.
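In the sample data above, the largest FPR_All value is about 0.034, so without the (1, 1) end point the trapezoids only cover a very narrow strip of the x-axis. That is why the computed AUCs are close to zero. The following is a minimal base R sketch of the manual fix: append (0, 0) and (1, 1) to each group before integrating. The data frame holds made-up values, and auc_with_endpoints is a hypothetical helper, not a precrec function.

```r
# Hypothetical (FPR, TPR) points for two methods, without the end points.
df <- data.frame(
  method  = c("A", "A", "B", "B"),
  FPR_All = c(0.2, 0.5, 0.1, 0.6),
  TPR_All = c(0.6, 0.9, 0.3, 0.7)
)

# Append (0, 0) and (1, 1) to one group's points, then apply the trapezoidal rule.
auc_with_endpoints <- function(d) {
  fpr <- c(0, d$FPR_All, 1)
  tpr <- c(0, d$TPR_All, 1)
  ord <- order(fpr, tpr)  # sort along the x-axis before integrating
  fpr <- fpr[ord]
  tpr <- tpr[ord]
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# One AUC per method.
aucs <- sapply(split(df, df$method), auc_with_endpoints)
print(aucs)
```

With only ten thresholds per curve, the straight line drawn from the last observed point to (1, 1) is a coarse approximation, so values obtained this way should be read as rough estimates.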
When I read the manual of the package, I cannot work out how to use the package with my dataset.
I have two models, namely 2 and 3, and 10 test datasets. I applied different thresholds for each model and each dataset (8 thresholds for each test dataset). I also calculated the true positive rate, false positive rate, etc. for each test dataset.
Now, is it possible to draw AUC and ROC plots from my result dataset using this package? Does this package specifically need scores and labels to draw the curves?
Sample Dataset: Here, Model: SP_length, Test Dataset Number: Test_dataset, Threshold: Prediction_Threshold, True Positive Rate: TPR_All, and False Positive Rate: FPR_All.
Thank you.
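For context on why scores and labels are the natural input: every (FPR, TPR) point is derived from them by applying one threshold. The following is a base R sketch with made-up scores and labels. It illustrates the idea only and is not how precrec computes its curves internally. Adding one threshold above the highest score produces (0, 0), and the lowest score used with >= produces (1, 1).

```r
# Hypothetical scores (higher = more likely positive) and true labels (1 = positive).
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)
labels <- c(1,   1,   0,   1,   0,   0)

# Inf predicts nothing as positive, giving (0, 0);
# the minimum score predicts everything as positive, giving (1, 1).
thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))

# One (FPR, TPR) point per threshold.
roc_points <- do.call(rbind, lapply(thresholds, function(th) {
  predicted <- scores >= th
  data.frame(
    threshold = th,
    FPR = sum(predicted & labels == 0) / sum(labels == 0),
    TPR = sum(predicted & labels == 1) / sum(labels == 1)
  )
}))
print(roc_points)
```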