Closed mehrankr closed 7 years ago
Thank you very much for your suggestions.
Using unique scores
Tied scores must be treated properly to calculate accurate ROC and precision-recall curves. For instance, the evalmod function provides the ties_method option to decide how to treat them. Considering only unique scores is therefore not a good way to solve this slow plotting issue, since the calculated curves would be inaccurate.
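To illustrate why dropping duplicated scores can distort the result, here is a minimal base R sketch. It does not use precrec internals; the helper pairwise_auc and the toy data are made up for this example, and the AUC is computed as the share of positive/negative pairs ranked correctly, with ties counted as half.

```r
# Hypothetical helper, not part of precrec: AUC as the share of
# positive/negative pairs ranked correctly, counting ties as half.
pairwise_auc <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Toy data with tied scores that carry different labels
scores <- c(0.9, 0.8, 0.8, 0.8, 0.3, 0.3, 0.1)
labels <- c(1, 1, 0, 1, 0, 1, 0)

# All observations, tied scores included
auc_all <- pairwise_auc(scores, labels)

# Keep only the first observation for each distinct score
keep <- !duplicated(scores)
auc_unique <- pairwise_auc(scores[keep], labels[keep])

print(c(all = auc_all, unique = auc_unique))
```

With this toy data the two values differ (about 0.79 with all observations versus 1 after removing duplicates), because removing duplicated scores also removes observations whose labels differ.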
Changing resolution
Trimming the supporting points to a certain resolution is most likely an effective way to speed up plotting. It is definitely feasible to enhance the package in this way, but I still need to look into it.
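One possible way to trim supporting points is sketched below in base R. This is only an assumption about how such a reduction could work, not the package's implementation: the helper thin_curve is hypothetical, it expects x values sorted in [0, 1], and it keeps the first and last point in each equal-width bin so that vertical jumps inside a bin are not lost.

```r
# Hypothetical helper, not precrec's implementation: keep the first and
# last point in each of n_bins equal-width bins along x.
# Assumes x is sorted in increasing order and lies in [0, 1].
thin_curve <- function(x, y, n_bins = 1000) {
  bin <- floor(x * n_bins)
  keep <- !duplicated(bin) | !duplicated(bin, fromLast = TRUE)
  data.frame(x = x[keep], y = y[keep])
}

# A dense toy curve standing in for an ROC curve
x <- seq(0, 1, length.out = 100001)
y <- sqrt(x)

thinned <- thin_curve(x, y, n_bins = 100)
print(c(before = length(x), after = nrow(thinned)))
```

The number of plotted points then depends on the number of bins rather than on the size of the dataset, which is what makes plotting faster.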
Alternative solution
It is often faster to make plots with plot instead of ggplot. I wrote a test script and ran it in three different environments. In this test scenario, plot and ggplot perform similarly on OSX, but plot is much faster than ggplot on Linux and Windows.
# Test code
library(precrec)
library(ggplot2)
samp1 <- create_sim_samples(5, 50000, 50000)
eval1 <- evalmod(scores = samp1$scores, labels = samp1$labels)
system.time(autoplot(eval1))
system.time(plot(eval1))
# Linux - i7, 3.4 GHz, 16 GB
> system.time(autoplot(eval1))
user system elapsed
8.169 0.079 8.489
> system.time(plot(eval1))
user system elapsed
0.681 0.015 0.699
# Windows - AMD A4, 1.8 GHz, 4 GB
> system.time(autoplot(eval1))
user system elapsed
31.09 6.94 45.56
> system.time(plot(eval1))
user system elapsed
11.94 6.96 19.48
# OSX - i5, 2.4 GHz, 4 GB
> system.time(autoplot(eval1))
user system elapsed
13.369 1.769 15.935
> system.time(plot(eval1))
user system elapsed
14.090 0.268 14.516
I updated autoplot to reduce the supporting points according to x_bins of the evalmod function. The points are reduced for ggplot2 by default.
I'll include this update in v0.7.0.
# Test code
library(precrec)
library(ggplot2)
samp1 <- create_sim_samples(5, 50000, 50000)
eval1 <- evalmod(scores = samp1$scores, labels = samp1$labels)
system.time(autoplot(eval1))
system.time(autoplot(eval1, reduce_points = FALSE))
# Linux - i7, 3.4 GHz, 16 GB
> system.time(autoplot(eval1))
user system elapsed
0.594 0.000 0.626
> system.time(autoplot(eval1, reduce_points = FALSE))
user system elapsed
8.496 0.000 8.520
Thanks a lot for this very useful and well documented package.
I've noticed that the number of data points in both the ROC and PRC curves equals the length of the input vectors. This slows plotting when datasets are large.
I think that, by default, evalmod should only consider unique values in the "scores" argument. In addition, an option to decrease or increase the resolution would be helpful.
Thanks