Closed xrobin closed 5 years ago
Hi,
thanks for letting me know. Looking forward to the update to pROC. And pull requests are of course welcome!
I can't recall whether it was a memory or a speed issue with pROC. I just ran the benchmarks again and pROC is indeed faster now. I don't know why that is, maybe an update to pROC or to R. It's not as fast as in your benchmarks, but I assume you are already using the updated version.
With OptimalCutpoints and 1e5 observations I still get Error: cannot allocate vector of size 37.2 Gb. ThresholdROC finishes but is very slow (several minutes).
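For a sense of scale, here is a back-of-the-envelope sketch of how large a dense n x n structure would be at 1e5 observations. Whether OptimalCutpoints actually builds such a structure is an assumption on my part, not something checked against its source, and `dense_gib` is a hypothetical helper name. It also assumes that R's "Gb" in the error message means 2^30 bytes.

```r
# Hypothetical helper: size in GiB of a dense n x n structure
# with a fixed number of bytes per element.
dense_gib <- function(n, bytes_per_element) {
  n^2 * bytes_per_element / 2^30
}

n <- 1e5
dense_gib(n, 4)  # 4-byte elements (integer or logical): roughly 37 GiB
dense_gib(n, 8)  # 8-byte elements (double): roughly 75 GiB
```

The 4-byte case lands in the same range as the reported allocation, so anything that grows quadratically with the number of observations would plausibly hit this limit.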
You're right that we should let ROCR calculate sensitivity and specificity in the second benchmark. Below is what the results now look like for me. I'll push the new benchmarks to GitHub and also add session info. I was planning to update the benchmarks eventually to use the bench package instead of microbenchmark because it also records the total memory allocation.
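A minimal sketch of what such a comparison could look like with bench, assuming the bench, pROC and ROCR packages are installed. The simulated data, sample size and iteration count are placeholders rather than the vignette's actual setup.

```r
library(bench)
library(pROC)
library(ROCR)

# Simulated data (placeholder, not the vignette's data):
# binary outcome y and a numeric predictor x that is higher for y == 1.
set.seed(1)
n <- 1e4
y <- rbinom(n, 1, 0.5)
x <- rnorm(n) + y

res <- bench::mark(
  pROC = pROC::roc(response = y, predictor = x,
                   levels = c(0, 1), direction = "<"),
  ROCR = ROCR::performance(ROCR::prediction(x, y), "sens", "spec"),
  check = FALSE,       # the two packages return different object types
  min_iterations = 5
)

# median run time and total memory allocated per expression
res[, c("expression", "median", "mem_alloc")]
```

Note that `mem_alloc` is only filled in when R was built with memory profiling support; otherwise it is reported as NA.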
I assume you are using pROC 1.14.0 from CRAN. It already has some improvements, but the master branch on GitHub is on par with ROCR now. You can try it out with devtools::install_github("xrobin/pROC").
I've made some minor changes in the way pROC is called, especially using the coords function to find the best threshold. That one is very slow in 1.14.0. I'll update the data with the change in ROCR and send a pull request ASAP so you can see what's going on.
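For readers who haven't used it, this is roughly what finding the best threshold with coords looks like. It is a sketch on simulated data, not the exact call used in the benchmark, and it assumes the Youden criterion is the one of interest.

```r
library(pROC)

# Simulated data (placeholder): binary labels and a predictor
# that tends to be higher for the positive class.
set.seed(42)
n <- 1000
labels <- rbinom(n, 1, 0.5)
scores <- rnorm(n) + labels

# Setting levels and direction explicitly avoids auto-detection.
r <- roc(response = labels, predictor = scores,
         levels = c(0, 1), direction = "<")

# Threshold maximising the Youden index, with its sensitivity and specificity.
best <- coords(r, x = "best", best.method = "youden",
               ret = c("threshold", "sensitivity", "specificity"))
best
```

The shape of the returned object (vector, matrix or data frame) depends on the pROC version, so code that consumes it should not rely on one particular form.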
I see OptimalCutpoints is trying to allocate a very large vector. Shouldn't be a problem with pROC then. I'd love to see a memory benchmark though, that would be very interesting!
Continuing discussion in #20
Dear Christian,
Thanks for the benchmarks in the vignette. I've been looking into why pROC is significantly slower, and was able to identify and fix some of the bottlenecks. I'm planning to release the update around the end of the month and will propose a pull request once it is on CRAN, if that's OK with you.
I was wondering if speed was the only reason to exclude pROC with more than 1e5 observations, or if memory was also a factor. In the vignette you write:
Do you remember if memory was a criterion for pROC, and if so what exactly the criterion was, or if it was only a matter of run time? I have been able to run the benchmark with 1e7 data points in pROC without any noticeable memory issue. Here is what it looks like:
PS: in the second plot, you run only the ROCR::prediction function, and not ROCR::performance, which is necessary to get the sensitivities and specificities. Is there a reason for that?
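To illustrate the point, here is a small sketch on simulated data (not the benchmark data) showing that the sensitivities and specificities only become available after the performance call:

```r
library(ROCR)

# Simulated data (placeholder): binary labels and a numeric score.
set.seed(42)
n <- 1000
labels <- rbinom(n, 1, 0.5)
scores <- rnorm(n) + labels

# Step 1: prediction() builds the prediction object from scores and labels.
pred <- prediction(scores, labels)

# Step 2: performance() derives the requested measures from it.
perf <- performance(pred, measure = "sens", x.measure = "spec")

# ROCR stores results in list-valued S4 slots, one element per run.
cutoffs <- perf@alpha.values[[1]]
sens <- perf@y.values[[1]]
spec <- perf@x.values[[1]]

head(data.frame(cutoff = cutoffs, sensitivity = sens, specificity = spec))
```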