diegozea closed 6 years ago
Somewhat indirectly, right now.
i = ROCAnalysis.binsearch(-0.1, -r.pfa) ## find threshold where FAR = 0.1
traditionalAUC = 1 + dot(r.pmiss[i:end-1] + r.pmiss[i+1:end], diff(r.pfa[i:end])) / 2
This is the "traditional AUC" sense, where AUC → 1 means good discrimination. I personally prefer to work with a sense where AUC → 0 means good discrimination. The latter sense is consistent with the rest of the package, where the ROC is drawn with "miss rate" instead of "hit rate" on the vertical axis. In the package, all metrics are "error rate"-like, and lower numbers indicate better performance.
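The relation between the two senses can be checked numerically. The following is a minimal sketch in Python, not the package's Julia code; the helper name `trapezoid_area` and the ROC points are hypothetical and chosen only for illustration. Because hit rate = 1 − miss rate, the full-range areas in the two senses should add up to 1.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


# Hypothetical ROC points, sorted by increasing false-alarm rate.
pfa = [0.0, 0.1, 0.4, 1.0]
pmiss = [1.0, 0.5, 0.2, 0.0]
phit = [1 - m for m in pmiss]  # hit rate = 1 - miss rate

auc_error = trapezoid_area(pfa, pmiss)       # error sense: lower is better
auc_traditional = trapezoid_area(pfa, phit)  # traditional sense: higher is better

print(auc_error, auc_traditional, auc_error + auc_traditional)
```

The same argument applies to a partial range 0 to b of the false-alarm rate, where the two areas add up to b instead of 1.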
Do you work with this kind of AUC a lot? What would be a good interface? Something like:
AUC(r::roc, pfa=0.1) ## integrate over false positive rate 0--0.1
AUC(r::roc, pmiss=0.4) ## integrate over false negative rate. 0--0.4
That would be awesome to have! My group uses the classical AUC 0-0.1 a lot, and it's common to see this measure in bioinformatics publications.
Would AUC(r::roc, pfa=0.1) be the classical AUC0.1 (where AUC -> 1 means good discrimination)?
Thanks!
Yes, that would be my proposal. I can't find any reference for the definition of AUC0.1, but I assume it is the integral under the classic ROC for 0 < false positive rate < 0.1.
Yes, AUC0.1 (or AUC 0-0.1) are notations for the integral under the classic ROC for 0 < FPR < 0.1.
Hi @diegozea ,
Can you tell me what the AUC0.1 of an ideal discriminator is? Is that 0.1 or 1.0? Or maybe even 0.9?
Sure, an ideal discriminator will have an AUC0.1 of 0.1 * 1 == 0.1. However, R's pROC can standardize the partial AUC to be 1.0 for an ideal discriminator:
If a pAUC is defined, it can be standardized (corrected). This correction is controlled by the partial.auc.correct argument. If partial.auc.correct=TRUE, the correction by McClish will be applied: ( 1 + ( (auc−min) / (max−min) ) ) / 2 where auc is the uncorrected pAUC computed in the region defined by partial.auc, min is the value of the non-discriminant AUC (with an AUC of 0.5 or 50 in the region) and max is the maximum possible AUC in the region. With this correction, the AUC will be 0.5 if non discriminant and 1.0 if maximal, whatever the region defined. Note that this correction is undefined for curves below the diagonal (auc < min). Attempting to correct such an AUC will return NA with a warning.
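The quoted formula can be tried out on the 0-0.1 case discussed here. This is a minimal sketch in Python, not pROC's implementation; the function name `mcclish_correction` is hypothetical, the bounds are expressed as false positive rates (pROC itself takes them as specificity or sensitivity), and raising an exception below the diagonal is a choice made for this sketch where pROC returns NA with a warning.

```python
def mcclish_correction(pauc, fpr_lo, fpr_hi):
    """Standardize a partial AUC computed over fpr_lo <= FPR <= fpr_hi.

    Applies (1 + (auc - min) / (max - min)) / 2, where min is the area under
    the chance diagonal in that FPR range and max is the area of a perfect ROC.
    """
    min_area = (fpr_hi ** 2 - fpr_lo ** 2) / 2  # under the diagonal TPR = FPR
    max_area = fpr_hi - fpr_lo                  # under the ideal curve TPR = 1
    if pauc < min_area:
        # Assumption of this sketch: fail loudly instead of returning NA.
        raise ValueError("correction is undefined below the diagonal")
    return (1 + (pauc - min_area) / (max_area - min_area)) / 2


# Ideal discriminator on FPR 0-0.1: uncorrected pAUC is 0.1 * 1.
ideal = mcclish_correction(0.1, 0.0, 0.1)

# Non-discriminant curve on FPR 0-0.1: area under the diagonal, 0.1^2 / 2.
chance = mcclish_correction(0.1 ** 2 / 2, 0.0, 0.1)

print(ideal, chance)
```

With these inputs the ideal discriminator should map to 1.0 and the non-discriminant curve to 0.5, matching the description in the quoted documentation.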
Thanks, the world would look a lot simpler if everybody would think in terms of errors and minimizing them. But there is no way a whole field like bioinformatics would change as a whole... Normalizing between 0 and 0.5 is somewhat easier than normalizing between 0.5 and 1.0...
So, partialauccorrect=true will do this normalization, then?
I prefer something simpler, like correct=true.
This would suggest correct=false gives an incorrect answer. I would call this a normalization. But since this is about pROC compatibility, we could use their terminology. Or make partialauccorrect=true the default.
I vote for normalize, and use true as default. We don't need to be compatible with pROC, because their auc function takes many partial.auc... keyword arguments after a general partial.auc argument. This is the help of pROC's auc function:
auc package:pROC R Documentation
Compute the area under the ROC curve
Description:
This function computes the numeric value of area under the ROC
curve (AUC) with the trapezoidal rule. Two syntaxes are possible:
one object of class “roc”, or either two vectors (response,
predictor) or a formula (response~predictor) as in the ‘roc’
function. By default, the total AUC is computed, but a portion of
the ROC curve can be specified with ‘partial.auc’.
Usage:
auc(...)
## S3 method for class 'roc'
auc(roc, partial.auc=FALSE, partial.auc.focus=c("specificity",
"sensitivity"), partial.auc.correct=FALSE,
allow.invalid.partial.auc.correct = FALSE, ...)
## S3 method for class 'smooth.roc'
auc(smooth.roc, ...)
## S3 method for class 'multiclass.roc'
auc(multiclass.roc, ...)
## S3 method for class 'formula'
auc(formula, data, ...)
## Default S3 method:
auc(response, predictor, ...)
Arguments:
roc, smooth.roc, multiclass.roc: a “roc” object from the ‘roc’
function, a “smooth.roc” object from the ‘smooth’ function,
or a “multiclass.roc” from the ‘multiclass.roc’ function.
response, predictor: arguments for the ‘roc’ function.
formula, data: a formula (and possibly a data object) of type
response~predictor for the ‘roc’ function.
partial.auc: either ‘FALSE’ (default: consider total area) or a numeric
vector of length 2: boundaries of the AUC to consider in
[0,1] (or [0,100] if percent is ‘TRUE’).
partial.auc.focus: if ‘partial.auc’ is not ‘FALSE’ and a partial AUC is
computed, specifies if ‘partial.auc’ specifies the bounds in
terms of specificity (default) or sensitivity. Can be
shortened to spec/sens or even sp/se. Ignored if
‘partial.auc=FALSE’.
partial.auc.correct: logical indicating if the correction of AUC must
be applied in order to have a maximal AUC of 1.0 and a
non-discriminant AUC of 0.5 whatever the ‘partial.auc’
defined. Ignored if ‘partial.auc=FALSE’. Default: ‘FALSE’.
allow.invalid.partial.auc.correct: logical indicating if the correction
must return ‘NA’ (with a ‘warning’) when attempting to
correct a pAUC below the diagonal. Set to ‘TRUE’ to return a
(probably invalid) corrected AUC. This is useful especially
to avoid introducing a bias against low pAUCs in bootstrap
operations.
...: further arguments passed to or from other methods, especially
arguments for ‘roc’ when calling ‘auc.default’,
‘auc.formula’, ‘auc.smooth.roc’. Note that the ‘auc’
argument of ‘roc’ is not allowed. Unused in ‘auc.roc’.
Details:
This function is typically called from ‘roc’ when ‘auc=TRUE’
(default). It is also used by ‘ci’. When it is called with two
vectors (response, predictor) or a formula (response~predictor)
arguments, the ‘roc’ function is called and only the AUC is
returned.
By default the total area under the curve is computed, but a
partial AUC (pAUC) can be specified with the ‘partial.auc’
argument. It specifies the bounds of specificity or sensitivity
(depending on ‘partial.auc.focus’) between which the AUC will be
computed. As it specifies specificities or sensitivities, you must
adapt it in relation to the 'percent' specification (see details
in ‘roc’).
‘partial.auc.focus’ is ignored if ‘partial.auc=FALSE’ (default).
If a partial AUC is computed, ‘partial.auc.focus’ specifies if the
bounds specified in ‘partial.auc’ must be interpreted as
sensitivity or specificity. Any other value will produce an error.
It is recommended to ‘plot’ the ROC curve with ‘auc.polygon=TRUE’
in order to make sure the specification is correct.
If a pAUC is defined, it can be standardized (corrected). This
correction is controlled by the ‘partial.auc.correct’ argument. If
‘partial.auc.correct=TRUE’, the correction by McClish will be
applied:
(1+(auc-min)/(max-min))/2
where auc is the uncorrected pAUC computed in the region defined
by ‘partial.auc’, min is the value of the non-discriminant AUC
(with an AUC of 0.5 or 50 in the region) and max is the maximum
possible AUC in the region. With this correction, the AUC will be
0.5 if non discriminant and 1.0 if maximal, whatever the region
defined. This correction is fully compatible with ‘percent’.
Note that this correction is undefined for curves below the
diagonal (auc < min). Attempting to correct such an AUC will
return ‘NA’ with a warning.
Value:
The numeric AUC value, of class ‘c("auc", "numeric")’ (or
‘c("multiclass.auc", "numeric")’ if a “multiclass.roc” was
supplied), in fraction of the area or in percent if
‘percent=TRUE’, with the following attributes:
partial.auc: if the AUC is full (FALSE) or partial (and in this case
the bounds), as defined in argument.
partial.auc.focus: only for a partial AUC, if the bound specifies the
sensitivity or specificity, as defined in argument.
partial.auc.correct: only for a partial AUC, was it corrected? As
defined in argument.
percent: whether the AUC is given in percent or fraction.
roc: the original ROC curve, as a “roc”, “smooth.roc” or
“multiclass.roc” object.
Smoothed ROC curves:
There is no difference in the computation of the area under a
smoothed ROC curve, except for curves smoothed with
‘method="binormal"’. In this case and only if a full AUC is
requested, the classical binormal AUC formula is applied:
pnorm(a/sqrt(1+b^2)).
If the ROC curve is smoothed with any other ‘method’ or if a
partial AUC is requested, the empirical AUC described in the
previous section is applied.
Multi-class AUCs:
With an object of class “multiclass.roc”, a multi-class AUC is
computed as an average AUC as defined by Hand and Till (equation
7).
2/(count * (count - 1))*sum(aucs)
with aucs all the pairwise roc curves.
References:
Tom Fawcett (2006) ``An introduction to ROC analysis''. _Pattern
Recognition Letters_ *27*, 861-874. DOI:
10.1016/j.patrec.2005.10.010.
David J. Hand and Robert J. Till (2001). A Simple Generalisation
of the Area Under the ROC Curve for Multiple Class Classification
Problems. _Machine Learning_ *45*(2), p. 171-186. DOI:
10.1023/A:1010920819831.
Donna Katzman McClish (1989) ``Analyzing a Portion of the ROC
Curve''. _Medical Decision Making_ *9*(3), 190-195. DOI:
10.1177/0272989X8900900307.
Xavier Robin, Natacha Turck, Alexandre Hainard, _et al._ (2011)
``pROC: an open-source package for R and S+ to analyze and compare
ROC curves''. _BMC Bioinformatics_, *12*, 77. DOI:
10.1186/1471-2105-12-77.
See Also:
‘roc’, ‘ci.auc’
Examples:
data(aSAH)
# Syntax (response, predictor):
auc(aSAH$outcome, aSAH$s100b)
# With a roc object:
rocobj <- roc(aSAH$outcome, aSAH$s100b)
# Full AUC:
auc(rocobj)
# Partial AUC:
auc(rocobj, partial.auc=c(1, .8), partial.auc.focus="se", partial.auc.correct=TRUE)
# Alternatively, you can get the AUC directly from roc():
roc(aSAH$outcome, aSAH$s100b)$auc
roc(aSAH$outcome, aSAH$s100b,
partial.auc=c(1, .8), partial.auc.focus="se",
partial.auc.correct=TRUE)$auc
Hi,
Could you have a look at the current implementation? I think I've got the normalized AUC with limits right, the un-normalized AUC was a little harder.
AUC(tar, non, pfa=0.1)
AUC(roc(tar, non), pfa=0.1)
AUC(tar, non, pfa=0.1, normalize=false)
I tested it using the example from the R package ROCR, and I get a negative result for pfa=0.01. I don't know where the error is.
julia> using DataFrames, ROCAnalysis
julia> download("https://raw.githubusercontent.com/diegozea/ROC.jl/master/test/ROCRdata.csv", "ROCRdata.csv");
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4018 100 4018 0 0 5350 0 --:--:-- --:--:-- --:--:-- 5350
julia> rocr = readtable("ROCRdata.csv");
julia> R = roc(convert(Vector{Float64}, rocr[rocr[:labels] .== 1, :predictions]), convert(Vector{Float64},rocr[rocr[:labels] .== 0, :predictions]));
julia> AUC(R) # ROCR 0.8341875
0.8341875188423273
julia> AUC(R, pfa=0.1, normalize=false) # ROCR 0.02780625
0.026540046226509892
julia> AUC(R, pfa=0.01, normalize=false) # ROCR 0.0003296151
-0.01693196663651895
Hello,
OK, my AUC integration forgot about line segments not coinciding with the upper bound. (I usually work with lots of points, so the error would be smaller.) The current implementation should still be slightly off when you have diagonal ROC segments (caused by identical scores for target and non-target), and this diagonal segment is cut off by the integration upper bound. That is for another day.
rocr[:target] = rocr[:labels] .== 1
r = roc(rocr, score=:predictions)
I get the same values as you quote for the ROCR package. But we probably need more testing.
I force-pushed a new version (the previous one was incomplete); this should handle the diagonal-line-segment ROC curves properly.
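For readers following the discussion, the boundary problem can be illustrated outside the package. This is a minimal sketch in Python, assuming ROC points given as `fpr` and `tpr` lists sorted by increasing false positive rate; the function name `partial_auc` and the example points are hypothetical, and linear interpolation at the cutoff is one way to handle a segment that crosses the upper bound, not necessarily what ROCAnalysis does internally.

```python
def partial_auc(fpr, tpr, max_fpr):
    """Trapezoidal area under the ROC for 0 <= FPR <= max_fpr.

    fpr and tpr are ROC points sorted by increasing fpr. A segment that
    crosses max_fpr is cut there by linear interpolation.
    """
    area = 0.0
    for i in range(len(fpr) - 1):
        x0, x1 = fpr[i], fpr[i + 1]
        y0, y1 = tpr[i], tpr[i + 1]
        if x0 >= max_fpr:
            break  # everything from here on lies beyond the upper bound
        if x1 > max_fpr:
            # Cut the (possibly diagonal) segment at the upper bound.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC with a diagonal segment from (0, 0.5) to (0.2, 0.7)
# that is cut by the upper bound FPR = 0.1.
fpr = [0.0, 0.0, 0.2, 1.0]
tpr = [0.0, 0.5, 0.7, 1.0]

print(partial_auc(fpr, tpr, 0.1))  # un-normalized area for 0 <= FPR <= 0.1
print(partial_auc(fpr, tpr, 1.0))  # full area
```

Dropping the interpolation step would either skip the cut segment entirely or integrate it out to FPR = 0.2, which is the kind of error that shows up most clearly when the ROC has few points.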
Yes, ROCR examples are working fine in the master branch:
julia> AUC(R) # ROCR 0.8341875
0.8341875188423273
julia> AUC(R, pfa=0.1, normalize=false) # ROCR 0.02780625
0.02780625062807758
julia> AUC(R, pfa=0.01, normalize=false) # ROCR 0.0003296151
0.000329615114058884
OK, I'll close this issue since I believe it has been taken care of.
Hi! How can I calculate the AUC up to an FPR (false positive rate) of 0.1 using this package? Best,