
ROCAnalysis.jl


Install

Pkg.add("ROCAnalysis")

Introduction

Receiver Operating Characteristic Analysis functions for evaluating probabilistic binary classifiers. The package allows efficient computation and plotting of ROC curves, among many other things.

Please note there is an alternative implementation under a similar name, and support for ROC analysis also exists in MLBase.

Our implementation is more geared towards:

The development roadmap is largely based on the functionality in a similar R package ROC.

Synopsis

An annotated Jupyter notebook of the code below can be found here.

using ROCAnalysis
## Produce some well-calibrated log-likelihood-ratio scores for target and non-target class:
tar =  2 .+ 2randn(1000)
non = -2 .+ 2randn(100000)
## quick estimate of equal error rate, should be close to pnorm(-1) = 0.5 + 0.5erf(-1/√2)
eer(tar, non) 
## compute full ROC statistics
r = roc(tar, non)
## accurate computation of the equal error rate, using the convex hull
eerch(r)
## ROC plot; we plot errors (false negatives against false positives) rather than hits vs. false alarms.
using Plots ## or perhaps another plotting package
plot(r)
## The "Detection Error Tradeoff" plot, this should give a more/less straight line
detplot(r)
## compute the Area Under the ROC, should be close to 0.078
auc(r)
## define a decision cost function by its parameter p_tar=0.01, Cfa=1, Cmiss=10 (NIST SRE 2008 setting)
d = DCF(0.01, 1, 10)
## `actual costs' using a threshold of scores at -plo(d) (minus prior log odds)
plo(d)
dcf(tar, non, d=d)
## Or use a different threshold, e.g., zero
dcf(tar, non, d=d, thres=0)
## `minimal costs' using an optimal threshold
mindcf(r, d=d)
## define an array of DCFs, and compute the decision costs for these, using a threshold at -plo
d = DCF([0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999], 1, 1)
dcf(tar, non, d=d)
## same, but normalized to costs of decisions based on the prior alone
dcf(tar, non, d=d, norm=true)
## prior log odds, the crucial combination of cost parameters: log((p_tar / (1 - p_tar)) * (Cmiss / Cfa))
plo(d)
## now scan the Bayes error rate (similar to the dcf above) for a range of prior log odds, and plot
## This is known as the Applied Probability of Error plot
apeplot(r)
## The area under the red curve (actual error rates), the cost of the log-likelihood ratio
cllr(tar, non)
## The area under the green curve (minimum errors), the cost of the optimal log-likelihood-ratio
mincllr(tar, non)
## Similar to APE, but normalized---a Normalized Bayes Error plot
nbeplot(r)
## Make an `LLR' plot: score-to-optimal-LLR mapping, r.θ, vs. r.llr
llrplot(r)

Receiver Operating Characteristic

A binary classifier maps an input x to one of two classes, A and B. Internally, every classifier ends up producing some form of a scalar metric s, which can be thresholded to produce a decision.

There are two possible "senses" of this internal scalar: score-like, where a higher value of s indicates that one particular class (say B) is more likely, and distance-like, where a higher value of s indicates that that class is less likely.

There are of course also many different interpretations of the classes A and B. For instance, in biometrics B could mean "same individual" and A "different individual". The corresponding senses of s then have an interpretation as a similarity (score-like) or a dissimilarity (distance-like).

Because in this package we want to focus on a probabilistic interpretation of the scalar s, we take the "score-like" interpretation of s, i.e., higher values of s correspond to a higher likelihood that the class of interest is associated with the input of the classifier. If your classifier is, in fact, a distance metric d, you could work with s = -d or s = 1/d or any other strictly decreasing function of d. Alternatively, you can swap around the label of the class of interest.
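For example, a minimal sketch (with made-up distance values) of turning a distance-like output into a score-like one before analysis:

using ROCAnalysis
## hypothetical distances: small for the class of interest, larger for the other class
dtar = abs.(randn(1000))
dnon = 3 .+ abs.(randn(100000))
## negate the distances, so that higher values indicate the class of interest
eer(-dtar, -dnon)
r = roc(-dtar, -dnon)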

As the name suggests, a classifier is supposed to make decisions. Decisions can be made by thresholding the score against a fixed θ: for s > θ the class of interest is chosen, and for s < θ the other class (trials with a score exactly at the threshold are treated as errors, see below).

For evaluating the classifier, we need a set of supervised trials, i.e., for each scalar score s we need a label indicating the true class of the trial that led to score s. Because there are two classes, two types of errors can be made: a false positive (a trial of the other class whose score ends up above the threshold) and a false negative (a trial of the class of interest whose score ends up below the threshold).

The Receiver Operating Characteristic (ROC) is a graph that shows how the fractions of the false positives and false negatives change with varying θ, for a fixed set of scores s. In ROCAnalysis, the structure of type Roc captures the essential information in a pre-processed way such that other quantities can be derived efficiently.
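As an illustration of this definition only (not of the package's internal, pre-processed representation), the two error fractions can be computed by brute force over a grid of thresholds:

using Statistics
tar =  2 .+ 2randn(1000)       ## scores for the class of interest
non = -2 .+ 2randn(100000)     ## scores for the other class
θs = range(-10, 10, length=201)
## a false negative is a target score below θ, a false positive a non-target score at or above θ
pmiss = [mean(tar .< θ) for θ in θs]
pfa = [mean(non .>= θ) for θ in θs]
## the pairs (pfa, pmiss) trace the ROC in the error domain as θ varies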

Because I come from automatic speaker recognition, I tend to use the following terminology for the classes: the class of interest is the target class, and the other class is the non-target class (hence the names tar and non in the synopsis above).

Error rates

In this package, we focus on analysing everything in terms of error rates. Traditionally, researchers have used ROC curves where one of the axes describes the complement of an error rate. Specifically, one often sees a true positive rate plotted against a false positive rate, where the true positive rate is the complement of the false negative rate. There is no objective way to judge whether one analysis is better than the other; the choice usually depends on the traditions of the research area you are in.

There are many different names for the error rates in different scientific disciplines. Because I come from the area of automatic speaker recognition, the current terminology is: the miss rate (the false negative rate, i.e., the fraction of target trials with a score below the threshold) and the false alarm rate (the false positive rate, i.e., the fraction of non-target trials with a score above the threshold).

We foresee that the naming of things will become a bit more flexible in future releases of this package.

DET plot

A detection error trade-off plot (Martin et al., 1997) is exactly the same as a ROC plot in the error domain (i.e., miss rate vs false alarm rate), but the axes have been warped according to Φ⁻¹(x), the inverse of the cumulative normal distribution. In R, this function is known as qnorm(x); in Julia it can be computed as √2 * erfinv(2x - 1), with erfinv from the SpecialFunctions package. This type of plot has interesting properties; notably, if the scores of both classes are normally distributed, the DET curve is a straight line.
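A minimal sketch of just this axis warping (assuming the SpecialFunctions package is available):

using SpecialFunctions
probit(x) = √2 * erfinv(2x - 1)   ## Φ⁻¹(x), i.e., R's qnorm(x)
probit(0.5)                       ## 0.0: a 50% error rate maps to the origin
probit(0.1587)                    ## ≈ -1.0, cf. pnorm(-1)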

Discrete and continuous scores

There is an essential difference between discrete scores (classes) and continuous scores. For the former, trials with the same score must be grouped before the probabilities of false alarm and miss are computed. This results in ROC and DET plots that can have line segments that are not solely horizontal or vertical. For continuous scores, if we assume that no two scores are (coincidentally) the same, the curves consist of only horizontal and vertical line segments. This ROCAnalysis package makes sure that identical scores are treated correctly, by sorting target scores before identical non-target scores and by always treating trials with a score exactly at the threshold as errors.
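A small sketch with made-up discrete scores (the same roc() call as in the synopsis, just with tied values):

using ROCAnalysis
tar = [1.0, 2.0, 2.0, 3.0]   ## target scores with ties
non = [0.0, 1.0, 2.0, 2.0]   ## non-target scores sharing some of the same values
r = roc(tar, non)            ## ties are grouped; trials at the threshold count as errors
eerch(r)
## plot(r) would show segments that are neither horizontal nor vertical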

Plot optimisation

For large trial sets, it is very likely that in the extremes of the score distributions there is very little overlap. This results in many consecutive horizontal or vertical line segments in the plot. This ROCAnalysis package integrates these consecutive line segments and replaces them by a single segment, which leads to a strong reduction in complexity in further calculations and plotting.

Single-numbered metrics

The ROC and DET plots show the discrimination capability of the detector as a graph. Often one wants to summarize the plot in a single metric. There are many ways to do this; among those supported by this package are the (convex-hull) equal error rate, the area under the ROC curve, actual and minimum decision costs for a given cost model (DCF), and the cost of the log-likelihood ratio (Cllr) and its minimum, as in the sketch below.
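For instance, reusing the score vectors from the synopsis, each of these summaries can be computed from the raw scores or from a precomputed Roc object:

using ROCAnalysis
tar =  2 .+ 2randn(1000)
non = -2 .+ 2randn(100000)
r = roc(tar, non)
eerch(r)                        ## equal error rate, computed on the convex hull
auc(r)                          ## area under the ROC curve (cf. the value near 0.078 in the synopsis)
cllr(tar, non)                  ## cost of the log-likelihood ratio
mincllr(tar, non)               ## the same, for the optimal score-to-LLR mapping
mindcf(r, d=DCF(0.01, 1, 10))   ## minimum decision cost for a given cost model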

Types

We have defined the following types: Roc, which stores the pre-processed ROC statistics computed by roc(), and DCF, which stores the parameters (target prior and costs) of a decision cost function.

Notes

This is very much work in progress. If you stumble upon this, please drop me a line.