dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0
77 stars 24 forks source link

Enhance contingency table statistics to include additional statistics #337

Closed dwfncar closed 10 years ago

dwfncar commented 10 years ago

Add several new scores and their standard error values to the contingency table statistics line type. Compare to values produced by Matt Pocernich's R code in the package "verification".


Once complete, add new JIRA issues to add to METViewer, handle appropriately in stat_analysis, and incorporate into regression testing.


Statistics include: add in output columns for stat, stat_ncl, stat_ncu, stat_bcl, stat_bcu


theta - Odds Ratio (already in CTS line)


log.theta - Log Odds Ratio


LOR.se - Standard Error for Log Odds Ratio (for use in computing normal confidence limits)


n.h - Degrees of freedom for log.theta (for computing confidence limits)


orss - Odds ratio skill score, aka Yules's Q


ORSS.se - Standard Error for Odds ratio skill score (CI's)


eds - Extreme Dependency Score


esd.se - Standard Error for EDS (CI's)


seds - Symmetric Extreme Dependency Score


seds.se - Standard Error for Symmetric Extreme Dependency Score (CI's)


EDI - Extreme Dependency Index


EDI.se - Standard Error for EDI


SEDI - Symmetric EDI


SEDI.se - Standard Error for SEDI


Bias-Corrected Gilbert Skill Score (not in Matt's verification package)


Matt's R code for these and other stats are included below:


table.stats
function (obs, pred = NULL, silent = FALSE)
{
    if (is.null(pred) & length(obs) == 4) {
        if (!silent) {
            print(" Assume data entered as c(n11, n01, n10, n00) ObsForecast")
        }
        a <- obs[1]
        b <- obs[2]
        c <- obs[3]
        d <- obs[4]
        tab.out <- matrix(c(a, c, b, d), nrow = 2)
    }
    if (is.null(pred) & is.matrix(obs) & prod(dim(obs)) == 4) {
        if (!silent) {
            print(" Assume contingency table has observed values in columns, forecasts in rows")
        }
        obs <- as.numeric(obs)
        a <- obs[1]
        b <- obs[3]
        c <- obs[2]
        d <- obs[4]
        tab.out <- matrix(c(a, c, b, d), nrow = 2)
    }
    if (!is.null(pred) & !is.null(obs)) {
        tab.out <- table(as.numeric(obs), as.numeric(pred))
        a <- tryCatch(tab.out["1", "1"], error = function(e) 0)
        b <- tryCatch(tab.out["0", "1"], error = function(e) 0)
        c <- tryCatch(tab.out["1", "0"], error = function(e) 0)
        d <- tryCatch(tab.out["0", "0"], error = function(e) 0)
    }
    n <- a + b + c + d
    s <- (a + c)/n
    TS <- a/(a + b + c)
    POD <- H <- a/(a + c)
    F <- b/(b + d)
    TS.se <- sqrt((TS^2)
((1 - H)/a + b (1 - F)/((a + b +
        c)^2)))
    SH2 <- H
(1 - H)/(a + c)
    SF2 <- F (1 - F)/(b + d)
    POD.se <- sqrt(SH2)
    F.se <- sqrt(SF2)
    M <- c/(a + c)
    FAR <- b/(a + b)
    FAR.se <- sqrt((FAR^4)
((1 - H)/a + (1 - F)/b) (a^2)/(b^2))
    HSS <- 2
(a d - b c)/(1 (a + c) (c + d) + 1 (a +
        b)
(b + d))
    SHSS2 <- SF2 (HSS^2) (1/(H - F) + (1 - s) (1 - 2
        s))^2 + SH2 (HSS^2) (1/(H - F) - s (1 - 2 s))^2
    HSS.se = sqrt(SHSS2)
    PSS <- 1 - M - F
    PSS.se <- sqrt(SH2 + SF2)
    KSS <- (a d - b c)/((a + c) (b + d))
    PC <- (a + d)/(a + b + c + d)
    PC.se <- sqrt(s
H (1 - H)/n + (1 - s) F (1 - F)/n)
    BIAS <- (a + b)/(a + c)
    OR <- a
d/(b c)
    ORSS <- (a
d - b c)/(a b + b c)
    HITSrandom <- 1
(a + c) (a + b)/(a + b + c + d)
    p <- (a + c)/n
    ETS <- (a - HITSrandom)/(a + b + c - HITSrandom)
    ETS.se <- sqrt(4
SHSS2/((2 - HSS)^4))
    theta <- (a d)/(b c)
    log.theta <- log(a) + log(d) - log(b) - log(c)
    n.h <- 1/(1/a + 1/b + 1/c + 1/d)
    yules.q <- (theta - 1)/(theta + 1)
    SLOR2 <- 1/n.h
    LOR.se <- sqrt(SLOR2)
    ORSS.se <- sqrt(SLOR2 4 OR^2/((OR + 1)^4))
    eds <- 2 log((a + c)/n)/log(a/n) - 1
    eds.se <- 2
abs(log(p))/(H (log(p) + log(H))^2) sqrt(H
        (1 - H)/(p
n))
    seds <- (log((a + b)/n) + log((a + c)/n))/log(a/n) - 1
    seds.se <- sqrt(H (1 - H)/(n p)) (-log(BIAS p^2)/(H
        log(H
p)^2))
    EDI <- (log(F) - log(H))/(log(F) + log(H))
    EDI.se <- 2 abs(log(F) + H/(1 - H) log(H))/(H (log(F) +
        log(H))^2)
sqrt(H (1 - H)/(p n))
    SEDI <- (log(F) - log(H) - log(1 - F) + log(1 - H))/(log(F) +
        log(H) + log(1 - F) + log(1 - H))
    SEDI.se <- 2 abs(((1 - H) (1 - F) + H F)/((1 - H)
        1 - F) log(F (1 - H)) + 2 H/(1 - H) log(H (1 -
        F)))/(H
(log(F (1 - H)) + log(H (1 - F)))^2)
        sqrt(H
(1 - H)/(p * n))
 
Also add in the bias-corrected ETS (BCETS) with bootstrap CI's. A paper describing the method is attached. Perhaps call it BCGSS since ETS is called GSS in the existing MET output. [MET-337] created by tressa

dwfncar commented 10 years ago

The following R code can be used to compute BCGSS:


BCGSS Reference:


Bias Adjusted Precipitation Threat Scores


F. Mesinger, Adv. Geosci., 16, 137-142, 2008


calcBCGSS = function(d){
if( 0 == d$total ){ return (NA); }
dblF = d$fy_oy + d$fy_on;
dblO = d$fy_oy + d$fn_oy;
dblLf = log(dblO / d$fn_oy);
dblHa = dblO - (d$fy_on / dblLf) lambert_W0(dblO / d$fy_on dblLf);
return( (dblHa - (dblO^2 / d$total)) / (2*dblO - dblHa - (dblO^2 / d$total)) );
} by johnhg

dwfncar commented 10 years ago

Committed a whole bunch of changes to the repository to add these statistics. However, I'm keeping the ticket open for now, because I want to check the computed values against what R produces one more time before resolving. by johnhg

dwfncar commented 10 years ago

Added several output columns to the end of the CTS line type: LODDS, ORSS, EDS, SEDS, EDI, SEDI, and BCGSS plus confidence intervals. Tested by comparing to the output of the R verification package. by johnhg