dwfncar commented 10 years ago

Add several new scores and their standard error values to the contingency table statistics line type. Compare to values produced by Matt Pocernich's R code in the package "verification".

Once complete, add new JIRA issues to add to METViewer, handle appropriately in stat_analysis, and incorporate into regression testing.

Statistics include: add in output columns for stat, stat_ncl, stat_ncu, stat_bcl, stat_bcu

theta - Odds Ratio (already in CTS line)

log.theta - Log Odds Ratio

LOR.se - Standard Error for Log Odds Ratio (for use in computing normal confidence limits)

n.h - Degrees of freedom for log.theta (for computing confidence limits)

orss - Odds ratio skill score, aka Yules's Q

ORSS.se - Standard Error for Odds ratio skill score (CI's)

eds - Extreme Dependency Score

esd.se - Standard Error for EDS (CI's)

seds - Symmetric Extreme Dependency Score

seds.se - Standard Error for Symmetric Extreme Dependency Score (CI's)

EDI - Extreme Dependency Index

EDI.se - Standard Error for EDI

SEDI - Symmetric EDI

SEDI.se - Standard Error for SEDI

Bias-Corrected Gilbert Skill Score (not in Matt's verification package)

Matt's R code for these and other stats are included below:

table.stats
function (obs, pred = NULL, silent = FALSE)
{
    if (is.null(pred) & length(obs) == 4) {
        if (!silent) {
            print(" Assume data entered as c(n11, n01, n10, n00) ObsForecast")
        }
        a <- obs[1]
        b <- obs[2]
        c <- obs[3]
        d <- obs[4]
        tab.out <- matrix(c(a, c, b, d), nrow = 2)
    }
    if (is.null(pred) & is.matrix(obs) & prod(dim(obs)) == 4) {
        if (!silent) {
            print(" Assume contingency table has observed values in columns, forecasts in rows")
        }
        obs <- as.numeric(obs)
        a <- obs[1]
        b <- obs[3]
        c <- obs[2]
        d <- obs[4]
        tab.out <- matrix(c(a, c, b, d), nrow = 2)
    }
    if (!is.null(pred) & !is.null(obs)) {
        tab.out <- table(as.numeric(obs), as.numeric(pred))
        a <- tryCatch(tab.out["1", "1"], error = function(e) 0)
        b <- tryCatch(tab.out["0", "1"], error = function(e) 0)
        c <- tryCatch(tab.out["1", "0"], error = function(e) 0)
        d <- tryCatch(tab.out["0", "0"], error = function(e) 0)
    }
    n <- a + b + c + d
    s <- (a + c)/n
    TS <- a/(a + b + c)
    POD <- H <- a/(a + c)
    F <- b/(b + d)
    TS.se <- sqrt((TS^2) ((1 - H)/a + b (1 - F)/((a + b +
        c)^2)))
    SH2 <- H (1 - H)/(a + c)
    SF2 <- F (1 - F)/(b + d)
    POD.se <- sqrt(SH2)
    F.se <- sqrt(SF2)
    M <- c/(a + c)
    FAR <- b/(a + b)
    FAR.se <- sqrt((FAR^4) ((1 - H)/a + (1 - F)/b) (a^2)/(b^2))
    HSS <- 2 (a d - b c)/(1 (a + c) (c + d) + 1 (a +
        b) (b + d))
    SHSS2 <- SF2 (HSS^2) (1/(H - F) + (1 - s) (1 - 2
        s))^2 + SH2 (HSS^2) (1/(H - F) - s (1 - 2 s))^2
    HSS.se = sqrt(SHSS2)
    PSS <- 1 - M - F
    PSS.se <- sqrt(SH2 + SF2)
    KSS <- (a d - b c)/((a + c) (b + d))
    PC <- (a + d)/(a + b + c + d)
    PC.se <- sqrt(s H (1 - H)/n + (1 - s) F (1 - F)/n)
    BIAS <- (a + b)/(a + c)
    OR <- a d/(b c)
    ORSS <- (a d - b c)/(a b + b c)
    HITSrandom <- 1 (a + c) (a + b)/(a + b + c + d)
    p <- (a + c)/n
    ETS <- (a - HITSrandom)/(a + b + c - HITSrandom)
    ETS.se <- sqrt(4 SHSS2/((2 - HSS)^4))
    theta <- (a d)/(b c)
    log.theta <- log(a) + log(d) - log(b) - log(c)
    n.h <- 1/(1/a + 1/b + 1/c + 1/d)
    yules.q <- (theta - 1)/(theta + 1)
    SLOR2 <- 1/n.h
    LOR.se <- sqrt(SLOR2)
    ORSS.se <- sqrt(SLOR2 4 OR^2/((OR + 1)^4))
    eds <- 2 log((a + c)/n)/log(a/n) - 1
    eds.se <- 2 abs(log(p))/(H (log(p) + log(H))^2) sqrt(H
        (1 - H)/(p n))
    seds <- (log((a + b)/n) + log((a + c)/n))/log(a/n) - 1
    seds.se <- sqrt(H (1 - H)/(n p)) (-log(BIAS p^2)/(H
        log(H p)^2))
    EDI <- (log(F) - log(H))/(log(F) + log(H))
    EDI.se <- 2 abs(log(F) + H/(1 - H) log(H))/(H (log(F) +
        log(H))^2) sqrt(H (1 - H)/(p n))
    SEDI <- (log(F) - log(H) - log(1 - F) + log(1 - H))/(log(F) +
        log(H) + log(1 - F) + log(1 - H))
    SEDI.se <- 2 abs(((1 - H) (1 - F) + H F)/((1 - H)
        1 - F) log(F (1 - H)) + 2 H/(1 - H) log(H (1 -
        F)))/(H (log(F (1 - H)) + log(H (1 - F)))^2)
        sqrt(H (1 - H)/(p * n))

Also add in the bias-corrected ETS (BCETS) with bootstrap CI's. A paper describing the method is attached. Perhaps call it BCGSS since ETS is called GSS in the existing MET output. [MET-337] created by tressa

dwfncar commented 10 years ago

The following R code can be used to compute BCGSS:

BCGSS Reference:

Bias Adjusted Precipitation Threat Scores

F. Mesinger, Adv. Geosci., 16, 137-142, 2008

calcBCGSS = function(d){
if( 0 == d$total ){ return (NA); }
dblF = d$fy_oy + d$fy_on;
dblO = d$fy_oy + d$fn_oy;
dblLf = log(dblO / d$fn_oy);
dblHa = dblO - (d$fy_on / dblLf) lambert_W0(dblO / d$fy_on dblLf);
return( (dblHa - (dblO^2 / d$total)) / (2*dblO - dblHa - (dblO^2 / d$total)) );
} by johnhg

dwfncar commented 10 years ago

Committed a whole bunch of changes to the repository to add these statistics. However, I'm keeping the ticket open for now, because I want to check the computed values against what R produces one more time before resolving. by johnhg

dwfncar commented 10 years ago

Added several output columns to the end of the CTS line type: LODDS, ORSS, EDS, SEDS, EDI, SEDI, and BCGSS plus confidence intervals. Tested by comparing to the output of the R verification package. by johnhg

dtcenter / MET

Enhance contingency table statistics to include additional statistics #337

BCGSS Reference:

Bias Adjusted Precipitation Threat Scores

F. Mesinger, Adv. Geosci., 16, 137-142, 2008