geneorama closed this issue 7 years ago
Below is a summary of performance with three thresholds chosen. The calculations appear in R/33b_multilevel_metrics_busrule.R.
For deployment, we're going to put the raw score into WindyGrid and embed the threshold into the filter. This doesn't solve the problem of picking a threshold, but it does eliminate the need to pick a single inflexible threshold.
For now, the medium threshold (0.25) will probably be used in the map.
Since I don't have the full background here, this may prove to be an unhelpful comment (in which case, feel free to ignore it), but might the epidemiology metrics of Positive Predictive Value and Negative Predictive Value be applicable in some way? Basically, they combine sensitivity/specificity (properties of the test) with prevalence of the condition (a property of the population being tested). This is why routine screening for some conditions can be a good idea in a high-prevalence population and a terrible idea in a low-prevalence one.
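To make the prevalence point concrete, PPV and NPV can be computed directly from sensitivity, specificity, and prevalence via the standard Bayes identities. This is a minimal sketch; the numeric inputs are illustrative and not taken from the WNV data.

```r
# PPV/NPV from sensitivity, specificity, and prevalence (standard Bayes
# identities). The numbers below are made up for illustration.
ppv <- function(sens, spec, prev) {
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}
npv <- function(sens, spec, prev) {
  (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
}

# Same test, two populations: PPV collapses when prevalence is low.
ppv(sens = 0.90, spec = 0.90, prev = 0.20)   # ~0.69
ppv(sens = 0.90, spec = 0.90, prev = 0.01)   # ~0.083
```

The second call shows the screening pitfall: with 1% prevalence, over 90% of positives are false even with a 90%-sensitive, 90%-specific test.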
By the way, one approach that sometimes is taken in routine disease screening, although I do not know if it would apply to WNV, is to screen in two phases. Test a lot of people with an inexpensive (in many ways) test with very good sensitivity, even if bad specificity. Anyone who comes back negative goes on their way. Anyone who comes back positive is run through a second, often more expensive test with better specificity and still good sensitivity. Ideally, you end up with good Negative Predictive Value (negative on Test 1 OR Test 2) and Positive Predictive Value (positive on Test 1 AND Test 2) without undue spending, physical discomfort/risk from the tests themselves, or panic.
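The two-phase idea can be quantified: for a "positive on Test 1 AND Test 2" rule, combined sensitivity is the product of the two sensitivities and combined specificity is much higher than either alone, assuming the tests err independently. A sketch with illustrative numbers:

```r
# Serial ("positive on both tests") screening, assuming independent errors.
# Overall sensitivity drops a little; overall specificity improves a lot.
serial_positive <- function(sens1, spec1, sens2, spec2) {
  c(sens = sens1 * sens2,
    spec = 1 - (1 - spec1) * (1 - spec2))
}

# Cheap, very sensitive screen followed by a more specific confirmatory test:
serial_positive(sens1 = 0.99, spec1 = 0.70, sens2 = 0.95, spec2 = 0.98)
```

With these inputs the combined test keeps ~94% sensitivity while specificity rises to ~99.4%, which is the mechanism behind "good NPV without undue spending or panic."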
We will create a filter on WindyGrid that establishes "elevated risks".
Balance the false negatives / false positives
With the most recent model we have very good class separation, but we still need to find an appropriate threshold that will minimize overspraying. The current model uses `glmer`, a generalized linear mixed-effects model (GLMM) from the `lme4` package (for reference, the current model version is ...).

Class separation for all observations in the test dataset:
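For readers unfamiliar with `glmer`, here is a sketch of the kind of model in play: a binomial GLMM with a random intercept per trap. The variable names (`week`, `avg_temp`, `trap_id`, `wnv_present`) and the simulated data are hypothetical stand-ins, not the project's actual formula or data.

```r
library(lme4)

set.seed(1)
# Simulated stand-in data; the real predictors are not shown in this issue.
wnv_data <- data.frame(
  trap_id  = factor(rep(1:20, each = 30)),
  week     = rep(1:30, times = 20),
  avg_temp = rnorm(600, mean = 70, sd = 8)
)
wnv_data$wnv_present <- rbinom(600, 1, plogis(-4 + 0.08 * wnv_data$week))

# Binomial GLMM with a random intercept for each trap, as with lme4::glmer.
m <- glmer(wnv_present ~ week + avg_temp + (1 | trap_id),
           data = wnv_data, family = binomial)

# Raw scores (fitted probabilities) of the kind that would go into WindyGrid;
# the threshold is then applied downstream as a filter.
wnv_data$score <- predict(m, type = "response")
```

The key point for thresholding is that `score` is a probability on [0, 1], so cutoffs like 0.25 act directly on the fitted risk.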
The zeros on the left generally have very low scores (close to zero, which is good), but the few that score high are unpredictably high. The goal is to find a cutoff that doesn't eliminate too many positives (on the right) but cuts out some of the high-scoring negatives (on the left).
Currently a lot of the false positives occur late in the season, and each year has a different average score. Looking at the cutoffs year by year gives a better idea of how a new year would play out.
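The year-by-year comparison could be sketched as a small sweep that counts false positives and false negatives at each candidate threshold within each season. The column names (`year`, `actual`, `score`) and the thresholds other than 0.25 are placeholders; only 0.25 comes from the discussion above.

```r
# Count false positives/negatives at each candidate threshold.
# Columns assumed (hypothetically): actual in {0, 1}, score in [0, 1].
threshold_metrics <- function(d, thresholds = c(0.10, 0.25, 0.50)) {
  do.call(rbind, lapply(thresholds, function(th) {
    pred <- as.integer(d$score >= th)
    data.frame(threshold = th,
               false_pos = sum(pred == 1 & d$actual == 0),
               false_neg = sum(pred == 0 & d$actual == 1))
  }))
}

# Applied within each year to see how a new season might play out:
# by_year <- lapply(split(test_data, test_data$year), threshold_metrics)
```

Splitting by year before the sweep is what exposes the drifting average score: a cutoff that looks fine pooled across seasons can overspray badly in a single hot year.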