Incorporate uncertainty in PSM and Z thresholds

I've worked out a way to approach the "PSM threshold - z_crit analysis" (which desperately needs a snappier name) as a proper Bayesian decision analysis.

Briefly, there are three sources of uncertainty or variability we need to consider:

1. Heterogeneity among sites (subbasins) in the relationship between Z and PSM
2. Posterior parameter uncertainty around that site-specific relationship
3. Uncertainty in the current state, i.e. Z and PSM (which is closely related to posterior parameter uncertainty)

The first of these, heterogeneity among sites, is represented by the hierarchical site-specific intercepts of the logistic regression (b0). This implies that each site will have its own z_crit, the value of Z below which we are "confident" (as defined by some probability alpha) that PSM < PSM_crit.
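To make the notation concrete, here is one way the site-level structure could be written down. This is a simplified sketch, not the full model: the slope b_Z, the hyperparameters mu_b0 and sigma_b0, and the residual term epsilon are assumed names, and the precipitation terms are dropped because predictions are conditioned on sample-average precipitation.

```latex
\begin{align*}
\operatorname{logit}(\mathrm{PSM}_{jt}) &= b_{0j} + b_Z Z_j + \varepsilon_{jt},
  & b_{0j} &\sim \mathcal{N}(\mu_{b_0}, \sigma_{b_0}^2),
  & \varepsilon_{jt} &\sim \mathcal{N}(0, \sigma^2) \\
z_{\mathrm{crit},j} &: \quad
  P\left(\mathrm{PSM}_j < \mathrm{PSM}_{\mathrm{crit}} \mid Z_j = z_{\mathrm{crit},j},\ \text{data}\right) = \alpha
\end{align*}
```

Here j indexes sites and t indexes years. Because b_{0j} differs among sites, the same PSM_crit and alpha give a different z_crit at each site.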

Posterior uncertainty in the regression relationship and current PSM depends on what hierarchical level we are interested in predicting: an arbitrary year at a given site (so the prediction includes the observation-level residual variation) or the average year at a given site (so the obs-level residual is excluded).
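The difference between the two prediction levels can be sketched in a few lines of Python. This is only an illustration: the draws are synthetic stand-ins rather than output from the fitted model, and the tuple layout (b0, b_z, sigma_obs), the function names, and the choice of Z = 0.5 are all assumptions made for the example.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_psm(draws, z, rng=None):
    """Posterior draws of PSM at Z = z for one site.

    draws: list of (b0, b_z, sigma_obs) tuples (hypothetical layout).
    rng:   if given, add an observation-level residual to each draw
           (arbitrary year); if None, leave it out (average year).
    """
    out = []
    for b0, b_z, sigma_obs in draws:
        eta = b0 + b_z * z
        if rng is not None:
            eta += rng.gauss(0.0, sigma_obs)
        out.append(inv_logit(eta))
    return out

def interval_width(x, lo=0.05, hi=0.95):
    """Width of the central 90% interval (nearest-rank quantiles)."""
    s = sorted(x)
    return s[int(hi * (len(s) - 1))] - s[int(lo * (len(s) - 1))]

rng = random.Random(1)
# Synthetic stand-ins for posterior draws; NOT output from the actual model.
draws = [(rng.gauss(-1.0, 0.3), rng.gauss(1.5, 0.2), abs(rng.gauss(0.8, 0.1)))
         for _ in range(4000)]

avg_year = predict_psm(draws, z=0.5)           # obs-level residual excluded
any_year = predict_psm(draws, z=0.5, rng=rng)  # obs-level residual included

print(interval_width(avg_year), interval_width(any_year))
```

The arbitrary-year interval is wider because it carries the year-to-year residual on top of the parameter uncertainty, so a z_crit based on it would be more conservative.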

The figure below shows predictions at the site level (for an average year). Predictions are conditioned on sample-average precipitation (i.e., the precip effects are set to zero). The light gray curves are the posterior median site-specific regression relationships (uncertainty is suppressed for clarity), and the points are estimated current conditions (again, posterior medians of Z and PSM). One site (Big Scandia Creek) is highlighted for illustration, and the 90% credible interval around its regression curve is shown.

The decision analysis proceeds as follows:

1. Choose the confidence level (here alpha = 0.9).
2. Choose the PSM threshold PSM_crit (here 0.3; horizontal red line).
3. Find z_crit such that P(PSM < PSM_crit | Z = z_crit, data) = alpha. That is, z_crit is the value of Z such that there is a posterior probability alpha that PSM is below the threshold. This is shown by the vertical violin plot representing the posterior predictive distribution of PSM risk at z_crit (the vertical red line).
4. Repeat step 3 for all sites. The rug shows all site-specific z_crit values.
5. Find delta_z, where P(Z + delta_z < z_crit | data) = alpha. That is, delta_z is the change (shown by the arrow) in the current estimate of Z such that there is a posterior probability alpha that the new Z is below z_crit, accounting for uncertainty in the current Z (shown by the horizontal violin plot).
6. Repeat step 5 for all sites. The shading shows all site-specific delta_z values.
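For a single site, steps 3 and 5 reduce to two quantile calculations on posterior draws. The sketch below assumes an average-year prediction (no observation-level residual), a positive slope in every draw, and synthetic draws in place of real model output; the function and variable names are made up for the example. In each draw the site's curve crosses PSM_crit at z* = (logit(PSM_crit) - b0) / b_z, and PSM < PSM_crit exactly when Z < z*, so z_crit is the (1 - alpha) quantile of z*.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def quantile(xs, p):
    """Sample quantile with linear interpolation between order statistics."""
    s = sorted(xs)
    pos = p * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def find_z_crit(b0, b_z, psm_crit, alpha):
    """Step 3: Z such that P(PSM < psm_crit | Z, data) = alpha."""
    if min(b_z) <= 0:
        raise ValueError("this shortcut assumes a positive slope in every draw")
    # Per-draw Z at which the regression curve crosses psm_crit.
    z_star = [(logit(psm_crit) - a) / b for a, b in zip(b0, b_z)]
    return quantile(z_star, 1.0 - alpha)

def find_delta_z(z_now, z_crit, alpha):
    """Step 5: shift such that P(Z + delta_z < z_crit | data) = alpha."""
    return z_crit - quantile(z_now, alpha)

rng = random.Random(42)
n = 4000
# Synthetic stand-ins for one site's posterior draws; NOT real model output.
b0 = [rng.gauss(-1.0, 0.3) for _ in range(n)]
b_z = [rng.gauss(1.5, 0.2) for _ in range(n)]
z_now = [rng.gauss(0.5, 0.15) for _ in range(n)]

alpha, psm_crit = 0.9, 0.3
z_crit = find_z_crit(b0, b_z, psm_crit, alpha)
delta_z = find_delta_z(z_now, z_crit, alpha)

# Check both probability statements directly against the draws.
p_psm = sum(inv_logit(a + b * z_crit) < psm_crit for a, b in zip(b0, b_z)) / n
p_z = sum(z + delta_z < z_crit for z in z_now) / n
print(z_crit, delta_z, p_psm, p_z)
```

Steps 4 and 6 would loop the same two calls over sites. If some draws had a non-positive slope, or if the arbitrary-year prediction were wanted, a numerical root-finder on P(PSM < PSM_crit | Z) would be needed instead of the quantile shortcut.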

The next step is to plot delta_z against various "benefits" and score sites by one or more weighting functions. The Bayesian decision-theoretic approach outlined above has the pleasant side effect that each site is either (1) above its z_crit and above PSM_crit (delta_z < 0, restoration sites), or (2) below its z_crit and below PSM_crit (delta_z > 0, conservation sites). In addition, one "benefit" could be the change in PSM risk at z_crit relative to current conditions, accounting for uncertainty in the latter. That would lead naturally to |delta_PSM|/|delta_z| as a score, interpreted as the sensitivity of PSM to a change in urbanization: either bang for buck in the restoration case, or resilience to increasing urbanization in the conservation case.
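One possible version of that score, as a rough sketch: take the per-draw difference between PSM at z_crit and PSM at the current Z, summarize it by its posterior median, and divide by |delta_z|. The choice of the median as the summary, the function names, and the toy inputs are all assumptions for illustration; none of this is settled.

```python
import math
import statistics

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def site_score(b0, b_z, z_now, z_crit, delta_z):
    """|delta_PSM| / |delta_z| for one site.

    delta_PSM is computed per posterior draw as PSM at z_crit minus PSM at
    the current Z, then summarized by its median (an assumed choice).
    Undefined when delta_z == 0.
    """
    d_psm = [inv_logit(a + b * z_crit) - inv_logit(a + b * z)
             for a, b, z in zip(b0, b_z, z_now)]
    return abs(statistics.median(d_psm)) / abs(delta_z)

def site_type(delta_z):
    """Negative delta_z: Z must decrease (restoration); otherwise conservation."""
    return "restoration" if delta_z < 0 else "conservation"

# Toy inputs for illustration only; NOT real model output.
b0 = [-1.1, -1.0, -0.9]
b_z = [1.4, 1.5, 1.6]
z_now = [0.4, 0.5, 0.6]

score = site_score(b0, b_z, z_now, z_crit=-0.1, delta_z=-0.8)
print(site_type(-0.8), score)
```

A mean or a lower quantile of delta_PSM would work equally well as the summary; which one to use depends on how risk-averse the scoring should be.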

I'll get this plot mocked up and post it here, instead of spending another thousand words describing it.

ebuhle / cohoPSM

Incorporate uncertainty in PSM and Z thresholds #6