Evaluate novel model full risk distributions

finncatling commented 3 years ago

...in a separate analysis from the comparison with the current model. Use a metric that works on full risk distributions rather than point estimates.

finncatling commented 3 years ago

I think log pointwise predictive density is what we want here. Given that we are evaluating test fold predictions, we don't need to consider metrics (e.g. WAIC) which compensate for overfitting in in-sample predictions.

It might also be interesting to summarise the dispersion of the predicted distributions, stratified by whether no imputation or different types of imputation occurred.

JMathiszig-Lee commented 3 years ago

Do you have some references? I'm finding it difficult to find anything that describes what 'good' looks like.

finncatling commented 3 years ago

http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf Section 2 (particularly 2.4) describes LPPD which scores predictive accuracy using the entire posterior. Section 3 goes on to describe methods for penalising LPPD scores obtained on training data, but I don't think we need to penalise as all our scores are obtained on test data.
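
A minimal sketch of the LPPD calculation for a binary mortality outcome, assuming we hold one array of test-fold predicted risks per patient. The names `lppd`, `y` and `risk_draws` are hypothetical and not taken from the lap-risk code; `risk_draws` is assumed to have one row per patient and one column per draw from the (approximate) posterior.

```python
import numpy as np


def lppd(y, risk_draws, eps=1e-12):
    """Log pointwise predictive density for binary outcomes.

    y          : (n,) array of observed outcomes (1 = died, 0 = survived)
    risk_draws : (n, S) array of predicted mortality risks, S draws per patient
    eps        : clipping constant to avoid log(0); an arbitrary choice
    """
    y = np.asarray(y, dtype=float)[:, None]
    p = np.clip(np.asarray(risk_draws, dtype=float), eps, 1 - eps)
    # Bernoulli likelihood of each observed outcome under each draw
    lik = np.where(y == 1, p, 1 - p)
    # Average over draws within each patient, then sum the logs over patients
    return float(np.sum(np.log(lik.mean(axis=1))))


# Toy data, purely illustrative
y_toy = [1, 0]
draws_toy = [[0.2, 0.4], [0.1, 0.3]]
score = lppd(y_toy, draws_toy)
```

One property worth keeping in mind: with a binary outcome, the likelihood under each draw is linear in the predicted risk, so each patient's term reduces to the log likelihood of that patient's mean predicted risk.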

LPPD is a measure of predictive accuracy that will reward (other things being equal) less dispersed predicted mortality risk distributions, so we can't use it as a measure of whether our uncertainty quantification (e.g. the extent to which our predicted risk distributions are dispersed in patients with missing lactate and albumin) is calibrated. That's why I was suggesting reporting average dispersion (e.g. average IQR) for our predicted distributions stratified by no imputation / different types of imputation.
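
The stratified dispersion summary could be sketched roughly as follows. This assumes the same hypothetical `risk_draws` layout (patients in rows, draws in columns) and a hypothetical `groups` label per patient describing which imputation, if any, occurred; the labels used below are made up for illustration.

```python
import numpy as np


def mean_iqr_by_group(risk_draws, groups):
    """Average IQR of each patient's predicted risk distribution, per group.

    risk_draws : (n, S) array of predicted mortality risks
    groups     : (n,) array of labels, e.g. the type of imputation used
    """
    risk_draws = np.asarray(risk_draws, dtype=float)
    groups = np.asarray(groups)
    # Per-patient 75th and 25th percentiles across draws
    q75, q25 = np.percentile(risk_draws, [75, 25], axis=1)
    iqr = q75 - q25
    # Mean IQR within each imputation stratum
    return {str(g): float(iqr[groups == g].mean()) for g in np.unique(groups)}


# Toy data, purely illustrative
draws_toy = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.0, 0.25, 0.5, 0.75, 1.0],
    [0.2, 0.2, 0.2, 0.2, 0.2],
]
groups_toy = ["none", "lactate", "none"]
summary = mean_iqr_by_group(draws_toy, groups_toy)
```

If the uncertainty quantification behaves as intended, we would expect the strata with imputed values to show a larger average IQR than the stratum with no imputation.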

Obviously our methods aren't fully Bayesian, and our GAMs really only obtain an approximation of the 'Bayesian posterior' you would conventionally use to calculate an LPPD score. I will dig out some references from Simon Wood which quantify how good an approximation this is likely to be.

finncatling commented 3 years ago

Foundational reference for GAMs is:

Hastie T, Tibshirani R. Generalized Additive Models. Stat Sci 1986;1:297–310.

A more applied reference to justify our regularisation approach (select a spline basis of slightly excessive 'potential wiggliness' and penalise its second derivative) is Chapter 3 of:

Wood SN. Generalized Additive Models: An Introduction with R. Boca Raton, Florida: Chapman & Hall/CRC 2006.

Wood 2006 p201, Figure 4.17 (subplot 3) shows simulation results suggesting that the component-wise confidence intervals that we show in our PD plots may not be very accurate, but that the model-wide confidence intervals 'summing' these are reasonably accurate. Note that, unlike our models, these simulations use a fitted lam value rather than a prespecified one.

finncatling commented 3 years ago

I've added the above references to the manuscript

finncatling commented 3 years ago

Switched to evaluation just on NELA-calculator-complete cases, using per-sample versions of existing metrics

finncatling / lap-risk

Evaluate novel model full risk distributions #105