equinor / graphite-maps

Graph informed triangular ensemble-to-posterior maps
GNU General Public License v3.0

Add convenience function for calculating loss (log-likelihood) over data, also information criterion adjusted #7

Open Blunde1 opened 9 months ago

Blunde1 commented 9 months ago

The loss should exploit the triangular structure, and thus the additive nature of the log-likelihood. For each sub-log-likelihood, we may add an information-criterion component. I.e., we seek to evaluate $$l(u;\hat{\Lambda})=\sum_j l(u_j;\hat{C}_j)$$
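The additivity can be sketched as follows. This is a minimal numpy illustration, not the package API: it assumes a lower-triangular map `Lmbda` with positive diagonal and a standard-normal reference, and the function names are hypothetical.

```python
import numpy as np

def loglik_rows(U, Lmbda):
    """Per-row log-likelihood terms l(u_j; C_j) of a triangular map.

    U: (n, p) data; Lmbda: (p, p) lower-triangular with positive diagonal.
    With z = Lmbda @ u mapped to a standard-normal reference, the
    log-likelihood factorises additively over rows j of Lmbda.
    Returns a length-p array of per-row terms, averaged over the n samples.
    """
    Z = U @ Lmbda.T  # (n, p); z_j depends only on u_1, ..., u_j
    return (np.log(np.diag(Lmbda)) - 0.5 * Z**2 - 0.5 * np.log(2 * np.pi)).mean(axis=0)

def loglik(U, Lmbda):
    """Total log-likelihood l(u; Lmbda) = sum_j l(u_j; C_j)."""
    return loglik_rows(U, Lmbda).sum()
```

Because the per-row terms are returned separately, an information-criterion component can later be attached row by row before summing.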

and to evaluate $$E[l(u_{\text{test}};\hat{\Lambda})]$$ as $$E[l(u_{\text{test}};\hat{\Lambda})]\approx \sum_j l(u_{j,\text{train}};\hat{C}_j) + IC(\hat{C}_j)$$ where we may try several candidates for $IC(\hat{C}_j)$.

All of the above relies on asymptotic results. Is it possible to use, e.g., the bootstrap (or the bootstrap in the frequentist domain) to relax these assumptions when $n$ is small?
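One small-$n$ alternative to the asymptotic correction is to estimate the optimism (in-sample minus out-of-sample log-likelihood) directly by the nonparametric bootstrap. A generic sketch, with hypothetical `fit`/`loglik` callables standing in for whatever estimator is used:

```python
import numpy as np

def bootstrap_optimism(U, fit, loglik, n_boot=200, rng=None):
    """Bootstrap estimate of the optimism E[l_in - l_out].

    U:      (n, p) training data.
    fit:    callable, fit(U) -> fitted parameters.
    loglik: callable, loglik(U, params) -> mean log-likelihood.
    For each bootstrap resample, fit on the resample and compare the
    in-resample log-likelihood with that on the original data; the
    average gap estimates the optimism to subtract from the train loss.
    """
    rng = np.random.default_rng(rng)
    n = U.shape[0]
    gaps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        theta_b = fit(U[idx])
        gaps.append(loglik(U[idx], theta_b) - loglik(U, theta_b))
    return float(np.mean(gaps))
```

This avoids the second-order expansion entirely, at the cost of `n_boot` refits.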

Blunde1 commented 9 months ago

To second order, quite generally, we have $$IC(\theta) = \operatorname{tr}\left(E\left[\nabla_\theta^2 l(u;\hat{\theta})\right] \operatorname{Cov}(\hat{\theta})\right)$$ The sample average used to replace $\operatorname{Cov}(\hat{\theta})$ is not the best estimator. In fact, the "trace inner product" induces the Frobenius norm as a measure, and there exist results on adaptive inflation that improve the estimator under this norm.
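With plug-in estimates, the trace formula can be computed directly. A sketch (hypothetical function name, not the package API), assuming per-observation Hessians and scores of the log-likelihood evaluated at $\hat{\theta}$, and approximating $\operatorname{Cov}(\hat{\theta})$ by the sandwich estimator $\bar{H}^{-1}\hat{J}\bar{H}^{-1}/n$:

```python
import numpy as np

def ic_trace(hessians, scores):
    """tr( E[grad^2 l] * Cov(theta_hat) ) with plug-in estimates.

    hessians: (n, q, q) per-observation Hessians of l(u_i; theta) at theta_hat.
    scores:   (n, q)    per-observation gradients at theta_hat.
    """
    n = scores.shape[0]
    H_bar = hessians.mean(axis=0)            # estimate of E[grad^2 l]
    Sc = scores - scores.mean(axis=0)
    J_hat = Sc.T @ Sc / (n - 1)              # sample covariance of the score
    # Cov(theta_hat) ~ H_bar^{-1} J_hat H_bar^{-1} / n (sandwich form)
    cov_theta = np.linalg.solve(H_bar, np.linalg.solve(H_bar, J_hat).T) / n
    return float(np.trace(H_bar @ cov_theta))
```

For a log-likelihood the mean Hessian is negative definite, so the trace comes out negative, consistent with $IC$ penalizing the training loss in $E[l(u_{\text{test}};\hat{\Lambda})]\approx \sum_j l(u_{j,\text{train}};\hat{C}_j) + IC(\hat{C}_j)$.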

Blunde1 commented 9 months ago

It is not exactly the sample estimator that is employed for $\operatorname{Cov}(\hat{\theta})$, but rather the delta method using the sample covariance of $\nabla_\theta l(u;\hat{\theta})$. The argument on the "best estimator" above still applies. This is particularly relevant when $p \gg n$ but a global maximum exists (i.e., under $L_2$ regularization of the objective).
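On improving the score-covariance estimate under the Frobenius norm: a standard option is Ledoit-Wolf shrinkage toward a scaled identity, which is Frobenius-optimal among such convex combinations and remains well-conditioned when $p \gg n$. A self-contained sketch of that estimator (illustrative, not necessarily the right choice here):

```python
import numpy as np

def ledoit_wolf_shrink(X):
    """Ledoit-Wolf shrinkage of the sample covariance toward m*I.

    X: (n, p) observations (e.g. per-observation scores).
    Returns the convex combination (b2/d2)*m*I + (1 - b2/d2)*S that
    minimizes expected Frobenius risk; full rank even when p > n.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # sample covariance
    m = np.trace(S) / p                     # scale of the identity target
    d2 = ((S - m * np.eye(p)) ** 2).sum() / p
    b2_bar = sum(((np.outer(x, x) - S) ** 2).sum() for x in Xc) / (n**2 * p)
    b2 = min(b2_bar, d2)                    # shrinkage intensity numerator
    return (b2 / d2) * m * np.eye(p) + (1 - b2 / d2) * S
```

By construction the shrunk estimator preserves the trace of $S$, so the trace inner product above is only affected through the off-scale structure it regularizes.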

Note that this (implicitly) answers the question in https://www.tandfonline.com/doi/abs/10.1198/000313006X152207 of why practitioners should care about parameter variance while ignoring bias.