dvbuntu / barmpy

Python module for Bayesian Additive Regression Models
https://dvbuntu.github.io/barmpy
MIT License

Implement Posterior Mean Model #1

Open dvbuntu opened 2 months ago

dvbuntu commented 2 months ago

BARN models currently return only a single ensemble from the posterior distribution (i.e. a single MCMC replicate). BART, however, allows returning an average over multiple MCMC iterations. With such averaging, the final model approximates the expected value of the posterior distribution rather than a single sample from it. This may improve modeling results in some contexts, especially when the posterior variance is relatively large (as measured by the model's sigma estimate).

Practically, there are a few considerations. First, because successive MCMC iterations are correlated, we should only keep one sample every K steps (anecdotally, the integrated autocorrelation time is about 7 steps, but that depends on the problem, ensemble size, and other parameters). Second, from a computational perspective, we can save some effort when a given model within the ensemble is unchanged (i.e. declines to transition) between two samples in the average. In that case, we can store that model once and double its weight. This requires some additional bookkeeping compared with saving every Kth ensemble separately.
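A minimal sketch of that bookkeeping, not the barmpy implementation. It assumes the chain is available as a plain list of ensembles and that a model which declines to transition is carried forward as the same Python object; `thin_and_weight`, `chain`, and `thin` are hypothetical names used only for illustration.

```python
def thin_and_weight(chain, thin):
    """Keep every `thin`-th ensemble and count repeated models.

    `chain` is a list of ensembles, one per MCMC iteration; each ensemble
    is a list of model objects. A model that declines to transition is
    assumed to be carried forward as the *same* Python object, so identity
    (`id`) is enough to detect it. All names here are hypothetical.
    """
    kept = chain[thin - 1::thin]          # iterations thin, 2*thin, ...
    counts = {}                           # id(model) -> [model, weight]
    for ensemble in kept:
        for model in ensemble:
            entry = counts.setdefault(id(model), [model, 0])
            entry[1] += 1                 # seen again: bump its weight
    weighted = [(model, weight) for model, weight in counts.values()]
    return weighted, len(kept)            # len(kept) plays the role of M


# Toy chain: 4 iterations of a 2-model ensemble, using placeholder objects.
a, b, c, d = object(), object(), object(), object()
chain = [[a, b], [a, c], [a, c], [d, c]]
weighted, M = thin_and_weight(chain, thin=2)
```

In the toy chain, model `c` survives both kept samples, so it is stored once with weight 2, while the weights still sum to M times the ensemble size. If models were mutated in place on an accepted transition, identity would no longer work and a copy or version counter would be needed instead.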

The actual output should probably be saved as a new ensemble model (possibly a barmpy.barn.BARN object itself), just with num_nets*M total networks, where M is the number of posterior samples to average over. The final output should then be divided by M so that it is an average rather than a sum. Alternatively, we can divide the weights of each network's final layer by M and sum over all of the ensembles, which scales the output the same way.
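To check that the two options agree, here is a small NumPy sketch. It is not barmpy code: each network is assumed to be a single hidden layer with a linear output layer, stored as a plain tuple, and `net_predict`, `ensemble_predict`, `pool_ensembles`, and `make_net` are hypothetical helpers. Note that the output bias has to be divided by M along with the output weights for the equality to hold exactly.

```python
import numpy as np

def net_predict(X, net):
    """One-hidden-layer net with a linear output layer (an assumption)."""
    W1, b1, w2, b2 = net
    return np.tanh(X @ W1 + b1) @ w2 + b2

def ensemble_predict(X, nets):
    """An ensemble prediction is the sum of its networks' outputs."""
    return sum(net_predict(X, net) for net in nets)

def pool_ensembles(ensembles):
    """Flatten M ensembles into one, scaling each output layer by 1/M."""
    M = len(ensembles)
    return [(W1, b1, w2 / M, b2 / M)
            for ensemble in ensembles
            for (W1, b1, w2, b2) in ensemble]

def make_net(rng, n_in=3, n_hidden=4):
    """Random toy network, only for the demonstration below."""
    return (rng.normal(size=(n_in, n_hidden)), rng.normal(size=n_hidden),
            rng.normal(size=n_hidden), rng.normal())

rng = np.random.default_rng(0)
ensembles = [[make_net(rng) for _ in range(2)] for _ in range(3)]  # M=3, num_nets=2
X = rng.normal(size=(5, 3))

pooled = pool_ensembles(ensembles)               # num_nets*M = 6 networks
pooled_pred = ensemble_predict(X, pooled)        # sum over scaled networks
mean_pred = np.mean([ensemble_predict(X, e) for e in ensembles], axis=0)
```

Here `pooled_pred` and `mean_pred` match up to floating-point error, so scaling the final layer lets the pooled model be evaluated like any other ensemble, with no separate division step at prediction time.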

dvbuntu commented 2 weeks ago

A prototype is implemented that saves M different sets of neural nets (i.e. self.cyberspace objects). It also includes an uncertainty estimate that uses the mean and sigma values to build a 95% z-score interval. Still in testing.
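For reference, one way such an interval could be formed is sketched below. This is an illustration and not the prototype's code: it assumes `preds` holds the predictions of the M saved ensembles (one row each) and that `sigma` is a single scalar noise estimate; `z_interval` is a hypothetical helper.

```python
from statistics import NormalDist

import numpy as np

def z_interval(preds, sigma, level=0.95):
    """Posterior-mean prediction with a symmetric normal interval.

    preds: array of shape (M, n_points), one row per saved ensemble.
    sigma: scalar noise estimate (assumed; how sigma is obtained is not
           specified here).
    """
    preds = np.asarray(preds, dtype=float)
    mean = preds.mean(axis=0)                    # average over the M samples
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for 95%
    return mean, mean - z * sigma, mean + z * sigma

mean, lower, upper = z_interval([[1.0, 2.0], [3.0, 4.0]], sigma=0.5)
```

The interval width is 2*z*sigma at every point, so under this assumption it reflects only the noise estimate and not the spread across the M ensembles.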