eenticott / gamstackr

Tools for stacking probabilistic densities.
GNU General Public License v3.0

Develop object oriented framework to handle experts #5

Open mfasiolo opened 2 years ago

mfasiolo commented 2 years ago

At the moment, the experts are fitted beforehand, and their predictive densities are then placed in a matrix that is used to fit the stacking model.

It would be useful to create a framework where each expert is an object with a minimal set of member functions, such as:

  1. expert$fit(data) that fits the expert model to some data
  2. expert$loglik(data) that evaluates the log-likelihood on data
  3. expert$simulate(data) that simulates responses given some covariate data
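
A minimal sketch of what such an expert could look like, assuming a Gaussian linear model as the expert and plain functions in place of the `expert$...` members listed above. The names `new_lm_expert`, `expert_fit`, `expert_loglik` and `expert_simulate` are hypothetical and not part of gamstackr. Here the log-likelihood is returned per observation, on the assumption that this is the form needed to fill the density matrix used for stacking.

```r
# Hypothetical constructor: the expert is a plain list with a class attribute.
new_lm_expert <- function(formula) {
  structure(list(formula = formula, model = NULL, sigma = NULL),
            class = "lm_expert")
}

# Hypothetical generics, one per member function in the list above.
expert_fit      <- function(expert, data, ...) UseMethod("expert_fit")
expert_loglik   <- function(expert, data, ...) UseMethod("expert_loglik")
expert_simulate <- function(expert, data, ...) UseMethod("expert_simulate")

# Fit the expert and return the updated object (the caller must reassign it).
expert_fit.lm_expert <- function(expert, data, ...) {
  expert$model <- stats::lm(expert$formula, data = data)
  expert$sigma <- summary(expert$model)$sigma
  expert
}

# Per-observation Gaussian log-density of the response in `data`.
expert_loglik.lm_expert <- function(expert, data, ...) {
  mu <- stats::predict(expert$model, newdata = data)
  y  <- data[[all.vars(expert$formula)[1]]]  # assumes a simple response name
  stats::dnorm(y, mean = mu, sd = expert$sigma, log = TRUE)
}

# One simulated response per row of `data`, given its covariates.
expert_simulate.lm_expert <- function(expert, data, ...) {
  mu <- stats::predict(expert$model, newdata = data)
  stats::rnorm(length(mu), mean = mu, sd = expert$sigma)
}
```

Any other expert would only need to supply its own constructor and these three methods.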

The purpose of setting this up is that then we could:

  1. Create user-friendly functions for fitting the experts and the stacking model to data (using cross-validation, rolling windows, etc.)
  2. Simulate responses from the stacking mixture given some training or testing data. Such simulations can be used to a) obtain residuals, which would let us exploit the diagnostic plots provided by mgcViz, and b) obtain point estimates from the mixture (e.g., conditional mean, quantiles, etc.) or full probabilistic predictions.
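
To illustrate the second point, here is a rough sketch of drawing from the stacking mixture. It assumes that the experts' simulated responses and the stacking weights are already available as two n x K matrices; `simulate_mixture`, `sims` and `weights` are hypothetical names, not existing gamstackr functions or objects.

```r
# sims:    n x K matrix, column k holds one simulated response per row from expert k
# weights: n x K matrix of stacking weights, each row non-negative and summing to 1
simulate_mixture <- function(sims, weights) {
  stopifnot(all(dim(sims) == dim(weights)))
  n <- nrow(sims)
  K <- ncol(sims)
  # For each observation, pick one expert with probability given by its weight.
  comp <- vapply(
    seq_len(n),
    function(i) sample.int(K, size = 1, prob = weights[i, ]),
    integer(1)
  )
  # Keep the simulated response of the selected expert in each row.
  sims[cbind(seq_len(n), comp)]
}
```

Repeating the draw many times (with fresh expert simulations each time) would give an empirical sample per observation, from which conditional means or quantiles could be computed.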

eenticott commented 2 years ago

I think S4 classes would be more useful for this implementation; do you agree? If we initialise a class with a learning algorithm and a formula, then we can pass it new data via fit and update, and the class can also store all the data it has seen. With S3, I don't think we can store the fitted model back inside the object?

mfasiolo commented 2 years ago

Mmmm, not sure. I used S4 extensively for the synlik package. It's not bad, but too inflexible for a small-scale project (we are not planning to interface with many other packages). Also, S3 and S4 don't interface well, and the whole mgcv ecosystem is S3. But reading here:

https://adv-r.hadley.nz/oo-tradeoffs.html

I wonder whether we could consider R6, given that "R6 is a profoundly different OO system from S3 and S4 because it is built on encapsulated objects, rather than generic functions. Additionally, R6 objects have reference semantics, which means that they can be modified in place.", which seems to be what you want, right?

However, I think that the same could be achieved with S3, where an object is pretty much just a list and you can update any element of that list. Also, the simulated responses produced by the stacking models could be used by the diagnostic plots provided by mgcViz, which is fully S3.
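
As a rough illustration of the S3 route, the sketch below stores both the accumulated data and the refitted model inside a list-based object. The names `new_expert_store` and `update_expert`, and the use of `lm` as the learner, are assumptions made for the example. Because S3 objects follow R's usual copy-on-modify semantics, the updated object has to be assigned back by the caller.

```r
# Hypothetical S3 object: a list that remembers its formula, data and fitted model.
new_expert_store <- function(formula) {
  structure(list(formula = formula, data = NULL, model = NULL),
            class = "expert_store")
}

# Append new data, refit on everything seen so far, and return the updated object.
update_expert <- function(expert, new_data) {
  expert$data  <- rbind(expert$data, new_data)
  expert$model <- stats::lm(expert$formula, data = expert$data)
  expert
}

set.seed(1)
batch1 <- data.frame(x = rnorm(10))
batch1$y <- 1 + 2 * batch1$x + rnorm(10)
batch2 <- data.frame(x = rnorm(10))
batch2$y <- 1 + 2 * batch2$x + rnorm(10)

expert <- new_expert_store(y ~ x)
expert <- update_expert(expert, batch1)  # reassignment is required
expert <- update_expert(expert, batch2)
```

Calling `update_expert(expert, batch1)` without assigning the result leaves `expert` unchanged, which is the main practical difference from R6's in-place modification.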

Personally, I would start with S3 and consider something different only if we encounter a problem.