gavinsimpson / gratia

ggplot-based graphics and useful functions for GAMs fitted using the mgcv package
https://gavinsimpson.github.io/gratia/

Documentation for `gratia::posterior_samples()` behaviour with offsets #230

Closed jonathonmellor closed 1 year ago

jonathonmellor commented 1 year ago

Hi Gavin,

Really appreciate the work in this package - hoping to move our code to using it as much as possible.

Would it be possible to document (I'm happy to raise a PR if it's explained here) how `gratia::posterior_samples()` behaves with predictions from models that use an offset?

I'm unclear on whether the offset is ignored, or whether it is used to convert back to, for example, counts before sampling from the error distribution.

gavinsimpson commented 1 year ago

Thanks for the question @jonathonmellor. What happens depends on how you included the offset term in the model. As mentioned in `?mgcv::gam`, whether or not `predict.gam()` ignores the offset depends on whether you included it via the `offset()` function in the model formula or via the `offset` argument to `gam()` etc.: an offset in the formula is included when predicting, while one supplied through the `offset` argument is ignored.
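To illustrate the difference, here is a minimal sketch using only mgcv. The simulated data and the names `df`, `exposure`, `m_formula`, `m_arg` and `new_df` are made up for this example and are not taken from the issue.

```r
library(mgcv)

# Simulated count data with a known exposure (hypothetical example data)
set.seed(1)
n <- 200
df <- data.frame(x = runif(n), exposure = runif(n, 1, 10))
df$y <- rpois(n, lambda = df$exposure * exp(sin(2 * pi * df$x)))

# Offset as a term in the formula: predict.gam() includes it
m_formula <- gam(y ~ s(x) + offset(log(exposure)),
                 family = poisson(), data = df, method = "REML")

# Offset via the argument: used when fitting, ignored when predicting
m_arg <- gam(y ~ s(x), offset = log(df$exposure),
             family = poisson(), data = df, method = "REML")

new_df <- data.frame(x = c(0.25, 0.75), exposure = c(1, 10))

# Expected counts at the supplied exposure
p_formula <- predict(m_formula, newdata = new_df, type = "response")

# Expected counts per unit of exposure, because the offset is dropped
p_arg <- predict(m_arg, newdata = new_df, type = "response")

# The ratio should be close to new_df$exposure
p_formula / p_arg
```

Both models describe the same fit, so the two sets of predictions should differ only by the exposure factor.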

`posterior_samples()` works by first calling `fitted_samples()` to get posterior expected values, and then calling the relevant random number generator to generate new response values from those posterior expectations. `fitted_samples()` relies on `predict.gam()` (or `predict.bam()`, etc.), so `fitted_samples()`, `predicted_samples()` and `posterior_samples()` are all sensitive to how the offset term was included in the model.

If you included the offset through the formula, `predict.gam()` requires the offset variable to be present in the data at which you are trying to predict. So if you pass something to `data`, it must contain the variable for the offset.
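As a rough sketch of what that means in practice, the example below fits a model with a formula offset and then draws posterior samples at new data. The simulated data and the names `df`, `exposure`, `m` and `new_df` are hypothetical. The call assumes a gratia version in which `posterior_samples()` accepts `n`, `data` and `seed`; argument names may differ in older releases.

```r
library(mgcv)
library(gratia)

# Simulated count data with a known exposure (hypothetical example data)
set.seed(1)
n <- 200
df <- data.frame(x = runif(n), exposure = runif(n, 1, 10))
df$y <- rpois(n, lambda = df$exposure * exp(sin(2 * pi * df$x)))

# Offset included through the formula
m <- gam(y ~ s(x) + offset(log(exposure)),
         family = poisson(), data = df, method = "REML")

# New data must carry the offset variable as well as the covariate
new_df <- data.frame(x = c(0.25, 0.75), exposure = c(1, 10))

# 100 posterior draws of new response values for each row of new_df
ps <- posterior_samples(m, n = 100, data = new_df, seed = 42)
head(ps)

# Leaving out `exposure` is expected to fail, per the comment above
bad <- try(posterior_samples(m, n = 100, data = data.frame(x = 0.5)),
           silent = TRUE)
inherits(bad, "try-error")
```

Because the offset is part of the formula here, the draws are on the scale of counts at the exposure given in `new_df`, not counts per unit of exposure.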

Basically, `posterior_samples()` is just a series of wrappers around `predict.gam()` and co, so any behaviour those functions have is inherited by `posterior_samples()`.