beacon-biosignals / Effects.jl

Effects Prediction for Regression Models
MIT License

API for reference grid/design #5

Closed palday closed 1 year ago

palday commented 3 years ago

What's the reason for having both the formula and the design? Does everything listed in the design have to have a corresponding entry in the formula RHS?

It seems like you might have two different methods: one for a design dictionary, and one with a formula (where the design stuff is calculated automagically from the original data). I think it's a safe assumption that a formula is available from the original model, so you can get things like levels for categorical variables from those terms.

If we wanted to get REALLY fancy, we could have a special term type/syntax for this where in the formula you could specify an expression that would allow you to manually control the design for some terms and let the defaults happen for the others...but that's a low priority I think.

_Originally posted by @kleinschmidt in https://github.com/beacon-biosignals/Effects.jl/pull/1#discussion_r573833127_

palday commented 3 years ago

I think there are two types of effects that people might be interested in:

  1. excluded terms are held at their typical values, in which case the magic is essentially in creating a model matrix where the columns for the excluded terms are filled with typical values. This is like `effects` in the R world and should probably be the flagship `effects()` functionality.
  2. excluded terms have their coefficients zeroed out, and nothing special needs to be done to the model matrix. I don't know how well this would play with the error term we compute from `vcov`. This approach is sometimes called "partial effects" and is what the `remef` package in the R world does. If we omit the error term, then I would be tempted to call this `partial_predict` or something similar.
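The difference between the two approaches can be sketched with a toy linear predictor. This is an illustrative sketch in plain Python, not the Effects.jl API: the coefficients, the data, and the helper names `predict`, `typical_effect`, and `partial_effect` are all hypothetical.

```python
from statistics import mean

# Hypothetical fitted coefficients for y ~ 1 + x + z (no interaction).
coefs = {"(Intercept)": 1.0, "x": 2.0, "z": 0.5}

# Hypothetical original data, used only to compute typical values.
data = {"x": [0.0, 1.0, 2.0, 3.0], "z": [10.0, 20.0, 30.0, 40.0]}


def predict(row, coefs):
    """Linear predictor for one row of the model matrix."""
    return sum(coefs[name] * value for name, value in row.items())


def typical_effect(x, typical=mean):
    """Approach (1): fill the excluded term z with its typical value."""
    row = {"(Intercept)": 1.0, "x": x, "z": typical(data["z"])}
    return predict(row, coefs)


def partial_effect(x):
    """Approach (2): zero the coefficient of the excluded term z."""
    zeroed = dict(coefs, z=0.0)
    # The value of z no longer matters once its coefficient is zero.
    row = {"(Intercept)": 1.0, "x": x, "z": data["z"][0]}
    return predict(row, zeroed)
```

With these numbers, `typical_effect(1.0)` is 15.5 and `partial_effect(1.0)` is 3.0. Without an interaction, the two differ only by the constant contribution of `z` at its typical value; with an interaction involving `z` they would also differ in slope.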

For (1), we need to generate the model columns for higher-order terms from the typical values of the lowest-order terms, rather than using typical values of the higher-order columns in the original model matrix. Given this, @kleinschmidt and @ararslan have pointed out that it probably suffices to supply only the reference values for the effects, without a separate effects formula. Then we could have two methods: one taking a dictionary that is used to generate a fully crossed/balanced reference grid, and one taking the reference grid directly for specifying particular combinations of the reference values. In both cases, the typical values and interaction columns would be added to the table before doing the effects computation.
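A minimal sketch of that pipeline, assuming a model `y ~ 1 + x + z + x:z` and made-up data. It is plain Python for illustration only; the helpers `expand_grid`, `add_typical`, and `add_interaction` are hypothetical names here and are not meant to describe the package's actual signatures.

```python
from itertools import product
from statistics import mean

# Hypothetical original data for a model y ~ 1 + x + z + x:z.
data = {"x": [0.0, 1.0, 2.0, 3.0], "z": [10.0, 20.0, 30.0, 40.0]}


def expand_grid(design):
    """Fully crossed reference grid from a dict of reference values."""
    names = list(design)
    return [dict(zip(names, combo)) for combo in product(*design.values())]


def add_typical(grid, data, typical=mean):
    """Fill every variable missing from the grid with its typical value."""
    return [
        {**{name: typical(col) for name, col in data.items()}, **row}
        for row in grid
    ]


def add_interaction(grid, a, b):
    """Build the a:b column from the lowest-order terms in each row."""
    return [{**row, f"{a}:{b}": row[a] * row[b]} for row in grid]


# Method one: start from a design dictionary.
design = {"x": [0.0, 3.0]}
grid = add_interaction(add_typical(expand_grid(design), data), "x", "z")
```

Here `z` is filled with 25.0 and the `x:z` column becomes 0.0 and 75.0 for the two rows. Averaging the `x:z` column of the original data would instead give 50.0 for every row, which is why the interaction is rebuilt from the lower-order values. The second method would skip `expand_grid` and pass a hand-written list of rows straight to `add_typical`.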

palday commented 1 year ago

I think we've had a working version of (1) for a while now.

For MixedModels, (2) is available from MixedModelsExtras. I don't think we want to support that level of functionality here because I'm not sure it's trivial to do in a way that works across many different model types. Additionally, with the support for by-term typical functions (#48), it's possible for users to just pass a function that returns zero.
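To illustrate why a zero-returning typical function covers case (2), here is a small sketch in plain Python. The coefficients, data, and the `effect_of_x` helper are hypothetical, and the dictionary form of `typical` is an assumption made for this example; it is not a statement about how Effects.jl accepts by-term functions.

```python
from statistics import mean

# Hypothetical coefficients for y ~ 1 + x + z and hypothetical data.
coefs = {"(Intercept)": 1.0, "x": 2.0, "z": 0.5}
data = {"x": [0.0, 1.0, 2.0, 3.0], "z": [10.0, 20.0, 30.0, 40.0]}


def effect_of_x(x, typical):
    """Predict at x, filling z via a per-term typical function.

    `typical` maps a term name to a function of that term's column;
    terms without an entry fall back to the mean.
    """
    z = typical.get("z", mean)(data["z"])
    return coefs["(Intercept)"] + coefs["x"] * x + coefs["z"] * z


# Default behaviour: z is held at its mean.
default = effect_of_x(1.0, typical={})

# A typical function that returns zero removes z's contribution,
# which matches zeroing its coefficient.
zeroed = effect_of_x(1.0, typical={"z": lambda col: 0.0})
```

Because interaction columns are built from the lower-order values, a typical value of zero for `z` would also zero any interaction column involving `z`.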