NathanielF / NathanielF.github.io

Feedback on GAMs and GPs post #1

Open tomicapretto opened 5 months ago

tomicapretto commented 5 months ago

Hi Nathaniel! Following our discussion on https://github.com/bambinos/bambi/issues/796, here I'm adding some notes I wrote down while reading your post. You're making a massive contribution, thanks!


Under the hood Bambi makes use of the patsy package formula syntax to specify spline basis terms.

Bambi uses its own implementation called 'formulae' (github.com/bambinos/formulae). I created it to support mixed effects model formulas (i.e. the (1 | group) syntax).

bs(X, degree=1, knots=knots_6, intercept=True)

I'm not sure why you want to use intercept=True. It means the space spanned by the columns of the spline basis already includes the intercept. Since the model also adds its own intercept, the underlying design matrix is not of full rank and there is a redundant parameter. You could drop the model intercept by adding a "0" to the RHS of the formula, or just not pass intercept=True. This can help: https://bambinos.github.io/bambi/notebooks/splines_cherry_blossoms.html#advanced-watch-out-the-underlying-design-matrix (I just detected a typo in the first paragraph there: beta is of length "p", not "n").

I see later, when you use bs_patsy, you do "bs_patsy(dev_periods, knots=knots, degree=3, include_intercept=True) - 1", which does not introduce redundancies because the intercept is spanned only once.
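
To make the redundancy concrete, here is a minimal NumPy sketch. It builds a degree-1 spline basis by hand (hat functions via np.interp) instead of calling formulae or patsy, so `linear_spline_basis` and the knot locations are illustrative assumptions, not either library's implementation. The basis columns sum to one in every row, so prepending an explicit intercept column adds a column without adding rank.

```python
import numpy as np


def linear_spline_basis(x, knots):
    """Degree-1 B-spline (hat function) basis that spans the intercept.

    `knots` must be increasing and include both boundary knots.
    Hypothetical helper, for illustration only.
    """
    knots = np.asarray(knots, dtype=float)
    identity = np.eye(len(knots))
    # Column j is the piecewise-linear function equal to 1 at knots[j]
    # and 0 at every other knot.
    return np.column_stack(
        [np.interp(x, knots, identity[j]) for j in range(len(knots))]
    )


x = np.linspace(0.0, 10.0, 200)
knots = np.linspace(0.0, 10.0, 6)  # assumed knot locations

# Basis that already spans the intercept (the intercept=True case).
basis = linear_spline_basis(x, knots)
# Same basis plus an explicit intercept column (the model's own intercept).
with_intercept = np.column_stack([np.ones_like(x), basis])

rows_sum_to_one = np.allclose(basis.sum(axis=1), 1.0)
rank_basis = np.linalg.matrix_rank(basis)
rank_with_intercept = np.linalg.matrix_rank(with_intercept)

print("basis:", basis.shape, "rank", rank_basis)
print("basis + intercept:", with_intercept.shape, "rank", rank_with_intercept)
```

The second matrix has one more column than its rank, which is the redundant parameter. Dropping the explicit intercept (the "- 1" or "0" in the formula) or dropping one basis column (not passing intercept=True) restores full rank.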

Here we see that the extra complexity of using 15 splines leads to slightly worse performance measures than the less complex but seemingly adequate 10 splines.

A small detail, but in the plot above that text you have legends appearing twice.

Next we plot the posterior predictive distribution of our observed variable and compare against the observed data. Additionally we plot the 89th and 50% HDI.

You could also create that plot with the interpret submodule. See when we use pps=True here. You could call it twice and then you would get the two bands (once with pps=True and once without it).

def make_model(loss_df, num_knots=3, max_dev=7, model_type='mixed')

Plot the Hierarchical Components

On a first pass, it's not clear to me what the relationship is between all the curves in there. For example, is any of the curves the sum or difference of other curves?

We define two basic models for contrasting. Note here how we have to define a seperate spline basis for each of the covariates.

I think the readability of the cell will improve if you write a function that creates the knots for you, so you don't repeat it 6 times.
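
A helper along these lines might look like the following sketch. `make_knots` is a hypothetical name, the covariates are made-up stand-ins for the columns in the post, and placing the knots evenly between each covariate's minimum and maximum is an assumption about how the post builds them.

```python
import numpy as np


def make_knots(values, num_knots):
    """Return `num_knots` evenly spaced interior knots for one covariate."""
    values = np.asarray(values, dtype=float)
    # linspace includes both end points, so ask for two extra and drop them.
    return np.linspace(values.min(), values.max(), num_knots + 2)[1:-1]


# Hypothetical covariates standing in for the columns used in the post.
rng = np.random.default_rng(0)
covariates = {
    "x1": rng.normal(size=100),
    "x2": rng.uniform(0, 5, size=100),
    "x3": rng.gamma(2.0, size=100),
}

# One knot vector per covariate, without repeating the linspace call.
knots = {name: make_knots(vals, num_knots=5) for name, vals in covariates.items()}
```

Each entry can then be supplied as the knots argument of the corresponding bs(...) term.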

formula_spline1

Just wanted to say this model is hierarchical but it's only partially pooling the intercepts. I think it's possible to have hierarchical splines in Bambi using a custom prior. I could try to put together an example if you're interested.


Other suggestions:

Typos:

NathanielF commented 5 months ago

Thanks so much for the feedback @tomicapretto, I'll work through the points this coming week.

Just wanted to say this model is hierarchical but it's only partially pooling the intercepts. I think it's possible to have hierarchical splines in Bambi using a custom prior. I could try to put together an example if you're interested.

Very interested! It would be great to see how to do hierarchical splines in Bambi!

NathanielF commented 4 months ago

Just a note to say I think I've addressed your comments above @tomicapretto !

Did you, by any chance, try (or manage) to do hierarchical splines with Bambi?

tomicapretto commented 4 months ago

Hi @NathanielF I was not able to come back to that but I'll try to get an example as soon as possible :D

NathanielF commented 4 months ago

No worries! Thanks @tomicapretto