bambinos / bambi

BAyesian Model-Building Interface (Bambi) in Python.
https://bambinos.github.io/bambi/
MIT License

Multidimensional Spline Implementation #806

Open tjburch opened 7 months ago

tjburch commented 7 months ago

This might be opening a can of worms, but the 1D B-spline implementation has been really nice to use. It would be great to have multidimensional splines, like the tensor interaction (ti) or tensor product smooth (te) splines in mgcv. I'm very often looking for non-linearities across multiple dimensions and use those, so a similar Bambi implementation would be nice.

tomicapretto commented 7 months ago

I could google it to find an implementation. But are you aware of numpy implementations of those, or something in R? (idk if mgcv writes them in R or in a lower-level language). If that is the case, we can easily add it to Bambi.

tjburch commented 7 months ago

Yeah, I did some searching last night hoping I could start it myself, but wound up punting after seeing the code.

mgcv appears to implement te and ti directly in R. The code is pretty dense, and I haven't spent enough time digesting it to know whether it calls into a lower-level language, but on a quick read-through it seems not.

As for Python, I don't know of any numpy implementation, and statsmodels' current GAM implementation only supports 1D B-splines.

PyGAM has a class called TensorTerm that's called directly when you use te.

tomicapretto commented 7 months ago

Thanks @tjburch! I'll have a look at those

ptonner commented 6 months ago

Just wanted to chime in that I've looked into the mgcv source in the past and also found it fairly dense. But after reviewing the docs for te and some of the details on smooths in this book, I think an initial implementation of multidimensional splines (at least the bases) shouldn't be too difficult.

Basically, you build individual splines for each dimension, then take the tensor (i.e. Kronecker) product to generate your full design matrix. A quick look through the pyGAM code suggests this is the case (e.g. it iteratively adds tensor products that appear to be essentially Kronecker products).
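The construction described above can be sketched with numpy and scipy. This is only an illustration, not code from Bambi, pyGAM, or mgcv: the helper names `bspline_basis` and `tensor_product_basis`, the clamped equally spaced knots, and the basis sizes are all assumptions. It also takes the product row by row (each row of the result is the Kronecker product of the matching rows of the marginal bases), which is one reading of "tensor product" that keeps one row per observation.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(x, n_basis=6, degree=3):
    """Hypothetical helper: clamped B-spline basis, shape (len(x), n_basis)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    # Equally spaced interior knots; boundary knots repeated degree + 1 times.
    n_interior = n_basis - degree - 1
    interior = np.linspace(lo, hi, n_interior + 2)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])
    # Identity coefficients make the spline return every basis function at once.
    return BSpline(knots, np.eye(n_basis), degree)(x)


def tensor_product_basis(*bases):
    """Row-wise Kronecker product of marginal bases that share the same rows."""
    out = bases[0]
    for B in bases[1:]:
        # Entry [i, a, b] is out[i, a] * B[i, b]; flatten the last two axes.
        out = (out[:, :, None] * B[:, None, :]).reshape(out.shape[0], -1)
    return out


rng = np.random.default_rng(0)
x = rng.uniform(size=100)
z = rng.uniform(size=100)

Bx = bspline_basis(x, n_basis=6)    # (100, 6)
Bz = bspline_basis(z, n_basis=5)    # (100, 5)
X = tensor_product_basis(Bx, Bz)    # (100, 30) design matrix for a te-like term
```

Because `tensor_product_basis` accepts any number of marginal bases, the same sketch extends to more than two columns, at the cost of a column count that grows as the product of the marginal basis sizes.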

Getting ti functionality to isolate terms of a specific order (like in functional ANOVA) may be a bit more challenging. But it also may be clear once the full product approach is done.
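One possible route to ti-like behaviour, sketched below, is to constrain each marginal basis to sum to zero over the data before taking the product, which is how I read the mgcv documentation for ti. Everything here is an assumption made for illustration: the helper names `absorb_sum_to_zero` and `rowwise_kron` are hypothetical, the QR-based reparameterization is just one way to absorb the constraint, and a small polynomial basis from `np.vander` stands in for real spline bases.

```python
import numpy as np


def absorb_sum_to_zero(B):
    """Reparameterize B so that every column of the result sums to zero."""
    C = B.sum(axis=0, keepdims=True)            # 1 x p constraint row, 1^T B
    Q, _ = np.linalg.qr(C.T, mode="complete")   # p x p orthogonal matrix
    Z = Q[:, 1:]                                # columns orthogonal to C^T
    return B @ Z                                # shape (n, p - 1)


def rowwise_kron(A, B):
    """Row i of the result is the Kronecker product of A[i] and B[i]."""
    return (A[:, :, None] * B[:, None, :]).reshape(A.shape[0], -1)


rng = np.random.default_rng(0)
x = rng.uniform(size=50)
z = rng.uniform(size=50)

# Stand-in marginal bases (columns 1, t, t^2, t^3); a spline basis would go here.
Bx = np.vander(x, 4, increasing=True)
Bz = np.vander(z, 4, increasing=True)

Bx_c = absorb_sum_to_zero(Bx)       # (50, 3), constant direction removed
Bz_c = absorb_sum_to_zero(Bz)       # (50, 3)
X_ti = rowwise_kron(Bx_c, Bz_c)     # (50, 9) interaction-only columns
```

Each constrained marginal loses one column, so the interaction block has (p - 1)(q - 1) columns and would sit alongside separate main-effect terms for x and z.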

tomicapretto commented 6 months ago

Thanks for the insights @ptonner!

ptonner commented 6 months ago

A couple more thoughts here:

First, I started looking into how you might implement this in bambi/formulae, but I'm not sure how you would handle a formula term that has multiple column names. E.g. is it possible to support something like bmb.Model("y ~ te(x, z)", data) for columns x and z in data? It seems like all the terms/transforms currently operate only on one-dimensional data from a single column. Ideally, we could also support more than two columns.

Second, it looks like brms is using t2 instead of te for its product smooths (doc). Looking at the mgcv reference, I'm not sure what the practical difference between the two is, but it may be relevant.

tomicapretto commented 6 months ago

@ptonner thanks for the thoughts (and sorry for the late response). Yes, Bambi supports functions of more than a single variable. We already have c(y1, y2, y3) ~ 1 + x1 + x2 for multinomial models, and hsgp(x1, x2) for Hilbert space Gaussian processes (https://bambinos.github.io/bambi/notebooks/hsgp_2d.html#a-single-group).

We can basically create an arbitrary function.