Open tjburch opened 7 months ago
I could google it to find an implementation. But are you aware of numpy implementations of those, or something in R? (idk if mgcv writes them in R or in a lower-level language). If that is the case, we can easily add it to Bambi.
Yeah, I did some searching last night hoping I could start it myself, but wound up punting after seeing the code.
mgcv appears to have te and ti in R directly. The code is pretty dense, and I haven't spent enough time digesting it to know whether it calls into a lower-level language, but on a quick read-through it seems not.
As far as Python goes, I don't know of any NumPy implementation, and statsmodels' current GAM implementation only supports 1D B-splines.
PyGAM has a class called TensorTerm that's called directly when you use te.
Thanks @tjburch! I'll have a look at those
Just wanted to chime in that I've looked into the mgcv source in the past and also found it fairly dense. But after reviewing the docs for te and some of the details on smooths in this book, I think an initial implementation of multidimensional splines (at least the bases) shouldn't be too difficult.
Basically you build individual splines for each dimension, then take the tensor (i.e. Kronecker) product to generate your full design matrix. A quick look through the pyGAM code suggests this is the case (e.g. iteratively adding tensor products that appear to be essentially Kronecker products).
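To make the idea concrete, here is a minimal NumPy sketch of that construction. It is not taken from mgcv, pyGAM, or Bambi: the helper names bspline_basis and tensor_product_basis are hypothetical, the knots are assumed to be equally spaced and clamped at the data range, and the product is taken row by row (each observation's row is the Kronecker product of its marginal rows), which is how I understand the tensor product basis to be formed.

```python
import numpy as np

def bspline_basis(x, n_basis, degree=3):
    """Clamped B-spline basis at x via the Cox-de Boor recursion.

    Hypothetical helper; assumes n_basis >= degree + 1 and equally
    spaced interior knots. Returns an array of shape (len(x), n_basis).
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    n_inner = n_basis - degree - 1
    inner = np.linspace(lo, hi, n_inner + 2)[1:-1]
    knots = np.concatenate(
        [np.repeat(lo, degree + 1), inner, np.repeat(hi, degree + 1)]
    )
    # Degree 0: indicator of each half-open knot interval.
    B = np.zeros((x.size, len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Put the right endpoint in the last non-empty interval.
    B[x == hi, n_basis - 1] = 1.0
    # Raise the degree one step at a time.
    for k in range(1, degree + 1):
        B_new = np.zeros((x.size, len(knots) - 1 - k))
        for j in range(len(knots) - 1 - k):
            d1 = knots[j + k] - knots[j]
            d2 = knots[j + k + 1] - knots[j + 1]
            if d1 > 0:
                B_new[:, j] += (x - knots[j]) / d1 * B[:, j]
            if d2 > 0:
                B_new[:, j] += (knots[j + k + 1] - x) / d2 * B[:, j + 1]
        B = B_new
    return B

def tensor_product_basis(*marginals):
    """Row-wise Kronecker product of marginal design matrices."""
    out = marginals[0]
    for M in marginals[1:]:
        # (n, p, 1) * (n, 1, q) -> (n, p, q) -> (n, p * q)
        out = (out[:, :, None] * M[:, None, :]).reshape(out.shape[0], -1)
    return out

rng = np.random.default_rng(0)
x = rng.uniform(size=50)
z = rng.uniform(size=50)
Bx = bspline_basis(x, n_basis=5)
Bz = bspline_basis(z, n_basis=4)
X = tensor_product_basis(Bx, Bz)  # one column per pair of marginal basis functions
```

Because tensor_product_basis accepts any number of marginals, the same loop extends to three or more columns; the number of coefficients grows as the product of the marginal basis sizes.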
Getting ti functionality to isolate terms of a specific order (like in functional ANOVA) may be a bit more challenging, but it may also become clear once the full product approach is done.
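One possible route to a ti-style basis, as far as I understand the mgcv documentation, is to apply an identifiability constraint to each marginal basis before taking the product, so that the product no longer contains functions of a single variable. The sketch below assumes a sum-to-zero constraint absorbed through a QR decomposition. The helper name absorb_sum_to_zero is hypothetical, and a small polynomial basis from np.vander stands in for the marginal spline bases purely to keep the example short.

```python
import numpy as np

def absorb_sum_to_zero(B):
    """Reparameterize B so each column of the result sums to zero.

    Hypothetical helper: drops one column by projecting the coefficients
    onto the null space of the constraint sum_i f(x_i) = 0.
    """
    C = B.sum(axis=0, keepdims=True)            # (1, p) constraint row
    Q, _ = np.linalg.qr(C.T, mode="complete")   # Q has shape (p, p)
    Z = Q[:, 1:]                                # columns orthogonal to C
    return B @ Z                                # shape (n, p - 1)

rng = np.random.default_rng(1)
x = rng.uniform(size=40)
z = rng.uniform(size=40)

# Stand-in marginal bases (assumption): 1, x, x^2, x^3 and 1, z, z^2.
Bx = np.vander(x, 4, increasing=True)
Bz = np.vander(z, 3, increasing=True)

Bx_c = absorb_sum_to_zero(Bx)   # (40, 3), constant function removed
Bz_c = absorb_sum_to_zero(Bz)   # (40, 2), constant function removed

# Row-wise Kronecker product of the constrained marginals.
X_ti = (Bx_c[:, :, None] * Bz_c[:, None, :]).reshape(x.size, -1)
```

Each constrained marginal loses the constant function, so the product has (4 - 1) * (3 - 1) = 6 columns and the main effects of x and z would need to be added to the model as separate terms.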
Thanks for the insights @ptonner!
A couple more thoughts here:
I started looking into how you might implement this in bambi/formulae, but I'm not sure how you would handle a formula term that has multiple column names. E.g. is it possible to support something like bmb.Model("y ~ te(x, z)", data) for columns x and z in data? It seems like all the terms/transforms currently only operate on one-dimensional data from a single column. Ideally, we could also support more than just two columns.
Second, it looks like brms is using t2 instead of te for its product smooths (doc). Looking at the mgcv reference, I'm not sure what the practical difference between the two is, but it may be relevant.
@ptonner thanks for the thoughts (and sorry for the late response). Yes, Bambi supports functions of more than a single variable. We already have c(y1, y2, y3) ~ 1 + x1 + x2 for multinomial models, and hsgp(x1, x2) for Hilbert space Gaussian processes (https://bambinos.github.io/bambi/notebooks/hsgp_2d.html#a-single-group).
We can basically create an arbitrary function.
This might be opening a can of worms, but the 1D bspline implementation has been really nice to use. It would be really nice to have multidimensional splines, like the tensor interaction (ti) or tensor smooth (te) splines seen in mgcv. Very often I'm looking for non-linearities across multiple dimensions and use those, so a similar Bambi implementation would be nice.