dswah / pyGAM

[HELP REQUESTED] Generalized Additive Models in Python
https://pygam.readthedocs.io
Apache License 2.0
862 stars 159 forks

Factor `by' variable example similar to mgcv? #238

Open jmuhlenkamp opened 5 years ago

jmuhlenkamp commented 5 years ago

Thanks for the great package! Quick question about functionality that exists in mgcv that I can't seem to replicate in pygam.

mgcv example code

This comes directly from ?mgcv::gam.models

library(mgcv)

## Factor `by' variable example (with a spurious covariate x0)
## simulate data...

dat <- gamSim(4)

## fit model...
b <- gam(y ~ fac+s(x2,by=fac)+s(x0),data=dat)
plot(b,pages=1)

The above mgcv code generates a spline fit for each level of fac.

Example in pygam

When I set up a quick toy example in pygam, it's not clear to me how I could replicate the spline by factor variable capability that exists in mgcv.

from pygam import LinearGAM, s, intercept
from pygam.datasets import toy_interaction
X, y = toy_interaction(return_X_y=True)

## Make the second column a factor
X[:25000,1] = 0
X[25000:,1] = 1

gam = LinearGAM(s(0, n_splines=4, by=1)).fit(X, y)
mm = gam.terms.build_columns(X)
print(mm.todense()[0,:])
# [[0. 0. 0. 0. 1.]]

I would like the by argument to be treated as a factor, as mgcv does when it is given an R factor column. In that case I would expect the model matrix above to have 9 columns (4 spline columns for each of the 2 levels, plus the intercept) instead of 5.
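To make the 5-versus-9 column count concrete, here is a minimal pure-Python sketch of the two behaviours. It does not call pyGAM or mgcv: `toy_basis`, `numeric_by_row` and `factor_by_row` are hypothetical helpers, and the tent-function basis is only a stand-in for real B-splines. It assumes a numeric by variable simply multiplies the spline columns by the by value, which is consistent with the all-zero spline columns printed above for a row where the by column is 0.

```python
# Toy stand-in for a spline basis: 4 tent ("hat") functions on [0, 1].
# pyGAM uses B-splines; this basis is only illustrative.
def toy_basis(x, n_basis=4):
    width = 1.0 / (n_basis - 1)
    knots = [i * width for i in range(n_basis)]
    return [max(0.0, 1.0 - abs(x - k) / width) for k in knots]


def numeric_by_row(x, by_value):
    """Numeric `by`: scale every basis column by the by value, then append the intercept."""
    return [b * by_value for b in toy_basis(x)] + [1.0]


def factor_by_row(x, level, n_levels=2):
    """Factor `by`: one copy of the basis per level, zeroed unless the row belongs to that level."""
    row = []
    for lvl in range(n_levels):
        indicator = 1.0 if level == lvl else 0.0
        row.extend(b * indicator for b in toy_basis(x))
    return row + [1.0]


row_numeric = numeric_by_row(0.3, by_value=0)  # by column is 0 for this row
row_factor = factor_by_row(0.3, level=0)       # same row, by treated as a 2-level factor

print(len(row_numeric), row_numeric)
print(len(row_factor), row_factor)
```

With a numeric by, a row in level 0 contributes nothing to the smooth, so only level 1 effectively gets a spline; with a factor by, each level gets its own block of 4 columns.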

Is there a way to use a categorical by variable in pygam?

dswah commented 5 years ago

@jmuhlenkamp hmm i see the difference.

I think the way to do that in pyGAM would be to create a tensor term that interacts the 4-spline term on feature 0 with the binary factor term on feature 1:

from pygam import LinearGAM, s, f, te
from pygam.datasets import toy_interaction
X, y = toy_interaction(return_X_y=True)

## Make the second column a factor
X[:25000,1] = 0
X[25000:,1] = 1

gam = LinearGAM(te(s(0, n_splines=4), f(1))).fit(X, y)
mm = gam.terms.build_columns(X)

print(mm.shape)
# (50000, 9)

Does this seem valid to you?

Dani

PS i'm sorry for the insane response time!
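A rough way to see why the tensor term yields 9 columns is the pure-Python sketch below. It assumes the tensor term combines its marginal bases as a row-wise tensor (Kronecker) product. `row_tensor`, `one_hot` and the values in `spline_row` are hypothetical and for illustration only, and the column ordering inside pyGAM's model matrix may differ from the one shown here.

```python
def row_tensor(a, b):
    """Row-wise tensor (Kronecker) product of two basis rows."""
    return [ai * bj for ai in a for bj in b]


def one_hot(level, n_levels=2):
    """Indicator (dummy) coding of a factor level."""
    return [1.0 if level == lvl else 0.0 for lvl in range(n_levels)]


# Hypothetical values of a 4-column spline basis evaluated at one observation.
spline_row = [0.1, 0.9, 0.0, 0.0]

# Factor indicator (x) spline basis, plus an intercept column.
row_level0 = row_tensor(one_hot(0), spline_row) + [1.0]
row_level1 = row_tensor(one_hot(1), spline_row) + [1.0]

print(len(row_level0), row_level0)
print(len(row_level1), row_level1)
```

Each row has 2 x 4 = 8 tensor columns plus the intercept, and only the block belonging to the row's own factor level is non-zero, which is the per-level spline structure asked about in the original question.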

jmuhlenkamp commented 5 years ago

Dani,

Thanks for the thoughtful response. If/when I find some time, I'll give your response some thought and post my updated thoughts here.

Jason

shyamcody commented 4 years ago

Hey @jmuhlenkamp, @dswah, any development on this issue?

5ch0r5ch1 commented 8 months ago

https://github.com/dswah/pyGAM/pull/302 works for me