ilkot opened this issue 3 years ago
Hi @ilkot, indeed, I also suspect that differences in the model specification are the main cause of the discrepancies.
I am not very familiar with the mgcv syntax, but I will try to answer.
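I think the differences could come from the formula: `x1 + s(x2,x3,x4,x5,x6)` appears to add a linear term to a nonlinear function (an interaction?) of several variables. Can you explain what `s(x2, x3, ...)` is doing?
If it is an interaction, the closest pyGAM model I can think of is a linear term plus a tensor-product term:
```python
import numpy as np
from pygam import LinearGAM, l, te

# X: array whose columns are x1..x6 (column 0 = x1), y: the target
gam = LinearGAM(l(0) + te(1, 2, 3, 4, 5))

# set up a search space
lam = np.logspace(-3, 5, 50)
# one grid for the L2 regularization of the linear term, and one
# shared smoothing grid for all dimensions of the tensor term
lams = [lam] * 2
gam.gridsearch(X, y, lam=lams)
gam.summary()
```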
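One thing worth checking about this search: as far as I understand pyGAM's `gridsearch`, when `lam` is a list of arrays it tries every combination of them, so two grids of 50 values mean 50 × 50 = 2500 model fits, each one with the full tensor term. A quick count of the candidates, assuming that cartesian-product behaviour:

```python
import itertools

import numpy as np

# Same search space as above: 50 candidate smoothing values per term
lam = np.logspace(-3, 5, 50)
lams = [lam] * 2  # one grid for l(0), one shared grid for te(1, 2, 3, 4, 5)

# Assumption: gridsearch evaluates every combination (cartesian product)
grid = list(itertools.product(*lams))
print(len(grid))  # 2500
```

If that is too slow, the pyGAM docs also describe passing a 2-D array of candidate `lam` values (one row per candidate) to `gridsearch` as a random search; please double-check this against your installed version.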
Let me know if this helps!
Thanks for the detailed answer, appreciate it!
`s(x2, x3, ...)` is the multidimensional `s()` smooth (for 2 variables it is a smooth surface), and yes, it is an interaction term.
I tried to fit the model as you suggested, but it throws a memory error, which is surprising because the dataset contains only 500 rows and 7 columns in total:
```
pygam\terms.py", line 1318, in build_penalties
    P = sp.sparse.csc_matrix(np.zeros((self.n_coefs, self.n_coefs)))
MemoryError: Unable to allocate 74.5 GiB for an array with shape (100000, 100000) and data type float64
```
I restarted the kernel several times, but nothing changed.
pygam: 0.8.0, Python: 3.7.6
@dswah, do you know how I can avoid this error?
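The shape in the traceback is exactly what a full tensor product of 5 marginals with 10 splines each would give: 10^5 = 100000 coefficients, and the traceback shows pyGAM building the penalty from a dense `n_coefs × n_coefs` float64 array (`np.zeros`) before converting it to sparse. A small sketch of that arithmetic; the helper names are hypothetical, and the 10-splines-per-marginal default is inferred from the error shape rather than checked in the source:

```python
def tensor_n_coefs(n_splines_per_dim, n_dims):
    """Number of coefficients in a full tensor-product basis."""
    return n_splines_per_dim ** n_dims


def dense_penalty_gib(n_coefs, bytes_per_value=8):
    """Memory of a dense n_coefs x n_coefs float64 matrix, in GiB."""
    return n_coefs * n_coefs * bytes_per_value / 2**30


# te(1, 2, 3, 4, 5) with 10 splines per marginal (inferred from the error)
default_coefs = tensor_n_coefs(10, 5)
print(default_coefs, round(dense_penalty_gib(default_coefs), 1))  # 100000 74.5

# Hypothetical smaller basis: 4 splines per marginal
small_coefs = tensor_n_coefs(4, 5)
print(small_coefs, dense_penalty_gib(small_coefs) * 1024)  # 1024 8.0 (MiB)
```

So the problem is the basis size, not the 500 rows: 100000 coefficients is far more than the number of observations. Lowering the marginal basis size, e.g. `te(1, 2, 3, 4, 5, n_splines=4)`, should shrink the penalty matrix accordingly, assuming `te` passes `n_splines` on to each marginal.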
Hi @ilkot,
Were you able to resolve the discrepancy between pyGAM's gam and mgcv's gam?
@MRanka29 Nope, unfortunately.
This is more of a question than an issue. I'm trying to get the same results with mgcv's and pyGAM's GAM models, but the results are quite different.
Here is the data with 500 observations: https://file.io/0gSYwb0hinBI
R:
Output: -0.005903112
Python:
Output: 0.08549611
I'm suspicious of the formula in R because it is written as `x1 + s(x2, ...)`. Is there a way to write something like that in pyGAM?
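Written out, `x1 + s(x2, ..., x6)` is an additive model with one linear term and one smooth function of five variables, which is the same structure as pyGAM's `l(0) + te(1, 2, 3, 4, 5)` (assuming the columns of `X` are x1..x6 in order). Where the two can differ is in how $f$ is represented. A sketch, with $b^{(m)}_{j}$ denoting pyGAM's marginal B-spline basis functions for variable $x_m$ and $k$ splines per marginal:

```math
y_i = \beta_0 + \beta_1 x_{1i} + f(x_{2i}, x_{3i}, x_{4i}, x_{5i}, x_{6i}) + \varepsilon_i

f_{\text{te}}(x_2, \dots, x_6) = \sum_{j_2=1}^{k} \cdots \sum_{j_6=1}^{k} \gamma_{j_2 \dots j_6} \prod_{m=2}^{6} b^{(m)}_{j_m}(x_m), \qquad k^5 \text{ coefficients } \gamma
```

In mgcv, a multi-variable `s()` uses a thin plate regression spline by default, whose basis dimension is set by its `k` argument and does not multiply across variables, so the two fits can differ even with identical formulas.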
Any comment would be helpful! Thanks in advance!