dswah / pyGAM

[HELP REQUESTED] Generalized Additive Models in Python
https://pygam.readthedocs.io
Apache License 2.0

How to interpret the `coef_` attribute from a fitted model? #208

Closed jolespin closed 5 years ago

jolespin commented 6 years ago

I am trying to understand the tutorial: https://codeburst.io/pygam-getting-started-with-generalized-additive-models-in-python-457df5b4705f

How can I interpret `coef_`? Why are there 121 coefficients when the data has only 6 features?

import pandas as pd
from pygam import LogisticGAM
from sklearn.datasets import load_breast_cancer

# Load the breast cancer data set
data = load_breast_cancer()
# Keep the first 6 features only
features = ['mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness', 'mean compactness']
X = pd.DataFrame(data.data, columns=data.feature_names)[features]
y = pd.Series(data.target)

#Fit a model with the default parameters
gam = LogisticGAM().fit(X, y)
X.shape, gam.coef_.shape
# ((569, 6), (121,))

daventero commented 6 years ago

By default pyGAM creates 20 splines per feature. With 6 features that is 6 × 20 = 120 coefficients, one for each spline in each feature, plus 1 for the intercept, giving 121.
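
The arithmetic can be sketched in plain Python. The helper `count_coefs` below is hypothetical (it is not part of pyGAM); it simply assumes that each spline term contributes `n_splines` coefficients, each factor term contributes one coefficient per level, and the intercept adds one more.

```python
# Hypothetical helper, not part of pyGAM: counts coefficients from term specs.
DEFAULT_N_SPLINES = 20  # default per spline term, as described above


def count_coefs(spline_terms, factor_levels=(), fit_intercept=True):
    """Return the expected length of coef_.

    spline_terms  : iterable of n_splines values, one per spline term
    factor_levels : iterable of level counts, one per factor term
    """
    n = sum(spline_terms) + sum(factor_levels)
    return n + (1 if fit_intercept else 0)


# Six features, each with the default 20 splines, plus the intercept.
n_breast_cancer = count_coefs([DEFAULT_N_SPLINES] * 6)
print(n_breast_cancer)  # 121
```

So the length of `coef_` depends on the number of basis functions across all terms, not on the number of columns in `X`.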

You can check the source code here https://github.com/dswah/pyGAM/blob/master/pygam/terms.py#L1639

Does that help you?

daventero commented 6 years ago

You can also change the number of splines for each feature via the `n_splines` argument.

from pygam import LinearGAM, s, f
from pygam.datasets import wage

X, y = wage()
gam = LinearGAM(s(0, n_splines=30) + s(1, n_splines=30) + f(2)).fit(X, y)

print(X.shape, gam.coef_.shape)
# (3000, 3) (66,)

As you can see, there are 30 coefficients for each 'splined' feature (60 in total), plus 5 coefficients for the categorical (or factor) variable, one for each level, since `f()` performs one-hot encoding, plus 1 for the intercept: 60 + 5 + 1 = 66.
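
To see which coefficients belong to which term, the flat vector can be cut into consecutive slices. This is only a minimal sketch: it assumes the coefficients are stored term by term in the order the terms were declared, with the intercept last. `split_coefs` and the term labels are hypothetical names, not pyGAM API, and the list of zeros is a placeholder standing in for a fitted `gam.coef_`.

```python
# Hypothetical helper, not part of pyGAM.
def split_coefs(coef, term_sizes):
    """Map each term label to its slice of the flat coefficient list.

    term_sizes: list of (label, n_coefs) pairs, in coefficient order.
    """
    if sum(n for _, n in term_sizes) != len(coef):
        raise ValueError("term sizes do not add up to len(coef)")
    out, start = {}, 0
    for label, n in term_sizes:
        out[label] = coef[start:start + n]
        start += n
    return out


# Layout assumed for the wage model above: 30 + 30 + 5 + 1 = 66.
layout = [("s(0)", 30), ("s(1)", 30), ("f(2)", 5), ("intercept", 1)]
coef = [0.0] * 66  # placeholder standing in for gam.coef_
parts = split_coefs(coef, layout)
print({label: len(values) for label, values in parts.items()})
# {'s(0)': 30, 's(1)': 30, 'f(2)': 5, 'intercept': 1}
```

Each spline slice holds weights on that feature's basis functions, so the individual values are rarely meaningful on their own. Looking at the term's overall contribution, for example with pyGAM's `partial_dependence` method, is usually more informative.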

jolespin commented 6 years ago

I will get back to you on this after I read it. I had never heard of GAMs until I watched the PyData talk, and I am extremely interested. I need to figure out how to interpret all of the spline values for an independent variable (covariate?) / attribute. Thanks for your response. I look forward to diving deep into these methods.

dswah commented 5 years ago

@jolespin I will close this issue. Please re-open it as you see fit.