dswah / pyGAM

[HELP REQUESTED] Generalized Additive Models in Python
https://pygam.readthedocs.io
Apache License 2.0
862 stars, 159 forks

Expanded form of the fitted model? #235

Open ghost opened 5 years ago

ghost commented 5 years ago

I am using pygam.GAM(), which fits every feature with a spline term by default (I have not assigned a specific term to any feature). Is it fair to say that the GAM is equivalent to multivariate adaptive regression splines (MARS) in this case? If so, I am looking for the expanded form of the model equation that the GAM fits to the data, which would involve more than the spline coefficients (gam.coef_). Specifically, I would like to know

  1. the hinge functions of all splines, and
  2. whether there is a mechanism within pygam that guards against overfitting, since I don't see the splines pruned from their default number (20 for each feature).

I could not find anything on these two things either in the doc or the issues.
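
For context on the two model forms being compared: a MARS model is built from hinge functions max(0, x - t), while a spline term in a GAM is a weighted sum of B-spline basis functions, f(x) = sum_j beta_j * B_j(x). The sketch below is plain Python, not pyGAM code. The helper names (hinge, uniform_knots, bspline_basis, spline_term) are hypothetical, and the uniform knot layout extended past the data range is an assumption in the style of P-splines, not a confirmed description of pyGAM internals.

```python
def hinge(x, knot):
    """MARS-style hinge function: zero left of the knot, linear to the right."""
    return max(0.0, x - knot)


def uniform_knots(lo, hi, n_splines, degree):
    """Evenly spaced knots, extended `degree` steps past each end of [lo, hi] (assumed layout)."""
    step = (hi - lo) / (n_splines - degree)
    return [lo + (i - degree) * step for i in range(n_splines + degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of the given degree at x (Cox-de Boor recursion)."""
    n_intervals = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval that contains x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_intervals)]
    for d in range(1, degree + 1):
        raised = []
        for i in range(n_intervals - d):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis  # length is len(knots) - 1 - degree


def spline_term(x, coefs, knots, degree):
    """Expanded form of one smooth term: f(x) = sum_j coefs[j] * B_j(x)."""
    return sum(c * b for c, b in zip(coefs, bspline_basis(x, knots, degree)))


# Six cubic basis functions on [0, 1]; a small number chosen only to keep the example short.
knots = uniform_knots(0.0, 1.0, n_splines=6, degree=3)
basis_at_half = bspline_basis(0.5, knots, degree=3)
value_at_half = spline_term(0.5, [2.0] * 6, knots, degree=3)
```

Inside the data range the basis functions sum to 1, and at most degree + 1 of them are non-zero at any x, so each coefficient only influences the curve locally. A hinge function, by contrast, stays non-zero everywhere to the right of its knot.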

ghost commented 5 years ago

Hello @dswah: I came across your answer to another post that closely relates to my question, but when I tried the suggested commands, some of the 'inner' functions (e.g. _edge_knots, _n_splines, etc.) did not work for me in version 0.8.0 of pygam: I get an AttributeError for most of them. If those functions worked along with pygam.utils.b_spline_basis, I think that would be sufficient.

On a related note, I have another question: what do the values in pdep, the output of partial_dependence, represent? They include both negative and positive numbers, and they do not seem to lie in the range of the fitted predictions (gam.predict).
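
One way to reason about the scale mismatch: in an additive model the prediction is the inverse link applied to the intercept plus the sum of the per-term contributions, so an individual contribution can be negative even when every prediction is positive. The sketch below uses made-up numbers and hypothetical helper names. It assumes the per-term values exclude the intercept, which should be checked against the pyGAM version in use.

```python
import math


def linear_predictor(intercept, term_contributions):
    """Additive model on the link scale: intercept plus one contribution per term."""
    return intercept + sum(term_contributions)


def predict(intercept, term_contributions, inverse_link=lambda eta: eta):
    """Map the linear predictor back to the response scale (identity link by default)."""
    return inverse_link(linear_predictor(intercept, term_contributions))


# Hypothetical values for a single row with two features.
intercept = 2.0
contributions = [-0.5, 0.75]  # per-term contributions; individual values may be negative

identity_prediction = predict(intercept, contributions)
log_link_prediction = predict(intercept, contributions, inverse_link=math.exp)
```

With the identity link the contributions and the intercept add up to the prediction directly. With a log link they only add up on the link scale, so comparing a single term's values with the output of predict is not meaningful in either case.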

GithubUsr140906 commented 5 years ago

I have the same issue with the 'inner' functions not working. I performed a grid search and would like to know the order of the spline fitted to each feature function. I cannot find a way to retrieve it.

Cheers

shyamcody commented 4 years ago

Hi @L4student, if you read the documents referred to as reading material, you will see that

  1. Like hinge functions in MARS, the splines here are built from a family of basis functions. These are B-splines and penalized B-splines (P-splines), with varieties such as cyclic splines, thin-plate splines, and others. Read about spline basis function families for more details.
  2. To reduce overfitting, spline terms use different types of penalties that reduce the "wiggliness" of the fitted functions. For more, see this pdf.
  3. pdep is supposed to provide the linear contribution each term makes to the final prediction. If you apply a link function on top of it (i.e. you don't use the identity link function), the scale may not match.
  4. Please create separate issues with example code showing where the inner functions did not work for you, or cite other issues, so that we can work on resolving them.
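
To make the penalty idea in point 2 concrete, here is a minimal sketch of a second-order difference penalty on spline coefficients, as used in P-splines. It is plain Python with hypothetical helper names, and it is an assumption that this form matches the default penalty inside pyGAM. The smoothing strength `lam` is meant to play the role of pyGAM's `lam` parameter.

```python
def second_differences(coefs):
    """Second-order differences of neighbouring coefficients: b[j] - 2*b[j+1] + b[j+2]."""
    return [coefs[j] - 2.0 * coefs[j + 1] + coefs[j + 2] for j in range(len(coefs) - 2)]


def wiggliness_penalty(coefs, lam):
    """Penalty term: lam times the sum of squared second differences."""
    return lam * sum(d * d for d in second_differences(coefs))


def penalized_loss(residuals, coefs, lam):
    """Squared-error fit term plus the wiggliness penalty (objective for an identity-link model)."""
    return sum(r * r for r in residuals) + wiggliness_penalty(coefs, lam)


smooth_coefs = [0.0, 1.0, 2.0, 3.0, 4.0]   # coefficients on a straight line
wiggly_coefs = [0.0, 3.0, -1.0, 4.0, 0.0]  # coefficients that oscillate

smooth_penalty = wiggliness_penalty(smooth_coefs, lam=0.5)
wiggly_penalty = wiggliness_penalty(wiggly_coefs, lam=0.5)
```

Coefficients that lie on a straight line incur no penalty, while oscillating ones are penalized heavily. This is why the number of splines per feature can stay at its default: a penalty of this kind shrinks the coefficients toward a smooth curve instead of pruning basis functions the way MARS prunes hinge terms.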