dswah / pyGAM

[HELP REQUESTED] Generalized Additive Models in Python
https://pygam.readthedocs.io
Apache License 2.0
857 stars 157 forks

Estimated function is not consistent with data #297

Open robsonucl opened 3 years ago

robsonucl commented 3 years ago

Hi, I found this weird bug when plotting the estimated curve against the observed data. Any idea why this is happening? I only adapted your code to include my data (matrix X). Thank you!

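```python
import matplotlib.pyplot as plt

# gam is the fitted pyGAM model; X (DataFrame) and y are the training data
for i, term in enumerate(gam.terms):
    if term.isintercept:
        continue

    # grid over this term's feature, plus its partial dependence and 95% interval
    XX = gam.generate_X_grid(term=i)
    pdep, confi = gam.partial_dependence(term=i, X=XX, width=0.95)

    plt.figure()
    plt.plot(XX[:, term.feature], pdep)
    plt.plot(XX[:, term.feature], confi, c='r', ls='--')
    # observed data overlaid on the estimated curve
    plt.scatter(X.iloc[:, i], y, facecolor='gray', edgecolors='none')

    plt.title(repr(term))
    plt.show()
```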

[image: plot of the estimated curve against the observed data]

dswah commented 3 years ago

Ah wow this is a weird case indeed! But at first glance it seems to me that the model has produced a good representation of the data. [1]

Can you tell me what you were expecting?

Also, could you share some details about the model setup?

And then share a histogram of your Y-data / dependent variable?
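For anyone wanting to answer the two requests above, a minimal sketch follows. It assumes only numpy; `describe_response` is a hypothetical helper name, and the `y` used here is synthetic placeholder data (mostly zeros plus a few large values), not the data from this issue. For the model setup itself, pasting the line that constructs the model together with the output of `gam.summary()` is probably enough.

```python
import numpy as np

def describe_response(y, bins=10):
    """Return bin counts, bin edges, and a plain-text histogram of y."""
    y = np.asarray(y, dtype=float)
    counts, edges = np.histogram(y, bins=bins)
    lines = []
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        # one row per bin; the bar is capped at 50 characters
        bar = "#" * min(int(count), 50)
        lines.append(f"[{lo:8.2f}, {hi:8.2f}) {int(count):5d} {bar}")
    return counts, edges, "\n".join(lines)

# Synthetic placeholder data (an assumption): 90 zeros and 10 large values.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(90), rng.uniform(5.0, 20.0, size=10)])

counts, edges, text = describe_response(y, bins=10)
print(text)
print("share of exact zeros:", np.mean(y == 0))
```

A text histogram like this can be pasted straight into a comment; `plt.hist(y)` would do the same job as an image.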

[1] My interpretation:

I see some undesirable qualities in this model:

  1. the "ringing" towards the right side: in order to assign a high response to the single point at X2=45, the model has to compensate by adding an overly negative response to the neighboring splines
  2. the obviously different model response between X2 < 40 (mostly 0) and X2 > 40 (large and positive).
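The ringing in point 1 can be reproduced in a small, self-contained sketch. This is not pyGAM: it uses a Whittaker smoother (one coefficient per data point with a second-difference penalty) as a stand-in for a penalized spline term, and the data are synthetic (zeros everywhere except one large value at x=45). `whittaker_smooth` and `lam=10.0` are illustrative choices, so the exact numbers will differ from what pyGAM produces.

```python
import numpy as np

def whittaker_smooth(y, lam):
    """Penalized least squares: minimize ||y - z||^2 + lam * ||D2 z||^2."""
    n = len(y)
    # second-difference operator with rows [1, -2, 1], shape (n - 2, n)
    D2 = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

# Synthetic response (an assumption): all zeros except a single large point.
x = np.arange(50)
y = np.zeros(50)
y[45] = 10.0

fit = whittaker_smooth(y, lam=10.0)

print("largest fitted value :", fit.max())
print("smallest fitted value:", fit.min())  # negative: the dip beside the spike
print("location of the dip  :", x[fit.argmin()])
```

The fit rises towards the isolated point and then dips below zero next to it, even though no observed value is negative. That dip is the same kind of compensation described in point 1.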

Let me know what you think!