dswah / pyGAM

[HELP REQUESTED] Generalized Additive Models in Python
https://pygam.readthedocs.io
Apache License 2.0
852 stars 156 forks

Question on the weight parameter when fitting splines #306

Open ajiang1234 opened 2 years ago

ajiang1234 commented 2 years ago

Hello,

I'm using pyGAM to fit splines to data and have a question about the weights parameter. My initial understanding was that giving a point a higher weight makes it count for more when the spline is fit. Conversely, if every point is given the same weight, I expected all points to be weighted equally in the fit, whatever that common value is. However, in a couple of the cases I ran, the fits differed between runs in which all points shared one weight but that shared value changed from run to run. For example, here are the X and Y arrays I was fitting:

x = [0,1,2,3,4,5,6,7,8,9]
y = [0.547526222,1.160272069,4.678268611,9.017480403,16.54352498,25.62930432,36.22266268,49.24193023,64.47694137,81.21989368]

I first fit this with a weight of 1 for all the points:

from pygam import GAM, s

weight1 = [1,1,1,1,1,1,1,1,1,1]
example1_GAM = GAM(s(0, n_splines=4, spline_order=3, constraints='monotonic_inc'))
example1_fit_GAM = example1_GAM.fit(X=x, y=y, weights=weight1)
example1_smooth_GAM = example1_GAM.predict(x)
example1_smooth_GAM

The resulting fit I got was:

array([-11.36832015,  -2.51345325,   6.37435826,  15.29510792,
        24.24878925,  33.2353958 ,  42.25492108,  51.30735863,
        60.39270199,  69.51094468])

I then repeated the fit with a weight of 10 for every point, keeping everything else the same. Since all points still had the same weight, even though its value was different from the first time, I expected the fit to be the same. However, this time I got a different result. This is the code I used to fit X and Y with a weight of 10:

weight10 = [10,10,10,10,10,10,10,10,10,10]
example10_GAM = GAM(s(0, n_splines=4, spline_order=3, constraints='monotonic_inc'))
example10_fit_GAM = example10_GAM.fit(X=x, y=y, weights=weight10)
example10_smooth_GAM = example10_GAM.predict(x)
example10_smooth_GAM

The resulting fit I got this time was:

array([-9.84555549, -2.00602823,  6.12040023, 14.53366555, 23.23370338,
       32.22044941, 41.49383927, 51.05380865, 60.90029319, 71.03322856])

I tried once more with a weight of 1000 for every point and got a result different from both previous cases:

array([-2.21402426,  1.15789171,  5.43611926, 10.86058428, 17.67121268,
       26.10793036, 36.41066323, 48.81933719, 63.57387813, 80.91421198])
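
To put a number on how far apart the three fits are, one option is to compare each reported curve with y through its unweighted residual sum of squares. This is only a rough sketch using NumPy and the values quoted above; the helper `rss` and the dictionary names are illustrative and are not part of pyGAM.

```python
import numpy as np

# Observed y values and the three fitted curves, copied from this post.
y = np.array([0.547526222, 1.160272069, 4.678268611, 9.017480403, 16.54352498,
              25.62930432, 36.22266268, 49.24193023, 64.47694137, 81.21989368])

fits = {
    1: np.array([-11.36832015, -2.51345325, 6.37435826, 15.29510792,
                 24.24878925, 33.2353958, 42.25492108, 51.30735863,
                 60.39270199, 69.51094468]),
    10: np.array([-9.84555549, -2.00602823, 6.12040023, 14.53366555,
                  23.23370338, 32.22044941, 41.49383927, 51.05380865,
                  60.90029319, 71.03322856]),
    1000: np.array([-2.21402426, 1.15789171, 5.43611926, 10.86058428,
                    17.67121268, 26.10793036, 36.41066323, 48.81933719,
                    63.57387813, 80.91421198]),
}


def rss(fitted, observed):
    """Unweighted residual sum of squares between a fitted curve and the data."""
    return float(np.sum((observed - fitted) ** 2))


# Key: the common weight used in the run; value: RSS of that run's fit.
rss_by_weight = {w: rss(f, y) for w, f in fits.items()}
for w, value in rss_by_weight.items():
    print(f"weight={w:>4}: RSS = {value:.2f}")
```

On these numbers the residual sum of squares shrinks as the common weight grows, so the weight-1000 curve follows the data most closely and the weight-1 curve least closely.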

Because this differs from how I thought weights would affect the fit, I was hoping to get some insight into how the weights parameter in pyGAM works, and why different weight values produce different fits when every point is given the same weight. Any additional information would be greatly appreciated.
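
For comparison, the behaviour I expected is what an unpenalized weighted least-squares fit gives. The sketch below uses `numpy.polyfit` with a quadratic as a stand-in, not pyGAM, so it only illustrates the expectation and says nothing about how pyGAM treats weights internally. The polynomial degree and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Same data as in the post.
x = np.arange(10, dtype=float)
y = np.array([0.547526222, 1.160272069, 4.678268611, 9.017480403, 16.54352498,
              25.62930432, 36.22266268, 49.24193023, 64.47694137, 81.21989368])

# Fit the same quadratic three times, each time with one common weight.
coefs = {}
for w in (1.0, 10.0, 1000.0):
    weights = np.full_like(x, w)  # every point gets the same weight
    coefs[w] = np.polyfit(x, y, deg=2, w=weights)

# In plain weighted least squares a common scale factor cancels out,
# so the coefficients should agree up to floating-point error.
same_fit = all(np.allclose(coefs[1.0], c) for c in coefs.values())
print(same_fit)
```

Here the three sets of coefficients agree to floating-point tolerance, which is the invariance I had assumed would also hold for the pyGAM fits.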