I fit a model whose categorical features only take values above 0 (categories run from n to m, with 0 < n <= m). When I tried to plot the partial dependence for the model, I ran into a ValueError (reproduced below). The problem appears to be that generate_X_grid creates a matrix in which every column other than the term's own feature is filled with zeros, like this:
[[0,0,0, ..., 0, i, 0, ..., 0,0,0],
[0,0,0, ..., 0, i, 0, ..., 0,0,0],
...,
[0,0,0, ..., 0, i, 0, ..., 0,0,0]]
For models trained with categorical features that do not have 0 as a category, this raises an error when calling partial_dependence, because 0 falls outside the domain of those features.
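The grid layout described above can be mimicked with plain NumPy to show one possible way around the domain check. This is only a sketch, not pyGAM's code: `patch_grid` is a hypothetical helper, and it assumes that the partial dependence of a single term only reads that term's own column, so overwriting the other columns with an in-domain training value should not change the curve.

```python
import numpy as np

def patch_grid(XX, X_train, keep):
    """Return a copy of XX where every column except `keep` is set to the
    minimum of the same column in X_train (a value known to be in-domain).

    `patch_grid` is a hypothetical helper, not part of pyGAM.
    """
    XX = np.array(XX, dtype=float, copy=True)
    X_train = np.asarray(X_train, dtype=float)
    for j in range(XX.shape[1]):
        if j != keep:
            XX[:, j] = X_train[:, j].min()
    return XX

# Toy training data: column 0 is a categorical year in [2003, 2009].
X_train = np.array([[2003, 18.0], [2006, 40.0], [2009, 80.0]])

# A grid shaped like the one described above for feature 1:
# only column 1 varies, column 0 is all zeros.
grid = np.zeros((4, 2))
grid[:, 1] = np.linspace(18.0, 80.0, 4)

patched = patch_grid(grid, X_train, keep=1)
print(patched[:, 0])  # column 0 now holds 2003.0 instead of 0.0
```

In the reproduction below, the same idea would mean calling something like `patch_grid(XX, X, term.feature)` before passing the grid to partial_dependence. I have not verified this against pyGAM 0.8.0.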
Here is a recreation of the error using the Quick start example code:
Input:
import matplotlib.pyplot as plt
from pygam import LinearGAM, s, f
from pygam.datasets import wage

X, y = wage()

# Use f(0) to make the 0th term categorical.
# The 0th feature contains no value equal to 0.
gam = LinearGAM(f(0) + s(1) + f(2)).fit(X, y)

for i, term in enumerate(gam.terms):
    if term.isintercept:
        continue
    XX = gam.generate_X_grid(term=i)
    pdep, confi = gam.partial_dependence(term=i, X=XX, width=0.95)
    #plt.figure()
    plt.plot(XX[:, term.feature], pdep)
    plt.plot(XX[:, term.feature], confi, c='r', ls='--')
    plt.title(repr(term))
    plt.show()
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-10-0e5df89ff530> in <module>()
7 XX = gam.generate_X_grid(term=i)
8 print(XX)
----> 9 pdep, confi = gam.partial_dependence(term=i, X=XX, width=0.95)
10
11 #plt.figure()
/Users/tatekeller/opt/anaconda3/envs/pbh/lib/python3.6/site-packages/pygam/pygam.py in partial_dependence(self, term, X, width, quantiles, meshgrid)
1542 features=self.feature, verbose=self.verbose)
1543
-> 1544 modelmat = self._modelmat(X, term=term)
1545 pdep = self._linear_predictor(modelmat=modelmat, term=term)
1546 out = [pdep]
/Users/tatekeller/opt/anaconda3/envs/pbh/lib/python3.6/site-packages/pygam/pygam.py in _modelmat(self, X, term)
455 X = check_X(X, n_feats=self.statistics_['m_features'],
456 edge_knots=self.edge_knots_, dtypes=self.dtype,
--> 457 features=self.feature, verbose=self.verbose)
458
459 return self.terms.build_columns(X, term=term)
/Users/tatekeller/opt/anaconda3/envs/pbh/lib/python3.6/site-packages/pygam/utils.py in check_X(X, n_feats, min_samples, edge_knots, dtypes, features, verbose)
301 'feature {}. Expected data on [{}, {}], '\
302 'but found data on [{}, {}]'\
--> 303 .format(i, min_, max_, x.min(), x.max()))
304
305 return X
ValueError: X data is out of domain for categorical feature 0. Expected data on [2003.0, 2009.0], but found data on [0.0, 0.0]
The versions that I used are:
pyGAM=0.8.0
Python=3.6.12
For now I will work around this by subtracting each categorical feature's minimum from its values, which shifts the category range from (n, m) to (n-n, m-n) == (0, m-n).
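A minimal sketch of that workaround, using only NumPy. `shift_categories` is a hypothetical helper name, and the toy array stands in for the real feature matrix; the column indices are assumptions for illustration.

```python
import numpy as np

def shift_categories(X, categorical_columns):
    """Shift each listed column so that its smallest category becomes 0.

    Returns the shifted copy and a dict {column: offset} so the original
    category values can be recovered later (for axis labels, new data, etc.).
    `shift_categories` is a hypothetical helper, not part of pyGAM.
    """
    X = np.array(X, dtype=float, copy=True)
    offsets = {}
    for j in categorical_columns:
        offsets[j] = X[:, j].min()
        X[:, j] -= offsets[j]
    return X, offsets

# Toy data: columns 0 and 2 are categorical and never take the value 0.
X = np.array([[2003, 18.0, 1],
              [2006, 40.0, 3],
              [2009, 80.0, 2]])

X_shifted, offsets = shift_categories(X, categorical_columns=[0, 2])
print(X_shifted[:, 0])  # categories now start at 0
print(offsets)          # amounts to add back when labelling plots
```

The model would then be fit on `X_shifted`, and `offsets[j]` added back to the x-axis values when plotting. The same offsets have to be subtracted from any new data before predicting.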
Thanks in advance