Open jonathan-taylor opened 2 years ago
An issue with this fix is that the standard error of the bars will depend on where we evaluate. Might be better to return \hat{\mu}(X_grid)-\hat{\mu}(\bar{X}). So it would be evaluated along a line through \bar{X}.
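To make the suggestion above concrete, here is a minimal sketch of the contrast \hat{\mu}(X_grid)-\hat{\mu}(\bar{X}) and its standard error. It assumes a model that is linear in its coefficients and uses ordinary least squares on a small polynomial basis as a stand-in for a fitted GAM; the basis, the simulated data, and all variable names are illustrative assumptions, not pyGAM internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)

# Stand-in basis (assumption): intercept, x, x^2 instead of pyGAM's splines.
B = np.column_stack([np.ones(n), x, x**2])
beta, *_ = np.linalg.lstsq(B, y, rcond=None)

# Usual OLS estimate of the coefficient covariance.
resid = y - B @ beta
sigma2 = resid @ resid / (n - B.shape[1])
cov = sigma2 * np.linalg.inv(B.T @ B)

# Basis rows on the grid and at the mean of x.
x_grid = np.linspace(x.min(), x.max(), 50)
B_grid = np.column_stack([np.ones(50), x_grid, x_grid**2])
x_bar = x.mean()
b_bar = np.array([1.0, x_bar, x_bar**2])

# Contrast mu_hat(x_grid) - mu_hat(x_bar): the intercept column cancels,
# so the standard error no longer depends on the baseline level.
D = B_grid - b_bar
contrast = D @ beta
var_contrast = np.einsum("ij,jk,ik->i", D, cov, D)
se_contrast = np.sqrt(np.maximum(var_contrast, 0.0))
```

By construction the contrast and its standard error are both zero at \bar{X}, so the bars describe uncertainty in the shape relative to that reference point.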
It seems categorical variables must contain 0 as one of the values. This is apparent in partial_dependence:

    import numpy as np
    from pygam import LinearGAM, s, f

    X = np.random.standard_normal((100, 3))
    X[:, 2] = np.random.choice([0, 1], 100, replace=True)
    Y = np.random.standard_normal(100)
    G = LinearGAM(s(0) + s(1) + f(2)).fit(X, Y)
    G.partial_dependence(0)

This works fine, but:

    X2 = X.copy()
    X2[:, 2] += 2
    G2 = LinearGAM(s(0) + s(1) + f(2)).fit(X2, Y)
    G2.partial_dependence(0)

raises the following:
ValueError: X data is out of domain for categorical feature 2. Expected data on [2.0, 3.0], but found data on [0.0, 0.0]
The issue is that check_X looks at the categorical columns of the formed _modelmat, which has 0s everywhere but the term's column. Really, _modelmat just needs valid values: the partial dependence only requires that the other columns are constant, not necessarily 0. I'd also recommend centering the partial dependence values, as it is their shape that is of interest rather than the value...
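A rough sketch of that idea, outside of pyGAM: build the evaluation grid so that every column other than the one of interest is held at a constant but valid value (an observed level for categorical columns, the mean for continuous ones), and center the resulting curve. The helpers constant_fill_grid and center are hypothetical names introduced here for illustration; they are not part of pyGAM.

```python
import numpy as np

def constant_fill_grid(X, feature, categorical=(), n_points=100):
    """Hypothetical helper: vary `feature`, hold other columns at valid constants."""
    X = np.asarray(X, dtype=float)
    if feature in categorical:
        # For a categorical feature, evaluate at its observed levels only.
        values = np.unique(X[:, feature])
    else:
        values = np.linspace(X[:, feature].min(), X[:, feature].max(), n_points)
    grid = np.empty((len(values), X.shape[1]))
    for j in range(X.shape[1]):
        if j == feature:
            grid[:, j] = values
        elif j in categorical:
            # Use the most frequent observed level instead of 0.
            levels, counts = np.unique(X[:, j], return_counts=True)
            grid[:, j] = levels[np.argmax(counts)]
        else:
            grid[:, j] = X[:, j].mean()
    return grid

def center(pdep):
    """Subtract the mean so only the shape of the curve remains."""
    pdep = np.asarray(pdep, dtype=float)
    return pdep - pdep.mean()

# Data shaped like X2 from the report: column 2 takes the values 2 and 3.
rng = np.random.default_rng(0)
X2 = rng.standard_normal((100, 3))
X2[:, 2] = rng.choice([2.0, 3.0], 100)
grid = constant_fill_grid(X2, feature=0, categorical=(2,))
```

For an additive model the other terms contribute only a constant along such a grid, so something like center(G2.predict(grid)) should show the same shape as the partial dependence of feature 0, without ever evaluating the categorical term at 0.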
@jonathan-taylor did you notice that G2.partial_dependence(2) actually works, and only G2.partial_dependence(0) and G2.partial_dependence(1) fail? That means the problem caused by X2[:,2] is affecting the terms for X2[:,0] and X2[:,1]. How can we explain this?
Yeah, I see, I'm getting this too. The problem emerges, I think, because when partial_dependence is computed for any other feature, the categorical column gets filled in with zeros (which I think pyGAM assumes to be the omitted category), and those zeros cannot be evaluated when 0 is not one of the fitted categories.
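One possible workaround in the meantime (a suggestion, not something confirmed by the maintainers in this thread) is to recode the categorical column so its levels start at 0 before fitting, which puts the zero fill back inside the fitted domain. The helper name recode_categorical is hypothetical:

```python
import numpy as np

def recode_categorical(column):
    """Map arbitrary category labels to the codes 0, 1, ..., K-1."""
    levels, codes = np.unique(column, return_inverse=True)
    return codes.astype(float), levels

# Example with the shifted levels from the report (2 and 3).
X2_cat = np.array([2.0, 3.0, 3.0, 2.0, 3.0])
codes, levels = recode_categorical(X2_cat)
# codes  -> [0., 1., 1., 0., 1.]
# levels -> [2., 3.]  (levels[int(code)] recovers the original label)
```

Replacing X2[:,2] with the codes before calling fit gives the same setup as the first example in the report, where partial_dependence(0) works; levels can be kept around to translate the codes back for plotting.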