If a feature has percentiles that vary by less than 0.01, the generated grid contains duplicate values, which (for some reason) leads to unequal dimensions of the `.feature_grids` and `.pdp` attributes of the `pdpbox.pdp.pdp_isolate_obj`.
When using the current development version, the dimension problem is gone, but the grid will have only one value in this example. I could fix the problem for my purposes by simply removing the rounding statements in the `_get_grids()` function (see my fork), but I assume there's a reason for the rounding, so a real fix is probably more involved (?).
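For illustration, here is a minimal sketch of the mechanism I suspect. The rounding step is an assumption based on what `_get_grids()` appears to do; this is not pdpbox's actual code. It also shows that deduplicating the rounded grid (e.g. with `np.unique`) would be one possible fix that keeps the rounding:

```python
import numpy as np

np.random.seed(123)

# A feature whose values differ by far less than 0.01:
values = np.random.uniform(0.5, 0.5001, 100)

# Percentile grid points all lie between 0.5 and 0.5001 ...
percentiles = np.percentile(values, [0, 25, 50, 75, 100])

# ... so rounding to two decimals (as _get_grids() seems to do)
# collapses them all to the same value, producing duplicates:
grid = np.round(percentiles, 2)
print(grid)  # every entry is 0.5

# One possible fix that keeps the rounding but drops duplicates:
unique_grid = np.unique(grid)
print(unique_grid)  # a single grid value, 0.5
```

With a deduplicated grid the `.feature_grids` and `.pdp` dimensions should at least agree, though the grid would still degenerate to a single value for features like this.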
Here's a reproducible example (using the version from pypi):
```python
import pandas as pd
import numpy as np
from pdpbox import pdp  # version from PyPI
from sklearn.linear_model import SGDClassifier

np.random.seed(123)
df = pd.DataFrame({'y': np.random.randint(0, 2, 100),
                   'x1': np.random.uniform(0.5, 0.5001, 100),
                   'x2': np.random.uniform(0.5, 0.5001, 100)})

clf = SGDClassifier()
X = df[['x1', 'x2']]
y = df['y']
clf.fit(X, y)

P = pdp.pdp_isolate(model=clf, train_X=X, feature='x1')
print(P.feature_grids)
print(P.pdp)
```