TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

LightGBM Category error #351

Open gaosq0604 opened 4 years ago

gaosq0604 commented 4 years ago

With a LightGBM model trained through the sklearn API (LGBMRegressor or LGBMClassifier) on data with several categorical columns, directly calling `perm = PermutationImportance(model, random_state = 42).fit(X_test, y_test)` raises this error:

ValueError: train and valid dataset categorical_feature do not match.

It only works if I first cast all categorical columns to int with astype and then pass categorical_feature = ... to the fit function. This should be fixed, thanks.
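
For anyone who wants to try it, the workaround above could be sketched roughly as follows. This is an illustration under assumptions, not a verified fix. The helper names `encode_categoricals` and `fit_and_permute` are hypothetical. Using `.cat.codes` as the integer cast is an assumption (the report only says astype int). It also assumes the train and test columns share the same categories, so the integer codes line up.

```python
import pandas as pd


def encode_categoricals(X):
    """Return a copy of X with category columns replaced by integer codes,
    plus the list of column names that were converted (hypothetical helper)."""
    cat_cols = X.select_dtypes(include="category").columns.tolist()
    X_int = X.copy()
    for col in cat_cols:
        # pandas encodes missing values as -1 in .cat.codes
        X_int[col] = X_int[col].cat.codes.astype("int64")
    return X_int, cat_cols


def fit_and_permute(X_train, y_train, X_test, y_test):
    """Fit LightGBM on integer-coded categoricals, then run eli5's
    PermutationImportance on the integer-coded test set (hypothetical helper)."""
    # Imported lazily so encode_categoricals can be used without these packages.
    from lightgbm import LGBMClassifier
    from eli5.sklearn import PermutationImportance

    X_train_int, cat_cols = encode_categoricals(X_train)
    X_test_int, _ = encode_categoricals(X_test)

    model = LGBMClassifier()
    # Name the categorical columns explicitly instead of relying on the pandas dtype.
    model.fit(X_train_int, y_train, categorical_feature=cat_cols)

    return PermutationImportance(model, random_state=42).fit(X_test_int, y_test)
```

Calling `encode_categoricals` on the full frame before the train/test split is one way to keep the codes consistent between the two sets.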

gaosq0604 commented 4 years ago

anyone...???

cdrouin commented 4 years ago

Hitting a similar issue here - I recently switched from one-hot encoding to proper Pandas categorical variables and now I can't use show_prediction() to introspect on specific predictions.

kfoofw commented 2 years ago

@cdrouin, I recently ran into this problem too with pandas categoricals, and I happened to find a possible solution in this thread. When you call explain_prediction and pass a row of the pandas DataFrame, you need to use double square brackets (i.e. df.iloc[[0]], not df.iloc[0]).

https://github.com/TeamHG-Memex/eli5/issues/214#issuecomment-453872558

I had the same problem and the above did not solve it for me. What worked for me was using `X.iloc[[1]]` instead of `X.iloc[1]`. The latter form automatically converts a row to a Series, which converts the datatypes in the row to "object" type if they differ.

_Originally posted by @agnesvanbelle in https://github.com/TeamHG-Memex/eli5/issues/214#issuecomment-453872558_
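
The dtype difference between the two indexing forms can be checked with pandas alone. This is a small sketch on a made-up two-column frame. The column names and values are illustrative and not taken from the thread.

```python
import pandas as pd

# Toy frame with one categorical and one numeric column (illustrative data).
X = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"]),
    "size": [1.5, 2.0, 3.5],
})

# Single brackets return a Series; the mixed column dtypes collapse to one dtype.
row_as_series = X.iloc[0]

# Double brackets return a one-row DataFrame; each column keeps its own dtype.
row_as_frame = X.iloc[[0]]

print(row_as_series.dtype)
print(row_as_frame.dtypes)
```

Since the one-row DataFrame keeps the category dtype, it is the form the comments above suggest passing to explain_prediction or show_prediction.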