ModelOriented / DALEX

moDel Agnostic Language for Exploration and eXplanation
https://dalex.drwhy.ai
GNU General Public License v3.0

Xgboost model. Predict function returns an error. #214

Closed zlotopolscylover closed 4 years ago

zlotopolscylover commented 4 years ago

We have a problem creating an explainer with the dalex package. We are working on uplift modeling and calculate uplift with the function calc_uplift_filled, which is based on an xgboost model. Our xgboost model was trained on np.array and we would like to keep it that way. As you can see in the attached screenshot, the predict function returns an error. Can you suggest a workaround?

[Screenshot: issue_dalex]

hbaniecki commented 4 years ago

Hi, what is the output of:

calc_uplift_filled(r_xgb_model, X_train)

Also, can you try converting x into the desired numpy/xgboost.DMatrix type inside calc_uplift_filled? X_train is converted to pd.DataFrame once it is passed to Explainer.
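
A minimal sketch of that conversion, under stated assumptions: ArrayOnlyModel is a hypothetical stand-in for a model that only accepts np.ndarray (it is not the xgboost model from this issue), and predict_from_frame is an illustrative name for a wrapper with the (model, data) signature. The DataFrame is built by hand here to mimic what happens to X_train inside Explainer.

```python
import numpy as np
import pandas as pd


class ArrayOnlyModel:
    """Hypothetical stand-in for a model trained on np.ndarray input."""

    def predict(self, x):
        if not isinstance(x, np.ndarray):
            raise TypeError(f"expected np.ndarray, got {type(x).__name__}")
        # Toy score: row sums, so there is one value per observation.
        return x.sum(axis=1)


def predict_from_frame(model, data):
    """Convert the incoming data back to np.ndarray before predicting."""
    x = data.to_numpy() if isinstance(data, pd.DataFrame) else np.asarray(data)
    return model.predict(x)


model = ArrayOnlyModel()
X_train = np.array([[1.0, 2.0], [3.0, 4.0]])
X_frame = pd.DataFrame(X_train)  # mimics the conversion done by Explainer

# Works even though the model itself rejects a DataFrame.
predictions = predict_from_frame(model, X_frame)
```

With a real xgboost Booster, the wrapper would presumably build xgboost.DMatrix(x) before calling predict, and the wrapper could then be passed to dalex through the predict_function argument of Explainer. Both steps are assumptions about the setup in this issue, not something shown in the screenshot.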

hbaniecki commented 4 years ago

TODO: inform the user that data is converted to pd.DataFrame https://github.com/ModelOriented/DALEX/blob/ce0839fb88c77308832098be5aff4fbf3aa775a5/python/dalex/dalex/_explainer/checks.py#L30

zlotopolscylover commented 4 years ago

We followed the recommendation and converted the input to np.array in calc_uplift_filled, and the explainer worked. Thanks! Now there is a new problem, though: the model_profiles function raises an error.

[Screenshot: issue_dalex_2]

hbaniecki commented 4 years ago

From this screenshot, I can't even see where the problem occurred (in dalex). Can you provide a reproducible example that I can run?

zlotopolscylover commented 4 years ago

We are still struggling to run dalex explanations on an xgboost model trained on np.array. The problem seems to be the conversion of the array to a DataFrame inside dalex. Since you know the package better, we hope you can tell us exactly what we should change.

This is a link to our repository - we would like to run the pdp file. https://github.com/ludziej/IML-historical-marketing-campaign

hbaniecki commented 4 years ago

Reproducible, meaning that I can run it, preferably as a Jupyter notebook and perhaps on a smaller scale. Running python pdp.py yields some HDF5 library errors that I won't be able to resolve.

zlotopolscylover commented 4 years ago

Here is a link to a Google Colab notebook with a reproducible example.

https://colab.research.google.com/drive/1krREix2XdCNJzowUjYiEKG1eX_9nf126#scrollTo=N2VzyR0FKzls

hbaniecki commented 4 years ago

@zlotopolscylover Thanks for this catch! Your example works on the soon-to-be-merged dalex version.