ModelOriented / modelStudio

📍 Interactive Studio for Explanatory Model Analysis
https://doi.org/10.1007/s10618-023-00924-w
GNU General Public License v3.0

Error in check_explainer(explainer): For Python support, use precalculate=True in Explainer init #113

Open vsteiger opened 1 year ago

vsteiger commented 1 year ago

Following the example from https://modelstudio.drwhy.ai/articles/ms-r-python-examples.html,

We trained a model in Python using scikit-learn, built an explainer with dalex, and exported it as a pickle file.

This pickle file was loaded in R using reticulate (py_load_object):

explainer <- py_load_object("explainer_scikitlearn_precalculateTrue.pickle", pickle = "pickle")

When building the explainer, we set the flag precalculate=True.

After using modelStudio(explainer, B = 5), we got this error message:

Error in check_explainer(explainer) : For Python support, use precalculate=True in Explainer init

Do you have any idea how to solve this?


R: R-Version 4.1.2 RStudio 2022.07.02 reticulate_1.26 // DALEX: 2.4.2 // DALEXtra: 2.2.1

Python: Python-Version: 3.9.15 via virtualenv DALEX 1.5.0

hbaniecki commented 1 year ago

Hi @vsteiger,

  1. Does the full example code work for you?
  2. For your use-case:
     a) in Python: what is the verbose output of an Explainer?
     b) in Python: can you access explainer.y_hat and explainer.residuals?
     c) in R: can you access explainer$y_hat and explainer$residuals? https://github.com/ModelOriented/modelStudio/blob/91da18c728bf095aa0b59ac58d8be1de7a6ff7c6/R/modelStudio.R#L655-L659

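The check in b) can be scripted before the explainer is pickled. The following is a minimal sketch, assuming the precalculated values are exposed as the y_hat and residuals attributes asked about above. The helper is_precalculated is hypothetical (it is not part of dalex), and the SimpleNamespace objects are stand-ins for a real Explainer.

```python
from types import SimpleNamespace


def is_precalculated(explainer):
    """Return True if both y_hat and residuals are present (not None)."""
    # Hypothetical helper: mirrors the manual check of the two attributes.
    return (
        getattr(explainer, "y_hat", None) is not None
        and getattr(explainer, "residuals", None) is not None
    )


# Stand-in objects for illustration only; pass a real dalex Explainer instead.
ready = SimpleNamespace(y_hat=[0.2, 0.8], residuals=[-0.2, 0.2])
not_ready = SimpleNamespace(y_hat=None, residuals=None)

print(is_precalculated(ready))
print(is_precalculated(not_ready))
```

If this returns False for the real explainer, the object was most likely created without precalculated values and would need to be rebuilt before exporting.
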
vsteiger commented 1 year ago

Hi @hbaniecki

We found the error in the explainer-building step in Python. We have corrected it, and the verbose output in Python now looks clean.

We continued in R:

explainer1 <- py_load_object("explainer_scikitlearn_precalculateTrue_v02.pickle", pickle = "pickle")

class(explainer1)

"explainer" "explainer" "explainer" "dalex._explainer.object.Explainer" "python.builtin.object"

In R we can access both explainer1$y_hat and explainer1$residuals, and they are not NULL:

> is.null(explainer1$residuals)
[1] FALSE
> is.null(explainer1$y_hat)
[1] FALSE

Now, when running modelStudio(explainer1, B = 5) on the explainer file, we got this error message:

new_observation argument is NULL. new_observation_n observations needed to calculate local explanations are taken from the data.

Error in modelStudio.explainer(explainer1, B = 5) :

explainer$predict_function returns an error when executed on new_observation[1,, drop = FALSE]

hbaniecki commented 1 year ago

@vsteiger great!

What do you get from explainer$data[1, , drop=FALSE], and from the following calls?

explainer$predict_function(explainer$model, explainer$data)
explainer$predict_function(explainer$model, explainer$data[1, ])
explainer$predict_function(explainer$model, explainer$data[1, , drop=FALSE])

Just for context, I ran the example from documentation https://modelstudio.drwhy.ai/articles/ms-r-python-examples.html and it runs OK for me with the same software versions.

So to solve this, I would probably need example data and a model from you for which the error occurs.

vsteiger commented 1 year ago

Hi @hbaniecki

explainer1$data[1, , drop=FALSE] runs smoothly and prints the first subject in the sample

For:

explainer1$predict_function(explainer1$model, explainer1$data)
explainer1$predict_function(explainer1$model, explainer1$data[1, ])
explainer1$predict_function(explainer1$model, explainer1$data[1, , drop=FALSE])

All three calls fail with an error that we originally encountered in Python as well, where we had solved it via formatting:

Error in py_call_impl(callable, dots$args, dots$keywords) : IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
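For reference, this message is what NumPy raises when a 1-dimensional array is indexed with two indices. The sketch below only illustrates the message and one common remedy (reshaping to a single-row 2-D array); the variable names are made up, and the thread does not confirm that this is the exact cause in the reported pipeline.

```python
import numpy as np

# One observation stored as a 1-D array (illustrative values).
row_1d = np.array([1.0, 2.0, 3.0])

message = None
try:
    row_1d[0, :]  # two indices on a 1-D array
except IndexError as err:
    message = str(err)
    print(message)

# Reshape to 2-D: one row, as many columns as there are features.
row_2d = row_1d.reshape(1, -1)
first_row = row_2d[0, :]  # two-index access now works
print(row_2d.shape)
```

A predict function that indexes its input as a 2-D table would need the data to arrive in this second form.
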

Regarding the example data: due to company restrictions, we are not allowed to share any of the data itself.

hbaniecki commented 1 year ago

Yes, but it could be synthetically generated data from numpy working on a changed model in sklearn, just to reproduce the error.
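
A synthetic reproduction along these lines might look like the sketch below. Everything specific in it is an assumption chosen for illustration: the data-generating rule, the column names, the RandomForestClassifier, and the output file name. The scikit-learn, pandas, and dalex imports are kept inside the second function so that the data generation can be run on its own with only NumPy installed.

```python
import numpy as np


def make_synthetic_data(n_rows=200, n_features=4, seed=0):
    """Generate random features and a binary target that depends on them."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, n_features))
    noise = rng.normal(scale=0.1, size=n_rows)
    y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)
    return X, y


def build_and_dump_explainer(path):
    """Fit a small model on synthetic data and pickle a dalex Explainer."""
    # Local imports: these packages are only needed for this step.
    import pandas as pd
    import dalex as dx
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_synthetic_data()
    # Assumed column names; replace with names matching the real schema.
    X = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])
    model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
    explainer = dx.Explainer(model, data=X, y=y, precalculate=True, verbose=False)
    with open(path, "wb") as fd:
        explainer.dump(fd)


X, y = make_synthetic_data()
print(X.shape, y.shape)
# build_and_dump_explainer("explainer_synthetic.pickle")  # requires dalex, pandas, scikit-learn
```

Swapping in the real preprocessing and model type, while keeping the data synthetic, should make it possible to share a file that triggers the same error.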