ModelOriented / ingredients

Effects and Importances of Model Ingredients
https://modeloriented.github.io/ingredients/
GNU General Public License v3.0
37 stars 18 forks source link

feature_importance for multiinput models with data as a list of arrays #142

Open jmaspons opened 2 years ago

jmaspons commented 2 years ago

Datasets can be 2d or 3d arrays

hbaniecki commented 2 years ago

Hi @jmaspons, thanks for this contribution. I will try to review it next week.

  1. Could you provide some code examples (use case) for this functionality? Ideally, it would also be later incorporated into the package tests.
  2. Can you tell me how it relates to https://github.com/ModelOriented/ingredients/pull/141?
jmaspons commented 2 years ago

Hello,

You can find a test script with dummy data at https://gist.github.com/jmaspons/0199ef922571bafe5eaac1a056963a83 (it requires keras, abind, DALEX and data.table packages). The patch implements feature_importance for models with more than one input datasets as 2D and 3D arrays. It can be useful for time series data (3D to a RNN) with some static variables (2D). DALEX::explainer doesn't support this kind of data input, so no changes to feature_importance.explainer

141 implements feature_importance for a single input model. I should add some changes to that PR for the variable_groups and the autogenerated variables following this patch which I tested much more cases.

The feature_importance.default and feature_importance.multiinput

In order to add tests, do you think it's acceptable to add all the dependencies or skip some by saving some data in the package?

hbaniecki commented 2 years ago

For starters, we should use underscore notation for function parameters instead of camelCase, e.g. perm_dim.

In order to add tests, do you think it's acceptable to add all the dependencies or skip some by saving some data in the package?

All such dependencies should be added to suggests; we wouldn't want more dependencies in imports. It would be nice to have some tests; they can run on generated data.

pbiecek commented 1 year ago

let's not have data.table and keras as dependencies

jmaspons commented 1 year ago

let's not have data.table and keras as dependencies

I'll find alternatives implementations for the data.table part. For the keras tests, is it ok to make it conditional and add keras in the suggested packages section?

pbiecek commented 1 year ago

we are trying to have DALEX as light as possible and keras is quite heavy package so maybe a valid solution would be to move this function to DALEXtra? (it has some heavy dependencies)