NorskRegnesentral / shapr

Explaining the output of machine learning models with more accurately estimated Shapley values
https://norskregnesentral.github.io/shapr/

Dependency-aware approaches with one-hot encoded data #388

Closed by aliamini-uq 1 month ago

aliamini-uq commented 2 months ago

Dear @LHBO,

Many thanks for adding the vaeac approach to the shapr package. According to issue #385:

In the developer version of shapr here at this GitHub repository, both the independence and vaeac methods support categorical data. The former handles the levels directly, while vaeac will one-hot-encode the categorical features internally to support categorical data. Thus, the user does not have to pre-process the data before sending them to explain() ...

An end user can therefore use the vaeac, independence, and ctree dependency-modeling approaches without applying one-hot encoding. However, if I have a dataset, for example Abalone, where the categorical feature has already been one-hot encoded, can I still use the vaeac, independence, and ctree approaches in this setting? Last but not least, thank you in advance for your valuable time.

Kind regards, A

aliamini-uq commented 2 months ago

Dear @LHBO,

I would be honored if you could take a look at my question.

Kind regards, A

martinju commented 2 months ago

Hi

You can use them, but you will get one Shapley value per level of each categorical feature, which does not really make sense and also increases the computational complexity unnecessarily. Instead, you should transform your data back to the original format without one-hot encoding and pass that to shapr, together with a prediction function that takes the original data as input.
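To make the suggestion concrete, here is a minimal sketch of the data transformation only. It is written in Python with pandas for illustration and does not call shapr itself (shapr is an R package, and the same idea applies there with a factor column). The column names `Sex_F`, `Sex_I`, `Sex_M`, the separator `_`, and the helpers `decode_one_hot`, `encode_one_hot`, and `make_predict_on_original` are all hypothetical, and the sketch assumes exactly one indicator is set per row.

```python
import pandas as pd

# Hypothetical one-hot column names for a categorical feature "Sex".
ONE_HOT_COLS = ["Sex_F", "Sex_I", "Sex_M"]


def decode_one_hot(df, cols, name, sep="_"):
    """Collapse the one-hot columns `cols` into one categorical column `name`."""
    levels = [c.split(sep, 1)[1] for c in cols]
    # Position of the active indicator in each row (assumes exactly one is set).
    idx = df[cols].to_numpy().argmax(axis=1)
    out = df.drop(columns=cols).copy()
    out[name] = pd.Categorical([levels[i] for i in idx], categories=levels)
    return out


def encode_one_hot(df, cols, name, sep="_"):
    """Inverse of decode_one_hot: rebuild the indicator columns the model expects."""
    out = df.drop(columns=[name]).copy()
    for c in cols:
        level = c.split(sep, 1)[1]
        out[c] = (df[name] == level).astype(int)
    return out


def make_predict_on_original(model_predict, cols, name, feature_order=None):
    """Wrap a model trained on one-hot data so it accepts the decoded data."""
    def predict(df_original):
        encoded = encode_one_hot(df_original, cols, name)
        if feature_order is not None:
            # Restore the column order the model was trained with.
            encoded = encoded[feature_order]
        return model_predict(encoded)
    return predict
```

The decoded data frame and the wrapped prediction function are what you would then hand to the explainer, so that the categorical feature receives a single Shapley value instead of one per level.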

Hope this helps.

aliamini-uq commented 1 month ago

Many thanks for your help.