bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

Fastshap with tree augmented naive bayes of caret (tan) #44

Open PARODBE opened 2 years ago

PARODBE commented 2 years ago

Hi,

Is it possible to use this library with a tan model? In my tan model the features are categorical (their values are strings).

Thanks!

bgreenwell commented 2 years ago

Hi @PARODBE, can you provide some info as to what a “tan” model is and what R package is used to fit one?

PARODBE commented 2 years ago

Of course! https://github.com/topepo/caret/blob/master/models/files/tan.R

This model can be trained on categorical features whose values are strings (you use this model to look up conditional probabilities). You can take any dataset before dummy-variable conversion and run fastshap on it as a test. In my case it doesn't work, but it is quite possible that I am doing something wrong, since I am much better at Python.

Thank you!

bgreenwell commented 2 years ago

Gotcha @PARODBE, and thanks for the link. If you’d be kind enough to post a small reproducible example, I’d be happy to take a look!!

PARODBE commented 2 years ago

Hi again,

Use any dataset with categorical (string) data. For example, one variable could be the kind of animal (dogs, birds, cats, ...), another the size (high, medium, little, etc.), and the output variable could be cute / not cute. Then you can build your model like this (it is only an example):

set.seed(666)

fitControl <- trainControl(method = "repeatedcv", number=5, repeats=50, classProbs = TRUE, summaryFunction = twoClassSummary, verbose=F)

tune.grid <- expand.grid(smooth=10^seq(-1,2,0.2), score=c('bic', 'aic'))

alldata.tan <- caret::train(x, y, method = "tan", trControl = fitControl, tuneGrid = tune.grid, metric = "ROC", maximize = TRUE)

I am doing hyperparameter tuning over the smooth and score type parameters.

That gives you a tan model to pass to fastshap. This is what I am doing:

p_function_G<- function(object, newdata) caret::predict.train(alldata.tan, newdata = x, type = "prob")[,"Positive"] # select G class

shap_values_G <- fastshap::explain(alldata.tan, X = x, pred_wrapper = p_function_G, nsim = 2,
              # select examples corresponding to category G from
              # the trainset used for building the model (not shown)
                               adjust=FALSE)

But I get nothing back, so I think I am doing something wrong...
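
One thing that stands out in the snippet above: the prediction wrapper ignores its `object` and `newdata` arguments and always predicts on the full `x`. fastshap calls the wrapper on perturbed copies of the data and takes differences between the predictions, so a wrapper that returns the same vector every time cannot produce informative Shapley values. That may be part of the problem here, though it has not been confirmed against the poster's data. Below is a minimal sketch of a wrapper that uses both arguments. The toy data (`animal`, `size`, and the rule generating `y`) is hypothetical and only stands in for the real dataset. It assumes the caret, bnclassify, and fastshap packages are installed, and it has not been checked that fastshap handles every factor-only data frame with this model.

```r
library(caret)     # method = "tan" also needs the bnclassify package installed
library(fastshap)

set.seed(666)

# Hypothetical toy data: every predictor is a factor, as the tan model expects
n <- 200
x <- data.frame(
  animal = factor(sample(c("dog", "bird", "cat"), n, replace = TRUE)),
  size   = factor(sample(c("high", "medium", "little"), n, replace = TRUE))
)
y <- factor(ifelse(x$animal == "dog" | runif(n) < 0.3, "Positive", "Negative"))

# Smaller resampling setup and grid than in the post, to keep the sketch quick
fitControl <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                           summaryFunction = twoClassSummary)
tune.grid <- expand.grid(smooth = c(0.1, 1, 10), score = c("bic", "aic"))

alldata.tan <- train(x, y, method = "tan", trControl = fitControl,
                     tuneGrid = tune.grid, metric = "ROC")

# The wrapper must predict on the `newdata` it is handed, using the `object`
# it is handed, and return one numeric value per row
p_function <- function(object, newdata) {
  predict(object, newdata = newdata, type = "prob")[, "Positive"]
}

shap_values <- fastshap::explain(alldata.tan, X = x,
                                 pred_wrapper = p_function, nsim = 10)
head(shap_values)
```

A larger `nsim` than the 2 used in the post gives less noisy estimates at the cost of run time.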

PARODBE commented 2 years ago

Have you been able to find out anything about this?

bgreenwell commented 2 years ago

Hi @PARODBE, I have not found the time yet, but if you have a reprex I could run on my end, it would make it a lot easier to narrow down the issue and help solve your problem.

PARODBE commented 2 years ago

Hi!

I've created a repo on my GitHub for this with a completely artificial dataset, but it should be enough for trying out fastshap: https://github.com/PARODBE/bnlearn_r_playing/tree/main

All best, Pablo