Closed neuideas closed 2 years ago
Could you please elaborate on what exactly you struggle with?
My question is: am I right to compute this feature importance on the test data? I passed the test data to the explainer, so the importances should be computed from the instances in the test set.
My second question (more of a confusion): if this feature importance is specific to each model (RF and SVM see different important features), why do we call it a "model-agnostic" method? I thought model-agnostic meant not model-dependent.
Hi! We call it model-agnostic because the method can be applied to various model algorithms like RF or SVM, i.e. it does not assume any structure of the explained model, as opposed to model-specific methods suited for either tree-ensembles or neural networks.
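To make the distinction concrete, here is a minimal sketch (not from this thread, using scikit-learn's `permutation_importance` rather than DALEX) showing that the *procedure* is identical for an RF and an SVM; only the resulting importances differ, because they describe each fitted model:

```python
# The same permutation-importance procedure is applied unchanged to two
# different model types -- the method only needs a predict() function,
# which is what "model-agnostic" means. The toy regression data here is
# an assumption for illustration.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (RandomForestRegressor(random_state=0), SVR()):
    model.fit(X_tr, y_tr)
    # Importance = drop in score when a feature's values are shuffled,
    # measured here on the held-out test split.
    result = permutation_importance(model, X_te, y_te,
                                    scoring="neg_root_mean_squared_error",
                                    n_repeats=10, random_state=0)
    print(type(model).__name__, result.importances_mean.round(2))
```

A model-specific method (e.g., Gini importance for tree ensembles) would only work for one of the two models above; the permutation approach treats both as black boxes.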
As for the choice of the dataset, it really depends if you want to obtain feature importance based on train or test data, as described in the article you mentioned.
But because feature selection is performed based on the model's performance (e.g., the RMSE value), it would be better to do it using test data, right?
I would say it depends on whether you want to measure the metric on out-of-bag observations; for example, one can treat measuring importance the same way as measuring model error. To some extent, we shouldn't rely on the labels "train" and "test" data. Instead, we should consider the distributions of the variables in the validation set, which we use to compute explanations.
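To illustrate why the choice of dataset matters (a sketch with assumed toy data, using scikit-learn's `permutation_importance` instead of DALEX), compare importances of the same fitted model computed on the training split versus a held-out split:

```python
# Permutation importance of one model, computed on two different datasets.
# Train-based importances reflect what the model has fit (including any
# memorized noise); test-based ones reflect what generalizes, so the two
# rankings can differ. Data and model settings here are assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

rf = RandomForestRegressor(random_state=1).fit(X_tr, y_tr)
imp_train = permutation_importance(rf, X_tr, y_tr, n_repeats=10, random_state=1)
imp_test = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=1)
print("train:", imp_train.importances_mean.round(2))
print("test: ", imp_test.importances_mean.round(2))
```

Neither set of numbers is "the" importance; each answers a different question about the same model, which is the point about distributions above.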
According to this article (https://christophm.github.io/interpretable-ml-book/feature-importance.html), permutation-based feature importance should be computed on test data. So is the following correct?
```r
learner = lrn("regr.randomForest")
model = learner$train(task)

explainer = explain_mlr3(model,
                         data = test[, -22],
                         y = as.numeric(test$bug) - 1,
                         label = "RF")
p1 = model_parts(explainer = explainer)
```