giuseppec / iml

iml: interpretable machine learning R package
https://giuseppec.github.io/iml/

How to reduce the calculation time of feature importance with "iml" in a graph pipeline built with mlr3pipelines? #196

Closed HongxiangXu closed 2 years ago

HongxiangXu commented 2 years ago

My graph learner, built with mlr3pipelines, looks like the attached figure: it first performs feature selection with cv_glmnet, then compares several machine learning methods and chooses the best one. It runs successfully and produces output, with xgboost chosen as the final method.

However, when I try to calculate feature importance, it takes a very long time because all features are included in the calculation, and I could not get a result within a few hours. Here `at_lrn` is my graph learner and `MOX_std_data` is my data:

    model <- Predictor$new(at_lrn, data = MOX_std_data[split$test,], y = "phenotype")
    effect <- FeatureImp$new(model, loss = "ce", n.repetitions = 10)

Previously, I tried to extract the features selected by cv_glmnet with the following code, in order to reduce the data size:

    selected_lrn <- at_lrn$tuning_result[,1]$branch.selection
    remain_features <- at_lrn$learner$graph_model$pipeops[[selected_lrn]]$learner_model$model$feature_names

However, after I reduced the data to those features:

    model <- Predictor$new(at_lrn, data = MOX_std_data[split$test, c(remain_features, "phenotype")], y = "phenotype")
    effect <- FeatureImp$new(model, loss = "ce", n.repetitions = 10)

the following error occurs: "Assertion on 'newdata$colnames' failed: Must include the elements xxxxxxxxxxxx".

One way that does solve the problem is to separate cv_glmnet from my graph learner: select the features first to reduce the size of the task, and then train the graph learner on the reduced task.

But I wonder whether there is another solution that does not require separating cv_glmnet from my graph learner and that directly calculates the importance of only the selected features. I think a complete graph pipeline learner is much more elegant.
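One possible direction, sketched under assumptions rather than confirmed for this setup: keep every column in the data passed to `Predictor$new()`, so the pipeline still receives all the columns it was trained on, and restrict the permutation step to the selected features. This assumes an installed iml version in which `FeatureImp$new()` accepts a `features` argument. The example below uses an `rpart` tree on `iris` as a stand-in for the graph learner, and the `selected` vector is a hypothetical substitute for `remain_features`.

```r
library(iml)
library(rpart)

set.seed(1)

# Stand-in model: a classification tree trained on ALL iris features.
# (Assumption: this plays the role of the trained graph learner.)
tree <- rpart(Species ~ ., data = iris)

# Keep every feature column in the data given to the Predictor, so the
# model still sees all the columns it expects at prediction time.
X <- iris[, setdiff(names(iris), "Species")]
predictor <- Predictor$new(tree, data = X, y = iris$Species, type = "prob")

# Hypothetical stand-in for `remain_features`: the subset of features
# whose importance we actually want.
selected <- c("Petal.Length", "Petal.Width")

# Permute only the selected features instead of all of them.
# (Assumption: the installed iml version supports `features`.)
imp <- FeatureImp$new(
  predictor,
  loss = "ce",
  n.repetitions = 10,
  features = selected
)

print(imp$results)
```

Applied to the code in this issue, the analogous call would build `model` on the full `MOX_std_data[split$test,]` and pass `features = remain_features` to `FeatureImp$new()`. I have not tested this with a tuned graph learner. Independently of that, iml can run its computations in parallel through the future package (for example by calling `future::plan("multisession")` beforehand), which may reduce the wall-clock time further.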