My graph learner, built with mlr3pipelines, looks like this figure (feature selection with cv_glmnet first, then a comparison of several machine learning methods to choose the best one). It trains and predicts successfully, with xgboost ending up as the final method.
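For context, a simplified sketch of how a graph learner of this shape can be assembled is below; the second candidate learner (ranger), the tuner, the resampling, and the filter settings are placeholders rather than my exact setup:

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3filters)
library(mlr3tuning)
library(paradox)

task  <- as_task_classif(MOX_std_data, target = "phenotype")
split <- partition(task)

# feature selection step: keep only the features that cv_glmnet selects
# (the "selected_features" filter scores selected features 1 and all others 0)
graph <-
  po("filter",
     filter = flt("selected_features", learner = lrn("classif.cv_glmnet")),
     filter.cutoff = 0.5) %>>%
  # branch over the candidate models; the branch choice is tuned below
  po("branch", options = c("classif.xgboost", "classif.ranger")) %>>%
  gunion(list(po(lrn("classif.xgboost")), po(lrn("classif.ranger")))) %>>%
  po("unbranch", options = c("classif.xgboost", "classif.ranger"))

# wrap the graph in an AutoTuner that tunes which branch is used
at_lrn <- auto_tuner(
  tuner        = tnr("grid_search"),
  learner      = as_learner(graph),
  resampling   = rsmp("cv", folds = 3),
  measure      = msr("classif.ce"),
  search_space = ps(branch.selection = p_fct(c("classif.xgboost", "classif.ranger")))
)
at_lrn$train(task, row_ids = split$train)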
However, when I try to calculate feature importance, it takes far too long because every feature is included in the calculation, and I could not get a result within a few hours.
"at_lrn" is my graph learner, "MOX_std_data" was my data
# wrap the trained graph learner and the test data for iml, then permutation importance
model <- Predictor$new(at_lrn, data = MOX_std_data[split$test, ], y = "phenotype")
effect <- FeatureImp$new(model, loss = "ce", n.repetitions = 10)
Earlier I tried to retrieve the features selected by cv_glmnet with the following code, in order to reduce the data size:
# id of the winning branch and the features its model was actually trained on
selected_lrn <- at_lrn$tuning_result[, 1]$branch.selection
remain_features <- at_lrn$learner$graph_model$pipeops[[selected_lrn]]$learner_model$model$feature_names
However, after I reduce the data to these features,
model <- Predictor$new(at_lrn, data = MOX_std_data[split$test, c(remain_features, "phenotype")], y = "phenotype")
effect <- FeatureImp$new(model, loss = "ce", n.repetitions = 10)
The error """Assertion on 'newdata$colnames' failed: Must include the elements xxxxxxxxxxxx""" will occur.
One way I solved this is to take cv_glmnet out of the graph learner, run the feature selection first to shrink the task, and then train the graph learner on the reduced task, as sketched below.
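In rough outline that workaround looks like this (the task construction, learner keys, and variable names here are simplified placeholders, not my exact code):

# 1) run the cv_glmnet feature selection on the training data by itself
task   <- as_task_classif(MOX_std_data, target = "phenotype")
fs_lrn <- lrn("classif.cv_glmnet")
fs_lrn$train(task, row_ids = split$train)
keep   <- fs_lrn$selected_features()

# 2) shrink the task to the selected features and train the graph learner
#    (here at_lrn is the graph learner WITHOUT the cv_glmnet step)
task_small <- task$clone()$select(keep)
at_lrn$train(task_small, row_ids = split$train)

# 3) feature importance is now computed over the selected features only
model  <- Predictor$new(at_lrn, data = MOX_std_data[split$test, c(keep, "phenotype")], y = "phenotype")
effect <- FeatureImp$new(model, loss = "ce", n.repetitions = 10)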
But I wonder whether there is another solution that does not require separating cv_glmnet from the graph learner and that computes the importance of only the selected features directly, since I think a complete graph pipeline learner is much more elegant.