alexzwanenburg / familiar

Repository for the familiar R-package. Familiar implements an end-to-end pipeline for interpretable machine learning of tabular data.
European Union Public License 1.2

Returning features that are collinear with selected features in lasso regression #76

Open austinhpatton opened 10 months ago

austinhpatton commented 10 months ago

When running the familiar pipeline with the lasso learner and the lasso_binomial feature selection method, features that are collinear with selected features are excluded from the variable importance summaries and other related summaries.

Is there a way to use the pipeline outputs to identify which features were excluded from the full analysis but were collinear with selected features? The aim is to get a more exhaustive list of features that are strong predictors of the response variable.

Thanks in advance!

alexzwanenburg commented 10 months ago

I don't think the information is currently exposed directly, but it should be stored.

Just from memory, the information is stored with the model in the feature_info attribute. That attribute contains a list of FeatureInfo objects (one per selected feature), each of which has a cluster_parameters attribute. That attribute contains a featureInfoParametersCluster object, whose cluster_features attribute describes the features that are collinear and form a cluster.
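As a rough illustration of walking that structure, here is a minimal sketch. It does not use the real familiar classes: MockModel, MockFeatureInfo and MockClusterParameters are hypothetical stand-ins, and only the slot names feature_info, cluster_parameters and cluster_features come from the description above. The name slot, the use of S4 `@` access, and the assumption that cluster_features is a plain character vector are guesses that should be checked against an actual model object (for example with `str()` or `slotNames()`).

```r
library(methods)

# Hypothetical stand-ins for the classes described above. The real familiar
# class definitions may differ; only the slot names feature_info,
# cluster_parameters and cluster_features are taken from the thread.
setClass("MockClusterParameters",
         representation(cluster_features = "character"))
setClass("MockFeatureInfo",
         representation(name = "character", cluster_parameters = "ANY"))
setClass("MockModel",
         representation(feature_info = "list"))

# For each feature stored with the model, return the other members of its
# cluster (an empty vector if no cluster information is present).
get_cluster_members <- function(model) {
  clusters <- lapply(model@feature_info, function(info) {
    params <- info@cluster_parameters
    if (is.null(params)) return(character(0))
    setdiff(params@cluster_features, info@name)
  })
  names(clusters) <- vapply(
    model@feature_info,
    function(info) info@name,
    character(1),
    USE.NAMES = FALSE
  )
  clusters
}

# Toy model: feature_a was clustered with feature_b; feature_c stands alone.
model <- new("MockModel", feature_info = list(
  new("MockFeatureInfo",
      name = "feature_a",
      cluster_parameters = new("MockClusterParameters",
                               cluster_features = c("feature_a", "feature_b"))),
  new("MockFeatureInfo",
      name = "feature_c",
      cluster_parameters = NULL)
))

result <- get_cluster_members(model)
```

With a real model, the same loop would be pointed at the model's own feature_info list once the actual slot names and types are confirmed.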

I need to: