Closed jnsofini closed 3 months ago
Hi @jnsofini
The attribute variables_ shows the variables that were evaluated during the selection process. If variables=None when you set up the transformer, then variables_ will be all numerical variables seen during fit(). If variables=[var1, var2, var3], then variables_ will also be [var1, var2, var3].
If you want to obtain the variables that were selected, you can use get_support() in combination with feature_names_in_, or call get_feature_names_out(). These two are exactly the same as the ones supported in sklearn.
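To make the mask-plus-names idea concrete, here is a minimal plain-Python sketch. It does not fit a transformer: the names feature_names_in, features_to_drop and support_mask are hypothetical stand-ins, the values are copied from the report in this thread, and the mask is built by hand on the assumption that a support mask holds one boolean per input column, with True meaning "kept", as in sklearn.

```python
# Plain-Python stand-in: no transformer is fitted here.
# Column names and dropped features are copied from the report in this thread.
feature_names_in = [f"var_{i}" for i in range(12)]
features_to_drop = ["var_0", "var_4", "var_6", "var_9"]

# Assumed shape of the mask: one boolean per input column, True = kept.
dropped = set(features_to_drop)
support_mask = [name not in dropped for name in feature_names_in]

# Combine the mask with the input names to recover the selected features.
selected = [name for name, keep in zip(feature_names_in, support_mask) if keep]
print(selected)
```

The result keeps the original column order, so it may be ordered differently from a lexicographically sorted list of the same names.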
Describe the bug
I have a fitted transformer. As per the documentation, I was expecting to get the selected features from the .variables_ attribute; however, all the features are returned.
To Reproduce
Steps to reproduce the behavior:

import pandas as pd
from sklearn.datasets import make_classification
from feature_engine.selection import SmartCorrelatedSelection

# make dataframe with some correlated variables
def make_data():
    X, y = make_classification(n_samples=1000, n_features=12, n_redundant=4,
                               n_clusters_per_class=1, weights=[0.50],
                               class_sep=2, random_state=1)
    # name the columns var_0 ... var_11 (assumed from the output shown below)
    X = pd.DataFrame(X, columns=[f"var_{i}" for i in range(12)])
    return X

X1 = make_data()

# set up the selector
tr2 = SmartCorrelatedSelection(
    variables=None,
    method="pearson",
    threshold=0.8,
    missing_values="raise",
    selection_method="variance",
    estimator=None,
)

Xt = tr2.fit_transform(X1)
tr2.features_to_drop_
# ['var_0', 'var_4', 'var_6', 'var_9']
Expected behavior
tr2.variables_ should give
['var_1', 'var_10', 'var_11', 'var_2', 'var_3', 'var_5', 'var_7', 'var_8']
Instead I get
['var_0', 'var_1', 'var_2', 'var_3', 'var_4', 'var_5', 'var_6', 'var_7', 'var_8', 'var_9', 'var_10', 'var_11']