danilkolikov / fsfc

Feature Selection for Clustering
MIT License
91 stars 28 forks

Requires installing sklearn 22.1 and still does not work #3

Open frankbass3 opened 3 years ago

frankbass3 commented 3 years ago

I was trying to use your code, but unfortunately I ran into some problems:


ModuleNotFoundError Traceback (most recent call last)
----> 1 from sklearn.feature_selection.base import SelectorMixin

ModuleNotFoundError: No module named 'sklearn.feature_selection.base'

To resolve that, I installed scikit-learn==22.1, but I still get an error:

ImportError Traceback (most recent call last)
----> 1 from sklearn.feature_selection.base import SelectorMixin

~/anaconda3/envs/python3/lib/python3.6/site-packages/sklearn/feature_selection/base.py in
      4 from . import _base
      5 from ..externals._pep562 import Pep562
----> 6 from ..utils.deprecation import _raise_dep_warning_if_not_pytest
      7
      8 deprecated_path = 'sklearn.feature_selection.base'

ImportError: cannot import name '_raise_dep_warning_if_not_pytest'

Can you update the code so that it runs on the latest version without errors?

mglowacki100 commented 3 years ago

@frankbass3 I'm not the author of the library, but I think I've fixed it for sklearn 0.24.1. In the file fsfc/base.py, replace

from sklearn.feature_selection.base import SelectorMixin

with

from sklearn.feature_selection._base import SelectorMixin

and then run python setup.py install.
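If the same copy of fsfc/base.py has to work with more than one sklearn version, the import could be made tolerant of both module paths. The snippet below is only a sketch of that idea, not something from the fsfc repository: the helper names import_attr and load_selector_mixin are made up for illustration, and the only assumption about sklearn is the pair of module paths already mentioned in this thread.

```python
import importlib


def import_attr(candidates, attr):
    """Return `attr` from the first module in `candidates` that provides it.

    Hypothetical helper (not part of fsfc or sklearn): each module path is
    tried in order, and import failures fall through to the next candidate.
    """
    last_error = None
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # ModuleNotFoundError is a subclass of ImportError, so both
            # errors reported in this issue end up here.
            last_error = exc
    raise ImportError(
        f"could not import {attr!r} from any of {list(candidates)}"
    ) from last_error


def load_selector_mixin():
    """Look up SelectorMixin under the private path first, then the old one."""
    return import_attr(
        [
            "sklearn.feature_selection._base",  # path used in the fix above
            "sklearn.feature_selection.base",   # path fsfc originally imports
        ],
        "SelectorMixin",
    )
```

In fsfc/base.py this would be used as SelectorMixin = load_selector_mixin() in place of the direct import. The private path is tried first so that versions which still ship the deprecated module do not trigger its deprecation handling.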

This toy example works for me:

import seaborn as sns

from fsfc.generic import NormalizedCut
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

df = sns.load_dataset('iris')

categorical_features = ['species']
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

numeric_features = df.select_dtypes(include=['float64']).columns
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('select', NormalizedCut(3)),
    ('cluster', KMeans())
])
pipeline.fit(df)