anhaidgroup / py_entitymatching

BSD 3-Clause "New" or "Revised" License

sklearn>=0.22 and Py3.8 compat: sklearn.preprocessing.Imputer has been removed #127

Open mbargull opened 4 years ago

mbargull commented 4 years ago

scikit-learn deprecated sklearn.preprocessing.Imputer in 0.20.0 and removed it in 0.22.0: https://github.com/scikit-learn/scikit-learn/blame/0.22.1/doc/whats_new/v0.20.rst#L1444-L1468

This means py_entitymatching is incompatible with scikit-learn >=0.22 due to https://github.com/anhaidgroup/py_entitymatching/blob/v0.3.2/py_entitymatching/matcher/matcherutils.py#L11 and https://github.com/anhaidgroup/py_entitymatching/blob/v0.3.2/py_entitymatching/matcher/matcherutils.py#L221-L224

This in turn means py_entitymatching does not fully work with Python 3.8, because scikit-learn <0.22 does not support Python 3.8. Ref: https://github.com/conda-forge/py_entitymatching-feedstock/pull/3

The whats_new/v0.20.rst file referenced above names the replacements: sklearn.impute.SimpleImputer for the class itself, sklearn.preprocessing.FunctionTransformer to cover the removed axis keyword, and numpy.nan as the value for the missing_values keyword. Unfortunately, I have no experience with scikit-learn, so I cannot offer further assistance or PRs.
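
For whoever picks this up, here is a rough sketch of what the migration might look like. It is not taken from py_entitymatching: the helper name `impute_values` and its signature are hypothetical, and it assumes the existing code calls something like `Imputer(missing_values='NaN', strategy='mean', axis=0)`. Row-wise imputation (the old `axis=1`) is emulated by transposing, which is the same trick that could be wrapped in a FunctionTransformer.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # available since scikit-learn 0.20


def impute_values(X, strategy="mean", axis=0, missing_values=np.nan):
    """Hypothetical stand-in for Imputer(...).fit_transform(X).

    axis=0 imputes along columns, axis=1 along rows, mirroring the
    removed `axis` keyword of sklearn.preprocessing.Imputer.
    """
    # The old Imputer accepted the string 'NaN' as a sentinel;
    # SimpleImputer expects numpy.nan instead.
    if isinstance(missing_values, str) and missing_values == "NaN":
        missing_values = np.nan

    X = np.asarray(X, dtype=float)
    imputer = SimpleImputer(missing_values=missing_values, strategy=strategy)

    if axis == 0:
        return imputer.fit_transform(X)
    # SimpleImputer only works column-wise, so transpose, impute,
    # and transpose back to get row-wise behaviour.
    return imputer.fit_transform(X.T).T


X = [[1.0, np.nan],
     [3.0, 4.0]]
by_column = impute_values(X, axis=0, missing_values="NaN")
by_row = impute_values(X, axis=1)
```

One caveat I am aware of: with the statistical strategies, SimpleImputer drops columns that contain only missing values, so the output shape can differ from the input in that case.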