anhaidgroup / py_entitymatching

BSD 3-Clause "New" or "Revised" License
Fix imputer #130

Open christiemj09 opened 4 years ago

christiemj09 commented 4 years ago

sklearn.preprocessing.Imputer was deprecated and is no longer available in the most recent versions of scikit-learn. This PR replaces it with its closest equivalent, sklearn.impute.SimpleImputer. Updating to SimpleImputer is required to support Python 3.8 in the future; see #127.
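For readers unfamiliar with the two classes, here is a minimal sketch of what the swap looks like on a toy array. It is not taken from the PR's diff; the array `X` and the variable names are made up for illustration, and the old call is shown only as a comment because it cannot be imported on scikit-learn 0.22 or later.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix (illustrative only) with missing values encoded as NaN.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [5.0, np.nan]])

# Old API, removed in scikit-learn 0.22:
#   from sklearn.preprocessing import Imputer
#   imputer = Imputer(missing_values="NaN", strategy="mean", axis=0)
#
# New API: there is no `axis` argument (imputation is always per column),
# and NaN is passed as np.nan rather than the string "NaN".
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)  # each NaN is replaced by the mean of its column
```

The main behavioral difference to watch for is the missing `axis` parameter: code that relied on row-wise imputation (`axis=1`) has no direct equivalent and would need to transpose the data first.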

christiemj09 commented 4 years ago

Failed runs in Travis are addressed in #129.

A design note: Why not pass an instance of sklearn.impute.*Imputer to the function py_entitymatching.matcher.matcherutils.impute_table()? Right now we hard-code a limited subset of keyword arguments that we forward to our own hand-made instance.
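A rough sketch of the idea in the design note, assuming pandas DataFrames as input. The function `impute_with` and its parameters are hypothetical and are not the signature of `impute_table()` in py_entitymatching; the point is only that the caller builds the imputer and the helper applies it.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer


def impute_with(table, imputer, exclude_attrs=()):
    """Hypothetical helper: fill missing values using a caller-supplied imputer.

    Any object with a scikit-learn style `fit_transform` method works.
    """
    cols = [c for c in table.columns if c not in exclude_attrs]
    result = table.copy()  # leave the caller's table untouched
    # The caller controls every imputer option; this function only applies it.
    # Caveat: SimpleImputer drops columns that are entirely NaN by default,
    # which would make this assignment fail; a real version must handle that.
    result[cols] = imputer.fit_transform(table[cols])
    return result


# Illustrative data: "id" is excluded from imputation, "score" has a gap.
df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, np.nan, 1.5]})
filled = impute_with(df, SimpleImputer(strategy="median"), exclude_attrs=["id"])
print(filled)
```

With this shape, new imputer options or other imputer classes become available without changing the helper, at the cost of exposing scikit-learn types in the library's public interface.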

trevor-laity commented 3 years ago

Hey there @christiemj09, are there any plans to merge this in or to update the library to use the current version of scikit-learn? I'm running into an issue when using Poetry with this library: Poetry's dependency management doesn't play nicely with C libraries or extensions like Cython, which this version of scikit-learn relies on.

christiemj09 commented 3 years ago

Hey @trevor-laity: Apologies, this is a dangling pull request; see #147. These changes were released in 0.4.0, the most recent version of py-entitymatching. I'll keep this PR open for now to make sure your issue gets sorted out, though it should be closed after.

Installing the most recent version of py-entitymatching, which depends on scikit-learn>=0.22:

pip install --no-cache-dir py-entitymatching==0.4.0