anhaidgroup / py_entitymatching

BSD 3-Clause "New" or "Revised" License

Enhance feature selection #107

Closed: ran-tan closed this pull request 6 years ago

ran-tan commented 6 years ago

Adding rescaling/normalizing and feature selection support.

Files created

rescaling
module: py_entitymatching/feature/scalers.py
test: py_entitymatching/tests/test_feature_scalers.py
api: docs/user_manual/api/rescaling_feature_vectors.rst
manual: docs/user_manual/rescaling_feature_vectors.rst
notebook: notebooks/guides/step_wise_em_guides/Rescaling Features.ipynb
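
For readers unfamiliar with the idea, rescaling feature vectors can be sketched with min-max scaling in plain Python. This is only an illustration under assumptions: `min_max_rescale` is a hypothetical helper, not the API in scalers.py, and min-max is just one of several possible rescaling schemes.

```python
from typing import List


def min_max_rescale(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    if not rows:
        return []
    n_cols = len(rows[0])
    # Per-column minimum and maximum over all feature vectors.
    lows = [min(row[j] for row in rows) for j in range(n_cols)]
    highs = [max(row[j] for row in rows) for j in range(n_cols)]
    scaled = []
    for row in rows:
        scaled.append([
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ])
    return scaled


# Toy feature vectors: one row per candidate pair, one column per feature.
feature_vectors = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
rescaled = min_max_rescale(feature_vectors)
print(rescaled)
```

After rescaling, both columns share the range [0, 1], so features measured on different scales contribute comparably to a downstream matcher.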

feature selection
module: py_entitymatching/feature/selectfeatures.py
test: py_entitymatching/tests/test_feature_selectfeatures.py
api: docs/user_manual/api/selecting_features.rst
manual: docs/user_manual/selecting_features.rst
notebook: notebooks/guides/step_wise_em_guides/Selecting Features Univariate.ipynb
notebook: notebooks/guides/end_to_end_em_guides/Basic EM Workflow Restaurants - Feature Selection
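
As a rough picture of univariate feature selection, the sketch below scores each feature independently against the match labels and keeps the top k. The scoring function (absolute Pearson correlation), the helper names `abs_pearson` and `select_k_best`, and the toy feature names are all assumptions for illustration; they are not taken from selectfeatures.py.

```python
from math import sqrt
from typing import Dict, List, Sequence


def abs_pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def select_k_best(features: Dict[str, List[float]],
                  labels: List[int], k: int) -> List[str]:
    """Score each feature on its own and return the k best names."""
    scores = {name: abs_pearson(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: three features over four labeled candidate pairs.
features = {
    "name_jaccard": [0.9, 0.8, 0.1, 0.2],
    "addr_lev_sim": [0.7, 0.9, 0.3, 0.2],
    "row_noise": [0.5, 0.1, 0.5, 0.1],
}
labels = [1, 1, 0, 0]
selected = select_k_best(features, labels, k=2)
print(selected)
```

Because each feature is scored in isolation, this approach is cheap but ignores redundancy between features; two highly correlated features can both be kept.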

Files modified

  1. py_entitymatching/feature/attributeutils.py: add function get_attrs_to_project
  2. py_entitymatching/feature/autofeaturegen.py: modify get_features to use the union of the similarity metrics for the left table's data type and those for the right table's data type
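
The union behavior described for get_features can be pictured with a small sketch. The lookup table `SIMS_BY_TYPE`, the type names, the metric names, and `sims_for_pair` are hypothetical and chosen only for illustration; the actual tables and logic in autofeaturegen.py may differ.

```python
from typing import Dict, List

# Hypothetical lookup: attribute data type -> similarity metrics suited to it.
SIMS_BY_TYPE: Dict[str, List[str]] = {
    "short_string": ["exact_match", "lev_sim", "jaccard_qgm_3"],
    "long_string": ["jaccard_ws", "cosine_ws"],
    "numeric": ["exact_match", "abs_norm"],
}


def sims_for_pair(left_type: str, right_type: str) -> List[str]:
    """Union of the metrics for both types, de-duplicated, left type first."""
    merged = SIMS_BY_TYPE.get(left_type, []) + SIMS_BY_TYPE.get(right_type, [])
    # dict.fromkeys keeps the first occurrence of each name in order.
    return list(dict.fromkeys(merged))


# A left attribute typed as a short string paired with a numeric right attribute.
sims = sims_for_pair("short_string", "numeric")
print(sims)
```

Taking the union means an attribute pair whose two sides are typed differently still gets features from both sides' metrics.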

The documentation build tests did not pass. This needs further examination from the maintainers.

anhaidgroup commented 6 years ago

@ran-tan Thanks for the pull request. It could be a good addition to py_entitymatching. May I ask why you closed it?