anhaidgroup / py_entitymatching

BSD 3-Clause "New" or "Revised" License

Enhance feature selection #105

Closed: ran-tan closed this 6 years ago

ran-tan commented 6 years ago

This pull request adds support for rescaling (normalizing) feature vectors and for feature selection.

Files created

  1. rescaling
     module: py_entitymatching/feature/scalers.py
     test: py_entitymatching/tests/test_feature_scalers.py
     api: docs/user_manual/api/rescaling_feature_vectors.rst
     manual: docs/user_manual/rescaling_feature_vectors.rst
     notebook: notebooks/guides/step_wise_em_guides/Rescaling Features.ipynb

  2. feature selection
     module: py_entitymatching/feature/selectfeatures.py
     test: py_entitymatching/tests/test_feature_selectfeatures.py
     api: docs/user_manual/api/selecting_features.rst
     manual: docs/user_manual/selecting_features.rst
     notebook: notebooks/guides/step_wise_em_guides/Selecting Features Univariate.ipynb
               notebooks/guides/end_to_end_em_guides/Basic EM Workflow Restaurants - Feature Selection
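
For readers unfamiliar with the two steps, here is a minimal standalone sketch of rescaling followed by univariate feature selection. It is not the code in scalers.py or selectfeatures.py: the function names, the sample data, and the scoring rule (the gap between the class means of each rescaled feature) are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
from typing import List, Sequence


def min_max_rescale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]


def class_mean_gap(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Univariate score per feature: |mean over matches - mean over non-matches|."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    return [
        abs(sum(p) / len(pos) - sum(n) / len(neg))
        for p, n in zip(zip(*pos), zip(*neg))
    ]


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, returned in column order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical feature vectors for four candidate pairs (three features each).
feature_vectors = [
    [0.9, 12.0, 0.5],
    [0.8, 9.0, 0.5],
    [0.1, 4.0, 0.5],
    [0.2, 3.0, 0.5],
]
labels = [1, 1, 0, 0]  # 1 = match, 0 = non-match

scaled = min_max_rescale(feature_vectors)
scores = class_mean_gap(scaled, labels)
selected = select_top_k(scores, k=2)
print(selected)
```

In this toy data the third feature is constant, so it scores 0.0 and the first two columns are kept. Rescaling first matters for a score like this one, because otherwise the second feature's larger raw range would dominate the comparison.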

Files modified

  1. py_entitymatching/feature/attributeutils.py: add function get_attrs_to_project
  2. py_entitymatching/feature/autofeaturegen.py: modify get_features to use the union of the similarity metrics for the left table's data type and the right table's data type
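
One way to read the get_features change is sketched below. The lookup table, the type labels, the measure names, and the helper candidate_sim_funcs are all hypothetical and are not taken from autofeaturegen.py; the sketch only shows what "union of similarity metrics" means when the two tables disagree on an attribute's type.

```python
from typing import Dict, List, Set

# Hypothetical lookup: attribute type -> names of applicable similarity measures.
# These labels and names are illustrative only, not the library's own tables.
SIM_FUNCS_BY_TYPE: Dict[str, Set[str]] = {
    "short_string": {"jaccard", "levenshtein", "exact_match"},
    "long_string": {"jaccard", "cosine"},
    "numeric": {"abs_norm", "exact_match"},
}


def candidate_sim_funcs(left_type: str, right_type: str) -> List[str]:
    """Return the union of the measures that apply to either side's type."""
    left = SIM_FUNCS_BY_TYPE.get(left_type, set())
    right = SIM_FUNCS_BY_TYPE.get(right_type, set())
    return sorted(left | right)


# The left table types the attribute as a short string, the right as a long one.
funcs = candidate_sim_funcs("short_string", "long_string")
print(funcs)
```

With a union, a type mismatch between the tables widens the set of candidate features instead of discarding measures that apply to only one side.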

ran-tan commented 6 years ago

I found an error in scalers.py. I will close this pull request and open a new one.