HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0
514 stars, 129 forks

Fixed get_infer_data to always return DK cells when infer_labeled=False. #56

Closed. richardwu closed this 5 years ago

richardwu commented 5 years ago

Also renamed parameters for better consistency and added an estimator_enabled parameter, which can be used to disable the weak labelling pathway.

Changes to get_infer_data

When infer_labeled = True, fixing_domain_gen performed inference on ALL cells (clean and DK). This meant that undetected errors (cells marked clean that actually contained errors) were also being repaired. This overestimated the number of correct/total repairs, so it is not recommended to set infer_labeled = True when performing experiments.

When infer_labeled = False, fixing_domain_gen only performed inference on cells that were not weak labelled. If a DK cell was weak labelled during weak labelling, it was not inferred. This is fixed in https://github.com/HoloClean/holoclean/pull/56 so that all DK cells are inferred regardless of their labelling.
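
To make the behaviour change concrete, here is a minimal sketch of the cell selection before and after this fix. It is not the actual HoloClean code: the Cell class, its fields, and both select_* functions are hypothetical names used only for illustration, and it assumes that clean cells are excluded from inference whenever infer_labeled = False.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a dataset cell (not a HoloClean class)."""
    cell_id: int
    is_dk: bool                  # flagged by error detection ("don't know")
    weak_labelled: bool = False  # received a label from weak labelling


def select_before_fix(cells, infer_labeled):
    """Sketch of the old selection: weak-labelled DK cells were skipped."""
    if infer_labeled:
        return list(cells)  # clean and DK cells alike
    # Assumption: clean cells count as labelled and are excluded here.
    return [c for c in cells if c.is_dk and not c.weak_labelled]


def select_after_fix(cells, infer_labeled):
    """Sketch of the new selection: every DK cell is always returned."""
    if infer_labeled:
        return list(cells)
    return [c for c in cells if c.is_dk]


cells = [
    Cell(0, is_dk=False),                     # clean cell
    Cell(1, is_dk=True),                      # DK cell, not weak labelled
    Cell(2, is_dk=True, weak_labelled=True),  # DK cell, weak labelled
]

before_ids = [c.cell_id for c in select_before_fix(cells, infer_labeled=False)]
after_ids = [c.cell_id for c in select_after_fix(cells, infer_labeled=False)]
all_ids = [c.cell_id for c in select_after_fix(cells, infer_labeled=True)]

print(before_ids)  # [1]
print(after_ids)   # [1, 2]
print(all_ids)     # [0, 1, 2]
```

In this sketch, cell 2 is the case described above: a DK cell that was weak labelled and therefore dropped from inference before the fix.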