HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0

Groundwork for fusion implementation and assorted bug fixes #18

richardwu closed 5 years ago

richardwu commented 5 years ago

Note: please review only the latest commit. The first two commits are outstanding from #15. (Rebased.)

Closes #14, #16.

Some notable changes:

richardwu commented 5 years ago

Output from hospital dataset on master/HEAD:

...
Precision = 0.92, Recall = 0.69, Repairing Recall = 0.80, F1 = 0.79, Repairing F1 = 0.86, Detected Errors = 437, Total Errors = 509, Correct Repairs = 351, Total Repairs = 380, Total Repairs (Grdth present) = 380
...

Output from hospital dataset with this patch/PR:

Precision = 0.94, Recall = 0.69, Repairing Recall = 0.80, F1 = 0.80, Repairing F1 = 0.87, Detected Errors = 438, Total Errors = 509, Correct Repairs = 351, Total Repairs = 372, Total Repairs (Grdth present) = 372

The drop in false positives (29 incorrect repairs on master vs. 21 with this patch) results from fixing "consistency" issues with our normalization of values in https://github.com/HoloClean/holoclean/issues/16.
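For anyone cross-checking the two outputs above, here is a minimal sketch that recomputes the scores from the raw counts. It assumes the usual definitions (precision = correct repairs / total repairs, recall = correct repairs / total errors, repairing recall = correct repairs / detected errors, F1 as the harmonic mean). Those definitions are inferred from the reported numbers rather than taken from the evaluation code, and `repair_metrics` is a hypothetical helper, not a HoloClean function.

```python
def repair_metrics(correct_repairs, total_repairs, total_errors, detected_errors):
    """Recompute the reported scores from raw counts (assumed definitions)."""
    precision = correct_repairs / total_repairs
    recall = correct_repairs / total_errors
    repairing_recall = correct_repairs / detected_errors
    # F1 is the harmonic mean of precision and the matching recall.
    f1 = 2 * precision * recall / (precision + recall)
    repairing_f1 = (
        2 * precision * repairing_recall / (precision + repairing_recall)
    )
    return {
        "precision": round(precision, 2),
        "recall": round(recall, 2),
        "repairing_recall": round(repairing_recall, 2),
        "f1": round(f1, 2),
        "repairing_f1": round(repairing_f1, 2),
        # Repairs that did not match the ground truth (false positives).
        "incorrect_repairs": total_repairs - correct_repairs,
    }


# Counts copied from the two hospital runs above.
master = repair_metrics(351, 380, 509, 437)
patched = repair_metrics(351, 372, 509, 438)

print(master)
print(patched)
```

Under these assumptions both runs reproduce the printed scores, and the precision gain comes entirely from the 8 fewer incorrect repairs, since the number of correct repairs is unchanged at 351.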

minafarid commented 5 years ago

Looks good. Please resolve the conflicts and update the Usage section in README.md to refer to the examples folder.

richardwu commented 5 years ago

Rebased and fixed merge conflicts, should be good to merge.