HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0

Use Logistic Regression with co-occur to generate weak labels #43

Closed: richardwu closed this 5 years ago

richardwu commented 5 years ago

TODO:

- Downstream results

    pruning_topk=0.0,
    weak_label_thresh=0.90,
    domain_prune_thresh=0,
    max_domain=100,
    cor_strength=0.0,
    epochs=20,
    weight_decay=0.1,
    threads=20,
    batch_size=32,
    verbose=True,
    timeout=3*60000,
    print_fw=True
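
The settings above include `weak_label_thresh=0.90`, but the issue does not show how that value is consumed. One plausible reading, stated here as an assumption and not as HoloClean's actual code, is that a cell receives a weak label only when the estimator's most probable value reaches the threshold. The function name `assign_weak_labels` and the sample posteriors are hypothetical.

```python
def assign_weak_labels(posteriors, thresh=0.90):
    """Return {cell_id: value} for cells whose top posterior is >= thresh.

    `posteriors` maps a cell id to a {candidate_value: probability} dict,
    e.g. the per-cell output of a Naive Bayes or logistic regression
    estimator. This data layout is an assumption made for illustration.
    """
    weak_labels = {}
    for cell_id, dist in posteriors.items():
        if not dist:
            continue  # no candidates, so nothing to label
        value, prob = max(dist.items(), key=lambda kv: kv[1])
        if prob >= thresh:
            weak_labels[cell_id] = value
    return weak_labels


# Hypothetical posteriors for two cells.
sample = {
    "c1": {"birmingham": 0.97, "birmingxam": 0.03},
    "c2": {"al": 0.55, "ak": 0.45},
}
labels = assign_weak_labels(sample, thresh=0.90)
print(labels)
```

Under this reading, only `c1` is labeled; `c2` stays unlabeled because its best candidate falls below 0.90.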

Hospital results with Naive Bayes:

INFO:root:Precision = 0.57, Recall = 0.66, Repairing Recall = 0.66, F1 = 0.61, Repairing F1 = 0.61, Detected Errors = 435, Total Errors = 509, Correct Repairs = 336, Total Repairs = 593, Total Repairs (Grdth present) = 593

Hospital results with Logistic Regression:

INFO:root:Precision = 0.99, Recall = 0.79, Repairing Recall = 0.79, F1 = 0.88, Repairing F1 = 0.88, Detected Errors = 435, Total Errors = 509, Correct Repairs = 401, Total Repairs = 406, Total Repairs (Grdth present) = 406
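
The precision, recall, and F1 values in both log lines can be re-derived from the logged counts. The sketch below assumes the standard definitions (precision = correct repairs / total repairs, recall = correct repairs / total errors); it is not taken from HoloClean's evaluation code, and `repair_metrics` is a hypothetical helper name.

```python
def repair_metrics(correct_repairs, total_repairs, total_errors):
    """Compute (precision, recall, F1) from repair counts."""
    precision = correct_repairs / total_repairs if total_repairs else 0.0
    recall = correct_repairs / total_errors if total_errors else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts copied from the two log lines:
# (Correct Repairs, Total Repairs, Total Errors)
runs = {
    "Naive Bayes": (336, 593, 509),
    "Logistic Regression": (401, 406, 509),
}

for name, counts in runs.items():
    p, r, f1 = repair_metrics(*counts)
    print(f"{name}: Precision = {p:.2f}, Recall = {r:.2f}, F1 = {f1:.2f}")
```

Rounded to two decimals, these match the logged figures: 0.57 / 0.66 / 0.61 for Naive Bayes and 0.99 / 0.79 / 0.88 for Logistic Regression. The gain comes from both fewer total repairs (406 vs. 593) and more correct ones (401 vs. 336).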