We remove NULLs from the domain. As a result, a cell that is initially NULL always gets a non-NULL prediction, unless we cannot generate a non-trivial domain for it from the values that co-occur in correlated attributes.
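The behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `NULL`, `candidate_domain`, and `predict` are hypothetical names, NULL is modelled as Python `None`, and "non-trivial" is assumed here to mean "at least one non-NULL candidate".

```python
# Illustrative sketch only; the names below are hypothetical, not HoloClean's API.
NULL = None  # assumption: NULL cells are modelled as None in this sketch


def candidate_domain(init_value, cooccurring_values):
    """Build a cell's domain from co-occurring values, with NULLs removed."""
    domain = []
    for value in cooccurring_values:
        # Drop NULLs and duplicates, keeping first-seen order.
        if value is not NULL and value not in domain:
            domain.append(value)
    # A non-NULL initial value is always a candidate.
    if init_value is not NULL and init_value not in domain:
        domain.append(init_value)
    return domain


def predict(init_value, cooccurring_values, score):
    """Return the best-scoring candidate, or NULL if no domain could be built."""
    domain = candidate_domain(init_value, cooccurring_values)
    if not domain:
        # No usable domain: the cell stays as it was (NULL for a NULL cell).
        return init_value
    return max(domain, key=score)
```

With this shape, a NULL cell can only end up NULL again when no candidate survives the filtering, which matches the exception described above.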
We maintain our previous precision and recall, as desired (with the settings in holoclean_repair_example.py):
23:01:27 - [ INFO] - Precision = 1.00, Recall = 0.46, Repairing Recall = 0.53, F1 = 0.63, Repairing F1 = 0.70, Detected Errors = 435, Total Errors = 509, Correct Repairs = 232, Total Repairs = 459, Total Repairs on correct cells (Grdth present) = 0, Total Repairs on incorrect cells (Grdth present) = 232
Same settings as above but without InitAttrFeaturizer:
23:12:57 - [ INFO] - Precision = 0.95, Recall = 0.85, Repairing Recall = 1.00, F1 = 0.90, Repairing F1 = 0.97, Detected Errors = 435, Total Errors = 509, Correct Repairs = 434, Total Repairs = 683, Total Repairs on correct cells (Grdth present) = 22, Total Repairs on incorrect cells (Grdth present) = 434
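For anyone cross-checking the two log lines, the reported metrics can be reproduced from the raw counts. The sketch below assumes that precision is taken over repairs on cells with ground truth, recall over total errors, and repairing recall over detected errors. The formulas are inferred from the numbers in the logs, and `repair_metrics` is a hypothetical helper, not a function from the codebase.

```python
def repair_metrics(correct_repairs, repairs_on_correct_cells,
                   repairs_on_incorrect_cells, detected_errors, total_errors):
    """Recompute the logged metrics from raw counts (formulas are assumptions)."""
    # Only repairs on cells that have ground truth can be judged right or wrong.
    repairs_with_grdth = repairs_on_correct_cells + repairs_on_incorrect_cells
    precision = correct_repairs / repairs_with_grdth
    recall = correct_repairs / total_errors
    repairing_recall = correct_repairs / detected_errors

    def f1(p, r):
        # Harmonic mean of p and r; defined as 0 when both are 0.
        return 2 * p * r / (p + r) if (p + r) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "repairing_recall": repairing_recall,
        "f1": f1(precision, recall),
        "repairing_f1": f1(precision, repairing_recall),
    }


# Counts taken from the two log lines above.
with_init_attr = repair_metrics(232, 0, 232, 435, 509)
without_init_attr = repair_metrics(434, 22, 434, 435, 509)
```

Rounded to two decimals, both results agree with the logged values, including Precision = 1.00 in the first run even though Total Repairs (459) is larger than Correct Repairs (232): the remaining repairs fall on cells without ground truth and are not counted.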