diffix / syndiffix

Python implementation of the SynDiffix synthetic data generation mechanism.

Do stitching instead of patching for targeted clusters #138

Closed: yoid2000 closed this issue 1 month ago

yoid2000 commented 1 month ago

In some experiments, I found that always stitching using the target column, instead of patching, improves accuracy.

Make stitching the default behavior for targeted clusters.

yoid2000 commented 1 month ago

The change needs to be made at line 255 of solver.py:

https://github.com/diffix/syndiffix/blob/4dcdbfa375d7c8bb707cf17335a8741845a60c80/syndiffix/clustering/solver.py#L251-L257

The second element, currently the empty list, should contain the target column.

Also, the owner should be LEFT, so that a large number of low-correlation columns don't reduce the accuracy of the target column.
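
A rough sketch of the two changes described above, not the actual code in solver.py. It assumes a derived cluster is a tuple of (owner, stitch columns, derived columns); the names `StitchOwner`, `ColumnId`, `DerivedCluster`, `patch_cluster` and `stitch_cluster` here are stand-ins defined for illustration, and the third element is simply passed through unchanged because nothing in this issue says it should change.

```python
from enum import Enum

# Hypothetical stand-ins for illustration; not imported from syndiffix.
ColumnId = int


class StitchOwner(Enum):
    LEFT = "left"
    RIGHT = "right"
    SHARED = "shared"


# Assumed layout: (owner, stitch columns, derived columns).
DerivedCluster = tuple[StitchOwner, list[ColumnId], list[ColumnId]]


def patch_cluster(owner: StitchOwner, derived_columns: list[ColumnId]) -> DerivedCluster:
    # Patching as described in the issue: the second element is the empty list,
    # so there is no column to stitch on.
    return (owner, [], list(derived_columns))


def stitch_cluster(target_column: ColumnId, derived_columns: list[ColumnId]) -> DerivedCluster:
    # Proposed behavior: stitch on the target column, with the owner set to LEFT
    # so that many low-correlation columns do not degrade the target column.
    return (StitchOwner.LEFT, [target_column], list(derived_columns))
```

For example, `stitch_cluster(0, [3, 5])` would produce a cluster owned by LEFT that stitches on column 0, whereas `patch_cluster` leaves the stitch list empty.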