diffix / reference

Reference implementation for the Open Diffix anonymization mechanism.
https://www.open-diffix.org

Recover and redistribute flattened outliers data. #390

Closed cristianberneanu closed 2 years ago

cristianberneanu commented 2 years ago

Data dropped during flattening is recovered and proportionally redistributed over the anonymized buckets, in order to minimize the total distortion per column.
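
As a rough illustration of the idea described above (not the code from this PR), proportional redistribution for a single column could look like the following Python sketch. The function name `redistribute`, the representation of buckets as a dict of counts, and the single scalar `flattened_total` are all assumptions made for the example.

```python
def redistribute(buckets, flattened_total):
    """Spread flattened_total over buckets in proportion to each bucket's value.

    buckets: hypothetical mapping of bucket key -> anonymized count for one column.
    flattened_total: hypothetical total amount removed by outlier flattening.
    """
    total = sum(buckets.values())
    # Nothing to scale against, or nothing to give back: return a copy unchanged.
    if total <= 0 or flattened_total == 0:
        return dict(buckets)
    return {
        key: value + flattened_total * value / total
        for key, value in buckets.items()
    }


anonymized = {"a": 60.0, "b": 30.0, "c": 10.0}
adjusted = redistribute(anonymized, 20.0)
```

Each bucket receives a share of the recovered amount equal to its share of the column total, so the relative sizes of the buckets are preserved while the column total moves back toward its pre-flattening value.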

cristianberneanu commented 2 years ago

> how do we switch this off?

Didn't implement an off switch yet, as it doesn't seem to make the results worse.

pdobacz commented 2 years ago

> how do we switch this off?

> Didn't implement an off switch yet, as it doesn't seem to make the results worse.

An off switch would allow us to easily keep validating this. Also, the ability to keep redistribution off could be useful when implementing and testing features orthogonal to it, such as datetime support, especially until redistribution is implemented on the pg_diffix end.

These are weak arguments, but maybe it's worth thinking the off switch through as a thought experiment? Is there a way to bypass just the very last step (by commenting something out) to switch it off? I'll try to find this out on the second pass.

cristianberneanu commented 2 years ago

> Also the ability to keep redistribution off could be useful when implementing and testing features orthogonal to it - like the datetime support - especially until redistribution is implemented on pg_diffix end.

Fair enough, I will add a toggle to switch it off.
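
A minimal sketch of what such a toggle might look like, assuming redistribution really is a single final step that can be bypassed. The `AnonymizationSettings` class, its `redistribute_flattened` field, and the `finalize` function are hypothetical names for illustration, not the reference implementation's API.

```python
from dataclasses import dataclass


@dataclass
class AnonymizationSettings:
    # Hypothetical flag; defaults to on, matching the current behaviour.
    redistribute_flattened: bool = True


def finalize(buckets, flattened_total, settings):
    """Optionally apply proportional redistribution as the very last step."""
    total = sum(buckets.values())
    # Bypass the step when the toggle is off or there is nothing to scale against.
    if not settings.redistribute_flattened or total <= 0:
        return dict(buckets)
    return {k: v + flattened_total * v / total for k, v in buckets.items()}


buckets = {"x": 8.0, "y": 2.0}
with_redistribution = finalize(buckets, 5.0, AnonymizationSettings())
without_redistribution = finalize(
    buckets, 5.0, AnonymizationSettings(redistribute_flattened=False)
)
```

Keeping the flag on a settings object would let tests for unrelated features run with redistribution off and compare results against an implementation that does not support it yet.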