arx-deidentifier / arx

ARX is a comprehensive open source data anonymization tool aiming to provide scalability and usability. It supports various anonymization techniques, methods for analyzing data quality and re-identification risks, and well-known privacy models such as k-anonymity, l-diversity, t-closeness and differential privacy.
http://arx.deidentifier.org/
Apache License 2.0

Re-using lattice #229

Closed arc7an closed 5 years ago

arc7an commented 5 years ago

Is it possible to re-use a lattice that was calculated for one dataset on another dataset?

Imagine we have two datasets (say, training and testing). We set up all hierarchies and the anonymization strategy for the training dataset and anonymize it. After that, we extract the lattice and apply it to the testing dataset, which has the same structure as the training one.

The goal is to improve performance when a huge dataset has to be processed, but the user already knows how to anonymize it and is sure that, e.g., the k-anonymity requirements will be satisfied.

Thank you in advance!

prasser commented 5 years ago

Thanks for your interest in ARX!

A (generalization) lattice is a structure used to classify anonymization strategies for a specific dataset along axes such as data utility and privacy. As such, it cannot be "applied to" or "re-used for" other datasets. What you can do, of course, is use the same anonymization strategy, e.g. the same generalization levels.
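To illustrate why a lattice is tied to one dataset, here is a minimal, self-contained Python sketch. It does not use the ARX API. The two hierarchies (`generalize_age`, `generalize_zip`), the sample rows and the k-anonymity check are all hypothetical and only meant to show that the same lattice node can be classified differently for two tables with identical structure.

```python
from collections import Counter
from itertools import product

# Hypothetical hierarchies: level 0 is the raw value, higher levels are coarser.
def generalize_age(age, level):
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"

def generalize_zip(zip_code, level):
    # Mask the last `level` digits of the ZIP code.
    return zip_code[:len(zip_code) - level] + "*" * level

MAX_LEVELS = (2, 2)  # assumed highest level for (age, zip)

def is_k_anonymous(rows, levels, k):
    """Check whether every group of identical generalized rows has size >= k."""
    age_level, zip_level = levels
    groups = Counter(
        (generalize_age(age, age_level), generalize_zip(z, zip_level))
        for age, z in rows
    )
    return all(size >= k for size in groups.values())

def evaluate_lattice(rows, k):
    """Classify every combination of generalization levels for ONE dataset."""
    nodes = product(range(MAX_LEVELS[0] + 1), range(MAX_LEVELS[1] + 1))
    return {node: is_k_anonymous(rows, node, k) for node in nodes}

training = [(34, "81667"), (36, "81669"), (45, "81675"), (47, "81677")]
testing = [(34, "81667"), (52, "81925"), (45, "81675"), (47, "81677")]

lattice_train = evaluate_lattice(training, k=2)
lattice_test = evaluate_lattice(testing, k=2)

# The node (1, 1) is 2-anonymous for the training rows but not for the testing rows.
print(lattice_train[(1, 1)], lattice_test[(1, 1)])  # prints "True False"
```

The classification of each node depends on the rows it was evaluated against, so a lattice computed for the first table says nothing reliable about the second.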

To do so, use the same generalization hierarchies with fixed generalization levels. However, in the GUI you need to set up a new project when processing a new table.
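As a rough sketch of what "same hierarchies, fixed levels" means, the following Python example applies one fixed transformation to a new table and then checks the result. It is not ARX code: `HIERARCHIES`, `FIXED_LEVELS`, `apply_fixed_levels` and `smallest_class_size` are hypothetical names, and the levels are assumed to have been chosen on an earlier table.

```python
from collections import Counter

# Hypothetical hierarchies, assumed to be identical to those used for the first table.
def generalize_age(age, level):
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"

def generalize_zip(zip_code, level):
    # Mask the last `level` digits of the ZIP code.
    return zip_code[:len(zip_code) - level] + "*" * level

HIERARCHIES = {"age": generalize_age, "zip": generalize_zip}

# Assumed generalization levels, fixed after anonymizing the first table.
FIXED_LEVELS = {"age": 1, "zip": 2}

def apply_fixed_levels(rows, levels):
    """Generalize every attribute of every row to its fixed level."""
    return [
        {attr: HIERARCHIES[attr](value, levels[attr]) for attr, value in row.items()}
        for row in rows
    ]

def smallest_class_size(rows):
    """Size of the smallest group of identical generalized rows (the achieved k)."""
    groups = Counter(tuple(sorted(row.items())) for row in rows)
    return min(groups.values())

new_table = [
    {"age": 34, "zip": "81667"},
    {"age": 38, "zip": "81612"},
    {"age": 45, "zip": "81675"},
    {"age": 52, "zip": "81925"},
]

anonymized = apply_fixed_levels(new_table, FIXED_LEVELS)
k_achieved = smallest_class_size(anonymized)
print(anonymized[0], k_achieved)
```

In this sample the fixed levels leave two records alone in their groups, so the achieved k is 1. Re-using a strategy skips the search, but the privacy guarantee still has to be verified on the new table.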

Please send future questions on ARX's usage to arx.deidentifier@gmail.com

Best, Fabian