arx-deidentifier / arx

ARX is a comprehensive open source data anonymization tool aiming to provide scalability and usability. It supports various anonymization techniques, methods for analyzing data quality and re-identification risks and it supports well-known privacy models, such as k-anonymity, l-diversity, t-closeness and differential privacy.
http://arx.deidentifier.org/
Apache License 2.0

[BUG] anonymize silently fails when passing hierarchies missing values #283

Closed sonhal closed 5 years ago

sonhal commented 5 years ago

Describe the bug

When the ARX API is used to anonymize a dataset with a generalization hierarchy that has no row for the value in the first row of the dataset, no exception is thrown; the anonymization just fails silently.

To Reproduce

Dataset:

   [age, gender, zipcode]
   [34, male, 81675]
   [45, female, 81667]
   [66, male, 81925]
   [70, female, 81931]
   [34, female, 81931]
   [70, male, 81931]
   [45, male, 81931]

Age hierarchy:

   [34, <50, *]
   [45, <50, *]
   [66, >=50, *]
   [70, >=50, *]

Gender hierarchy:

   [male, *]
   [female, *]

Zipcode hierarchy (no row for 81675, the zipcode in the first row of the dataset):

   [81667, 8166*, 816**, 81***, 8****, *****]
   [81925, 8192*, 819**, 81***, 8****, *****]
   [81931, 8193*, 819**, 81***, 8****, *****]

Expected behavior

java.lang.IllegalArgumentException: Attribute 'zipcode': hierarchy misses some values or contains duplicates
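Until a fixed version is in use, a caller-side check along these lines could catch the problem before the data reaches ARX. This is only a minimal sketch in plain Java, not ARX code: the class `HierarchyCoverageCheck` and its methods are hypothetical, and it assumes the dataset and hierarchy are held as lists of string arrays, with column 0 of each hierarchy row holding the original value. It checks only for missing values, not for duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the ARX API.
public class HierarchyCoverageCheck {

    /** Returns the distinct values of a data column that have no row in the hierarchy. */
    public static Set<String> missingValues(List<String[]> rows, int column, List<String[]> hierarchy) {
        Set<String> covered = new LinkedHashSet<>();
        for (String[] levels : hierarchy) {
            covered.add(levels[0]); // assumption: level 0 holds the ungeneralized value
        }
        Set<String> missing = new LinkedHashSet<>();
        for (String[] row : rows) {
            if (!covered.contains(row[column])) {
                missing.add(row[column]);
            }
        }
        return missing;
    }

    /** Throws if any value of the column is not covered by the hierarchy. */
    public static void requireCoverage(String attribute, List<String[]> rows, int column, List<String[]> hierarchy) {
        Set<String> missing = missingValues(rows, column, hierarchy);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException(
                "Attribute '" + attribute + "': hierarchy misses values " + missing);
        }
    }

    public static void main(String[] args) {
        // Data rows from the report (header row omitted).
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"34", "male", "81675"});
        rows.add(new String[] {"45", "female", "81667"});
        rows.add(new String[] {"66", "male", "81925"});
        rows.add(new String[] {"70", "female", "81931"});
        rows.add(new String[] {"34", "female", "81931"});
        rows.add(new String[] {"70", "male", "81931"});
        rows.add(new String[] {"45", "male", "81931"});

        // Zipcode hierarchy from the report, which has no row for 81675.
        List<String[]> zipcode = new ArrayList<>();
        zipcode.add(new String[] {"81667", "8166*", "816**", "81***", "8****", "*****"});
        zipcode.add(new String[] {"81925", "8192*", "819**", "81***", "8****", "*****"});
        zipcode.add(new String[] {"81931", "8193*", "819**", "81***", "8****", "*****"});

        try {
            requireCoverage("zipcode", rows, 2, zipcode);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
            // prints: Attribute 'zipcode': hierarchy misses values [81675]
        }
    }
}
```

Running the check per quasi-identifying attribute before calling the anonymizer would turn the silent failure into an explicit error that names the offending values.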

prasser commented 5 years ago

Thanks for your interest in ARX!

This bug was fixed in master some time ago. It would be great if you could test and verify this.

If you noticed this while working on "ARXaaS - Anonymization as a Service", I would suggest using the version of ARX from our master branch. The current development cycle has been very long, and we have just released an alpha version of ARX 3.8.0, which is basically equal to the current master. This version brings a lot of performance improvements, bugfixes and new functionality...

sonhal commented 5 years ago

Thank you for the quick response Prasser. I will definitely check out the alpha version. Is there any ETA on the full release?

prasser commented 5 years ago

There is no clear ETA. However, we will not add any new features, just bugfixes until everything has calmed down. The release will probably be in about 4 weeks.