arx-deidentifier / arx

ARX is a comprehensive open source data anonymization tool aiming to provide scalability and usability. It supports various anonymization techniques, methods for analyzing data quality and re-identification risks, and well-known privacy models such as k-anonymity, l-diversity, t-closeness and differential privacy.
http://arx.deidentifier.org/
Apache License 2.0

[BUG] ARX refuses to load a project containing a possibly empty string value as QID without hierarchy #460

Open jenno-verdonck opened 6 months ago

jenno-verdonck commented 6 months ago

Describe the bug

In the GUI I have a dataset in which a certain quasi-identifier can also have an empty string as a value. I have not set a hierarchy for it, because I do not want to generalize this value but still want it taken into account for k-anonymity. I can anonymize the data with these settings without problems. I saved the project file and later tried to reload it. The project no longer loads and I get the following error: hierarchy does not contain a transformation rule for value ''.

Running the source in a debugger, I could identify one difference. The default empty hierarchy (containing only the unique values, without extra levels) that is given to the anonymizer does contain the empty string, while the hierarchy loaded from the project file does not. The code then crashes on the line marked // Register at the dictionary and encode in the DataManager constructor.
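The mismatch described above can be illustrated with a small standalone sketch. This is not ARX code: the class HierarchyCheckSketch, its methods, and the sample column are hypothetical, and the exception text only mirrors the message quoted in this report. It assumes a one-level hierarchy can be modeled as a map from each distinct value to its single-level rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in ARX.
class HierarchyCheckSketch {

    /** Builds a one-level hierarchy that maps every distinct column value to itself. */
    static Map<String, String[]> defaultHierarchy(List<String> column) {
        Map<String, String[]> hierarchy = new LinkedHashMap<>();
        for (String value : column) {
            hierarchy.putIfAbsent(value, new String[] { value });
        }
        return hierarchy;
    }

    /** Returns the distinct column values that have no rule in the hierarchy. */
    static List<String> missingValues(List<String> column, Map<String, String[]> hierarchy) {
        Set<String> missing = new LinkedHashSet<>();
        for (String value : column) {
            if (!hierarchy.containsKey(value)) {
                missing.add(value);
            }
        }
        return new ArrayList<>(missing);
    }

    /** Fails in the same spirit as the reported error if any value lacks a rule. */
    static void requireComplete(List<String> column, Map<String, String[]> hierarchy) {
        for (String value : missingValues(column, hierarchy)) {
            throw new IllegalStateException(
                "hierarchy does not contain a transformation rule for value '" + value + "'");
        }
    }

    public static void main(String[] args) {
        // A quasi-identifier column in which some cells are empty strings.
        List<String> column = List.of("A", "", "B", "");

        // Hierarchy built from the data: includes the empty string, so the check passes.
        Map<String, String[]> fresh = defaultHierarchy(column);
        requireComplete(column, fresh);

        // Assumed state after reloading: the same hierarchy without the empty string.
        Map<String, String[]> reloaded = new LinkedHashMap<>(fresh);
        reloaded.remove("");
        try {
            requireComplete(column, reloaded);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like missingValues could in principle be run right after a hierarchy is loaded, to report which values were lost before encoding starts. Whether the empty string is dropped on save or on load is not established in this report.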

To Reproduce

Steps to reproduce the behavior:

  1. Create an anonymization setup where the input data has a quasi-identifier with empty values.
  2. Do not create a hierarchy for this quasi-identifier.
  3. Anonymize the data.
  4. Save the project.
  5. Try to load the project from the file.
  6. See the error quoted in the bug description.

Expected behavior

It should be possible to save and load this type of project.

HaRRy-19 commented 1 week ago

I am still facing this same issue. The project does not load completely when no hierarchy can be formed for an attribute because it contains only one value. This means I cannot perform any operation with the tool unless I create a new project from scratch. If I then close the tool and reopen it, I run into the same issue again. This workaround is cumbersome. Can this be fixed?

prasser commented 1 week ago

Thanks for reporting this again, and sorry that it is not fixed yet. Just to be sure: in your case, is it also a quasi-identifier containing an empty string as a value that causes this?

HaRRy-19 commented 1 week ago

Thank you for asking. In my case, if I don't create a hierarchy for a certain row (since it has only one and the same value), I encounter this issue after I exit the application and open it again.

prasser commented 1 week ago

Ok. Are you able to share the file or create a minimal example that can be used to reproduce the issue? If there is no empty value in your case, it is likely a different issue.

HaRRy-19 commented 1 week ago

Sorry, in my case as well it is due to empty values in a certain number of rows.