arx-deidentifier / arx

ARX is a comprehensive open source data anonymization tool aiming to provide scalability and usability. It supports various anonymization techniques, methods for analyzing data quality and re-identification risks and it supports well-known privacy models, such as k-anonymity, l-diversity, t-closeness and differential privacy.
http://arx.deidentifier.org/
Apache License 2.0
620 stars 213 forks

How to use local recoding? #83

Closed luyang1210 closed 7 years ago

luyang1210 commented 7 years ago

Dear Fabian,

Thank you for your kind answer. I also have questions about using "local recoding" in ARX. I find it a little tricky to control the "strength" of generalization, since different jargon is in use (such as the fixed point, 100 passes etc.). Is there any documentation to download?

Cheers Yang

prasser commented 7 years ago

Dear Yang,

unfortunately this feature is not very well documented at the moment and the implementation is a little over-complicated. We will replace the interface for local recoding with a more intuitive interface in the future.

For now, I recommend the following process/parameterization:

  • In the configuration perspective, set the "suppression limit" to 100% in the "general settings" tab.
  • In the configuration perspective, go to the tab "coding model". Move the slider to the leftmost position.
  • After initial anonymization, run "local recoding" with the default parameters.
  • If you want to control the amount of generalization, use attribute weights, maximal generalization levels and potentially reduce the suppression limit.

I hope this helps! Fabian

luyang1210 commented 7 years ago

Dear Fabian,

Thank you for the kind reply. Regarding the second point, I didn't find the tab called "coding model". By the way, I am using version 3.4.2. Is that the one you are referring to? In addition, I find there is no way to step back once I have applied local recoding to the same sample but with different conditions (such as k=2,3,4...). Every time we need to re-run local recoding, we need to shut down the workspace, right?

Also, I am not sure about the meaning of 'fix point', 'attribute weight' and 'pass'... Could you please give some more hints?

Cheers, Yang

prasser commented 7 years ago

Dear Yang,

(1) the tab ''coding model'' is only available for some data quality models (e.g. Loss). This means that these are also the only quality models for which local recoding is supported properly. In the upcoming version of ARX this option will be available for almost all quality models. If you want to use these features right now, you may want to use the current master from GitHub instead of our current release. The tab is located in the bottom-right of the configuration perspective. And: yes, I am referring to ARX 3.4.2.

(2) You do not need to shut down the workspace to re-run local recoding. If you want to try a different parameterization/configuration, please go to the configuration perspective, change the setting and run "anonymize". You can then perform local recoding. If you want to reset the local recoding for the selected transformation, switch to the exploration perspective, right click on the current transformation and select "apply".

(3) The parameters 'fix point' and 'pass' are too hard to explain. Please believe me that the default settings are what you want. With attribute weights you can specify the importance of attributes. The tab is located in the bottom-right of the configuration perspective, but also only for some data quality models. 0.5 is the default value, <0.5 means less important, and >0.5 means more important.
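To make the effect of attribute weights more concrete, here is a minimal Python sketch. It is not ARX's implementation: the function `weighted_loss`, the attribute names and the weighted-mean aggregation are assumptions chosen for illustration, and the only detail taken from this thread is the default weight of 0.5.

```python
def weighted_loss(per_attribute_loss, weights, default_weight=0.5):
    """Weighted mean of per-attribute losses, each expected in [0, 1].

    Assumption for illustration: losses are combined as a weighted mean.
    ARX's actual aggregation may differ.
    """
    total = 0.0
    total_weight = 0.0
    for attribute, loss in per_attribute_loss.items():
        # Attributes without an explicit weight fall back to the default of 0.5.
        w = weights.get(attribute, default_weight)
        total += w * loss
        total_weight += w
    return total / total_weight if total_weight > 0 else 0.0


# Two hypothetical candidate transformations:
# A generalizes "age" and keeps "zip"; B keeps "age" and generalizes "zip".
candidate_a = {"age": 0.5, "zip": 0.0}
candidate_b = {"age": 0.0, "zip": 0.5}

# With default weights both candidates score the same.
default_a = weighted_loss(candidate_a, {})
default_b = weighted_loss(candidate_b, {})

# Marking "age" as more important (> 0.5) makes losing age detail more costly,
# so candidate B (which leaves age untouched) gets the lower score.
weights = {"age": 0.9, "zip": 0.1}
weighted_a = weighted_loss(candidate_a, weights)
weighted_b = weighted_loss(candidate_b, weights)
print(default_a, default_b, weighted_a, weighted_b)
```

In this toy setting, raising the weight of an attribute steers the search away from transformations that generalize it, which is the sense in which weights express "importance".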

Best regards Fabian

luyang1210 commented 7 years ago

Dear Fabian,

Thank you for answering in detail. I have now found all these buttons and will try to restate them.

1) The 'coding model' tab is under the Loss measure (I had been looking under Discernability, which is why I could not find it). I guess the leftmost position can support local recoding since it means "the greatest limitation to using suppression", right? I noticed the default generalization model becomes [0,0,0] once I do this. BTW, may I ask which metric the 'loss' measure is based on? There are a lot of metrics called loss...

2) Yes indeed. I guess 'local recoding' is meant to be used only after applying the anonymization operation? This is slightly different from global recoding, which can be run directly.

3) OK, I will use the default settings.

Regards, Yang

Yang Lu

PHD student in Department of Computing & Information Systems

Melbourne School of Engineering

The University of Melbourne

Parkville, Victoria 3010.

prasser commented 7 years ago

Dear Yang,

(1) For an overview of (most) quality models implemented by ARX, please see: http://arx.deidentifier.org/overview/metrics-for-information-loss/

(2) Moving the slider to the leftmost position in the "coding model" tab means "the greatest limitation to using generalization" or stated in other words: "use record suppression only".
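As an illustration of what "use record suppression only" means, here is a minimal Python sketch that reaches k-anonymity without generalizing any value. It is a toy under stated assumptions, not ARX code: the function name `suppress_small_classes`, the sample records and the `"*"` placeholder are all hypothetical.

```python
from collections import Counter


def suppress_small_classes(records, k):
    """Suppress every record whose quasi-identifier combination occurs < k times.

    Remaining records are left untouched (no generalization), so the output is
    k-anonymous once the suppressed records are disregarded.
    """
    if not records:
        return []
    # Size of each equivalence class (identical quasi-identifier tuples).
    class_sizes = Counter(records)
    suppressed = ("*",) * len(records[0])
    return [r if class_sizes[r] >= k else suppressed for r in records]


# Hypothetical quasi-identifiers: (age, zip code).
records = [
    (34, "81667"),
    (34, "81667"),
    (45, "81675"),
    (66, "81925"),
    (66, "81925"),
]
result = suppress_small_classes(records, k=2)
suppressed_count = sum(1 for r in result if r == ("*", "*"))
print(result)
print(suppressed_count)
```

Only the record that is unique in the sample is suppressed; all others keep their original values. Local recoding then has a set of suppressed records to work on in later steps.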

Best regards Fabian

luyang1210 commented 7 years ago

Hi Fabian,

I have one question about the utility measures ARX provides. For instance, I set "loss" and "consider the mean square error" in 'configure transform' at the beginning. However, after running generalization and local recoding, I cannot see the result of this utility measure.

Could you please advise me here? Many thanks.

Regards, Yang

Yang Lu

PHD candidate in Department of Computing & Information Systems

Melbourne School of Engineering

The University of Melbourne

Parkville, Victoria 3010.

prasser commented 7 years ago

Hi Yang,

the results in terms of utility measures are currently not presented to the user in an easy-to-understand manner. We are working on a corresponding view, which will be added to the next version of ARX.

Information loss is to some degree correlated with the score reported by ARX for a selected transformation. (see, e.g., "Analyze/enhance utility"->"Properties"). However, this only works well for some models and not at all for local recoding.

Sorry & best regards Fabian

P.S.: Please direct further questions on ARX to arx.deidentifier@gmail.com

luyang1210 commented 7 years ago

Hi Fabian,

Thank you so much for the reply.

I have another question about local recoding. Currently I am testing datasets with different settings (like k=2, 3, 4). However, with the same records, the same QIs, k values and attribute weights, the results of local recoding remain the same, right? Here is my procedure for local recoding:

  1. In the configuration perspective, set the "suppression limit" to 100% in the "general settings" tab.
  2. In the configuration perspective, go to the tab "coding model". Move the slider to the leftmost position.
  3. After initial anonymization, run "local recoding" with the default parameters.

Could you please tell me the name of the underlying algorithm? Thanks.
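For readers following along, the general idea behind a procedure like the one above can be pictured with a toy Python sketch. The thread does not name the algorithm ARX uses, so this is an assumption-laden illustration, not ARX's method. Records that cannot be released at one generalization level are carried into the next pass and generalized further, and whatever remains at the end is suppressed. The hierarchy `generalize_age` and the function `local_recode` are hypothetical.

```python
from collections import Counter


def generalize_age(age, level):
    """Hypothetical hierarchy: level 0 = exact age, 1 = decade, 2 = fully generalized."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def local_recode(ages, k, max_level=2):
    """Release each record at the lowest level where its class has >= k members."""
    output = [None] * len(ages)
    pending = list(range(len(ages)))  # indices of records not yet released
    for level in range(max_level + 1):
        values = {i: generalize_age(ages[i], level) for i in pending}
        sizes = Counter(values.values())
        still_pending = []
        for i in pending:
            if sizes[values[i]] >= k:
                output[i] = values[i]  # class is large enough: release as-is
            else:
                still_pending.append(i)  # try again at a higher level
        pending = still_pending
    for i in pending:
        output[i] = "*"  # anything left over is suppressed
    return output


ages = [34, 34, 36, 38, 61, 45]
first_run = local_recode(ages, k=2)
second_run = local_recode(ages, k=2)
print(first_run)
```

Unlike global recoding, different records end up at different generalization levels. This toy is deterministic, so the same input and the same k always give the same output. Whether that also holds for ARX's local recoding is not confirmed in this thread.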

Cheers Yang

prasser commented 7 years ago

Dear Yang,

please ask questions by email: arx.deidentifier@gmail.com

Best regards Fabian

prasser commented 6 years ago

Explicit support for the methods discussed here has been added to ARX 3.7.0.