armenak / DataDefender

Sensitive Data Management: Data Discovery and Anonymization toolkit
Apache License 2.0

Anonymize flat files #100

Open marciahon29 opened 6 years ago

marciahon29 commented 6 years ago

Hello,

I work with text that contains diagnoses such as Alzheimer's, Parkinson's, etc.

Could I create a file with these words and force them to be anonymized?

How would I do this?

Thanks

armenak commented 6 years ago

I do not understand your question. This tool anonymizes only databases, not flat files.

marciahon29 commented 6 years ago

There are certain words that I would like to be anonymized. For example, "Parkinson", "Alzheimer", "Schizophrenia", "Depression", etc.

How can I configure the application to recognize and anonymize these words?

armenak commented 6 years ago

If you are talking about flat files, then this tool does not anonymize them. However, if the data is in a database, you can run database discovery against that database, take the result (a list of "suspected" columns), and then create a requirement document (please see the samples in the "sample" directory).

Currently, the NLP models do not work with medical data, but I am going to provide this functionality by the end of this week.

marciahon29 commented 6 years ago

Thank you, I look forward to your enhancements.

Could you also let me know how I can do something like this? I anticipate having many texts that I would like to anonymize.
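
For readers with the same flat-file use case: since the tool itself only handles databases (per the reply above), a word-list approach can be sketched outside of it. The snippet below is a minimal illustration in Python, not part of DataDefender. The function names, the one-term-per-line file format, and the `[REDACTED]` placeholder are all assumptions made for this example.

```python
import re


def load_terms(path):
    """Read one sensitive term per line, skipping blank lines (assumed file format)."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def build_pattern(terms):
    """Compile a case-insensitive, whole-word pattern for the given terms."""
    # Longest terms first, so a longer term is preferred over a shorter prefix of it.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # The optional 's lets "Alzheimer" also cover "Alzheimer's".
    return re.compile(r"\b(?:" + alternation + r")(?:'s)?\b", re.IGNORECASE)


def redact(text, pattern, placeholder="[REDACTED]"):
    """Replace every matched term with a placeholder (hypothetical default)."""
    return pattern.sub(placeholder, text)


# Terms taken from the question above; in practice they could come from load_terms().
terms = ["Parkinson", "Alzheimer", "Schizophrenia", "Depression"]
pattern = build_pattern(terms)

sample = "Patient has Alzheimer's and a history of depression."
print(redact(sample, pattern))
# prints: Patient has [REDACTED] and a history of [REDACTED].
```

Because the match is on whole words, derived forms such as "Parkinsonism" are left untouched unless they are added to the list. Exact matching like this also misses misspellings and abbreviations, so it is only a starting point for free-text medical data.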