Leibniz-HBI / Social-Media-Observatory

This repository is the central communication and project management interface for the Social Media Observatory hosted by the Leibniz Institute for Media Research | Hans-Bredow-Institute
https://leibniz-hbi.github.io/SMO/
Creative Commons Attribution 4.0 International

Explore tools for data anonymisation/pseudonymisation/differential privacy etc. #68

Closed FlxVctr closed 2 years ago

Khandoker09 commented 3 years ago

Data Anonymization Tools

Data privacy matters most in sectors such as social media, healthcare, retail, and finance, where personal data in the wrong hands can jeopardize the security of individuals. Data anonymization addresses this risk: it is the process of altering identifiable data such as name, gender, age, or other personal information by replacing it with values that are nearly impossible to trace back to their origin. We selected the tools below according to the following criteria. First, the tool should be open source. Second, we prioritize tools that require few programming skills and offer a GUI or interactive dashboard. Third, it is good practice to use GDPR-compliant tools. The GDPR (General Data Protection Regulation) is the EU regulation that protects the personal data and privacy of individuals in the EU.

ARX

ARX is open-source software for data anonymization. It supports transforming data in ways that satisfy user-specified privacy models and control statistical disclosure, which helps to mitigate privacy-breach attacks. With ARX we can easily remove direct identifiers such as names or phone numbers, or other personal information that could be used in a cyber attack. Indirect identifiers (quasi-identifiers) stay in the dataset but are transformed, for example generalized, so that they no longer single out individuals. ARX also supports various methods of protecting sensitive data.
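
A minimal Python sketch of the idea, not the ARX API (ARX itself is a Java tool with a GUI): direct identifiers are dropped and one indirect identifier, age, is coarsened into a range. The column names and the 10-year bucket width are assumptions for illustration only.

```python
# Illustrative only: plain Python, not the ARX API.
DIRECT_IDENTIFIERS = {"name", "phone"}  # hypothetical column names


def generalise_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


def anonymise_record(record):
    """Drop direct identifiers and coarsen the 'age' indirect identifier."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age"] = generalise_age(cleaned["age"])
    return cleaned


record = {"name": "Jane Doe", "phone": "555-0100", "age": 34, "party": "A"}
print(anonymise_record(record))  # {'age': '30-39', 'party': 'A'}
```

Whether a 10-year range is coarse enough depends on the dataset; a tool like ARX exists to help make that choice systematically.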

Basic Features :

Amnesia

Amnesia is also an open-source data anonymization tool. It supports k-anonymity and km-anonymity. An online version of the editor can be used without installing anything on the computer. The basic idea is to load the original dataset first and then anonymize it; the result can be stored locally. Amnesia works in five steps: datasets, hierarchy, algorithms, solution graphs, and lastly anonymized datasets.
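
To make k-anonymity concrete: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below only measures k for a list of records. It does not use Amnesia, and the column names and values are made up.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical, already generalised records.
rows = [
    {"age": "30-39", "zip": "201**", "party": "A"},
    {"age": "30-39", "zip": "201**", "party": "B"},
    {"age": "40-49", "zip": "201**", "party": "A"},
    {"age": "40-49", "zip": "201**", "party": "C"},
]

print(k_anonymity(rows, ["age", "zip"]))           # 2
print(k_anonymity(rows, ["age", "zip", "party"]))  # 1
```

Adding `party` to the quasi-identifiers drops k to 1, which shows why the choice of quasi-identifier columns matters as much as the algorithm.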

Basic Features :

Anonimatron

Anonimatron is another open-source tool, in this case for pseudonymization. It can be used to anonymize data so that tests can be run or bugs tracked down outside of the client environment. It can anonymize emails, names, and IDs, and it can also anonymize databases.
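
Pseudonymization of this kind can be sketched with a keyed hash, so that the same input always maps to the same pseudonym. This is an illustration in plain Python, not Anonimatron's own mechanism; the key and the `user` prefix are placeholders.

```python
import hashlib
import hmac

# Assumption: the key is random and stored separately from the dataset.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymise(value, prefix="user"):
    """Map a value to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


first = pseudonymise("jane@example.org")
second = pseudonymise("jane@example.org")
other = pseudonymise("john@example.org")
print(first == second, first == other)  # True False
```

Because the mapping is stable, relations between records survive, which is what makes the output usable as test data. It also means the data is pseudonymized rather than anonymized: anyone holding the key can recompute the mapping.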

Basic Features :

FlxVctr commented 3 years ago

The tools look good. Maybe scrap the introduction, it's a bit off-topic and would need work on understandability. Have you tested the tools with some test data? You could use the parliamentarian data from the Dboes-Automatization repo, for example.

Instead of the detailed features, I'd rather try to describe the typical use-cases. While ARX seems to be written for actual studies, Anonimatron seems rather geared towards generating test-data for tool-development, for example.

FlxVctr commented 3 years ago

Also, instead of the introduction, try to make transparent what your selection criteria were. Why are those tools included and not others?

FlxVctr commented 3 years ago

Also: The online version of Amnesia should not be recommended by us. Using JavaScript tools is ok locally. But remotely it is considered unsafe.

FlxVctr commented 3 years ago

It'd also be interesting to know whether the tools have a CLI or programming module version, which can be used to anonymise data automatically on arrival.
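
For the "anonymise on arrival" case, a small filter function is one option regardless of which tool is chosen. A sketch using only the Python standard library, assuming the incoming data is CSV and the columns to drop are known in advance; the function and column names are hypothetical.

```python
import csv
import io


def anonymise_csv(source, target, drop_columns):
    """Copy CSV rows from source to target, leaving out the given columns."""
    reader = csv.DictReader(source)
    kept = [c for c in (reader.fieldnames or []) if c not in drop_columns]
    writer = csv.DictWriter(
        target, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# In-memory streams stand in for an incoming file and its cleaned copy.
incoming = io.StringIO("name,email,party\nJane Doe,jane@example.org,A\n")
cleaned = io.StringIO()
anonymise_csv(incoming, cleaned, drop_columns={"name", "email"})
print(cleaned.getvalue())
```

The same function works with `sys.stdin` and `sys.stdout`, so it could sit in a pipeline that runs whenever new data arrives.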

Khandoker09 commented 3 years ago

> The tools look good. Maybe scrap the introduction, it's a bit off-topic and would need work on understandability. Have you tested the tools with some test data? You could use the parliamentarian data from the Dboes-Automatization repo, for example.
>
> Instead of the detailed features, I'd rather try to describe the typical use-cases. While ARX seems to be written for actual studies, Anonimatron seems rather geared towards generating test-data for tool-development, for example.

Fixed the intro according to the requirement. Will test with the Dboes data this week.

Khandoker09 commented 3 years ago

> Also: The online version of Amnesia should not be recommended by us. Using JavaScript tools is ok locally. But remotely it is considered unsafe.

fixed

Khandoker09 commented 3 years ago

> Also, instead of the introduction, try to make transparent what your selection criteria were. Why are those tools included and not others?

described in the intro

FlxVctr commented 3 years ago

Thanks for the changes. Can you please put that in a Google Doc (simply in Markdown Syntax) or in our OX, so that we can finalise that? Please make it editable by me and assign the issue to me when you think it's ready for feedback.

Khandoker09 commented 3 years ago

https://github.com/Khandoker09/markdown_syntax

FlxVctr commented 3 years ago

As discussed, I think a collaborative document in GDocs, OX, or OnlyOffice will work better for drafting new articles.