elixir-europe / rdmkit

ELIXIR Data Management Toolkit - Find the answers to your research data management questions here.
https://rdmkit.elixir-europe.org

Data Anonymization and Pseudonymization #339

Closed robertmand closed 3 years ago

robertmand commented 3 years ago

What topic do you wish to add? This page gives definitions of these terms and suggestions on how to achieve anonymization and pseudonymization of data.

Are there existing pages in the RDM toolkit website related to the requested page? Pages around human sensitive data and GDPR.

Resources If there are resources that could be utilised for writing the new page, please list them below:

Context If this request is coming from a particular project, domain, or use-case please list them below: A couple of us wrote this at a previous contentathon in Google Docs and forgot to tell people it was there, so I'm putting it in now.

Here is the text:

Description Data anonymization is the process of irreversibly modifying personal data in such a way that subjects cannot be identified directly or indirectly by anyone, including the study team. If data are anonymized, no one can link data back to the subject.

Pseudonymization is a process where identifying fields within data records are replaced by artificial identifiers called pseudonyms or pseudonymized IDs. Pseudonymization ensures that no one can link data back to the subject except nominated members of the study team, who are able to link pseudonyms to identifying records such as name and address.

Data anonymization involves modifying a dataset so that it is impossible to identify a subject from their data. Pseudonymization involves replacing identifying data with artificial IDs, for example, replacing a healthcare record ID with an internal participant ID only known to a named clinician working in the study.
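As a rough illustration of the pseudonymization described above, here is a minimal Python sketch. The function name `pseudonymize`, the field names (`healthcare_record_id`, `name`, `address`, `participant_id`) and the `P-` prefix are all hypothetical and not taken from any RDMkit page or tool. It assumes records are plain dictionaries and that the returned key table would be stored separately, accessible only to the nominated study team members.

```python
import secrets


def pseudonymize(records, id_field="healthcare_record_id",
                 identifying_fields=("name", "address")):
    """Replace identifying fields with a random participant ID.

    Returns (pseudonymized_records, key_table). The key table maps each
    pseudonym back to the identifying values and must be kept apart from
    the pseudonymized data (hypothetical layout, for illustration only).
    """
    id_to_pseudonym = {}  # original ID -> pseudonym, so repeat records match
    key_table = {}        # pseudonym -> identifying values (restricted access)
    pseudonymized = []

    for record in records:
        original_id = record[id_field]
        if original_id not in id_to_pseudonym:
            # Random token: the pseudonym carries no information about the ID.
            pseudonym = "P-" + secrets.token_hex(8)
            id_to_pseudonym[original_id] = pseudonym
            key_table[pseudonym] = {
                field: record[field]
                for field in (id_field, *identifying_fields)
                if field in record
            }
        # Copy only the non-identifying fields, then attach the pseudonym.
        cleaned = {
            key: value for key, value in record.items()
            if key != id_field and key not in identifying_fields
        }
        cleaned["participant_id"] = id_to_pseudonym[original_id]
        pseudonymized.append(cleaned)

    return pseudonymized, key_table
```

The sketch only shows the mechanics of swapping identifiers. Whether the remaining fields could still identify someone indirectly has to be assessed separately for each dataset.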

Considerations

Solutions

Relevant tools and resources

Thanasis Vergoulis vergoulis@athenarc.gr Robert Andrews andrewsr9@cardiff.ac.uk

pinarpink commented 3 years ago

IMO this content can initially go to the Data Classification page. Perhaps we might amend the page title to 'Data Classification and De-identification'. What say you @bedroesb @floradanna ?

floradanna commented 3 years ago

Yes, it could make sense. Data Classification so far has only 1 sub-problem (how to figure out if your data are sensitive or not). Maybe a second sub-problem could be "how to achieve anonymization and pseudonymization of sensitive data".

bedroesb commented 3 years ago

do we need a new / different tag ?

floradanna commented 3 years ago

If the page is the same, I would not use an additional tag. It could complicate things. We had better make use of keywords in this case.

jmenglund commented 3 years ago

I agree with @pinarpink that the Data Classification page is currently the best place for the text. When adding the problem to that page, it is probably a good idea to also take a look at the other problem on that page, "Is my data sensitive?". Some of the bullets under considerations touch upon the same topic.