Closed robertmand closed 3 years ago
IMO this content can initially go on the Data Classification page. Perhaps we could amend the page title to 'Data Classification and De-identification'. What say you @bedroesb @floradanna ?
Yes, it could make sense. Data Classification so far has only one sub-problem (how to figure out whether your data are sensitive or not). Maybe a second sub-problem could be "how to achieve anonymization and pseudonymization of sensitive data".
Do we need a new or different tag?
If the page is the same, I would not use an additional tag. It could complicate things. We had better make use of keywords in this case.
I agree with @pinarpink that the Data Classification page is currently the best place for the text. When adding the problem to that page, it is probably a good idea to also take a look at the other problem on that page, "Is my data sensitive?". Some of the bullets under considerations touch upon the same topic.
What topic do you wish to add? This page gives definitions of these terms and suggestions on how to achieve anonymization and pseudonymization of data.
Are there existing pages in the RDM toolkit website related to the requested page? Pages around human sensitive data and GDPR.
Resources If there are resources that could be utilised for writing the new page, please list them below:
Context If this request is coming from a particular project, domain, or use-case please list them below: A couple of us wrote this in Google Docs at a previous contentathon and forgot to tell people it was there. So I'm putting it in now.
Here is the text:
Description Data anonymization is the process of irreversibly modifying personal data in such a way that subjects cannot be identified directly or indirectly by anyone, including the study team. If data are anonymized, no one can link data back to the subject.
Pseudonymization is a process in which identifying fields within data records are replaced by artificial identifiers called pseudonyms or pseudonymized IDs. It ensures that no one can link data back to the subject except nominated members of the study team, who can link pseudonyms to identifying records such as name and address.
In short, anonymization modifies a dataset so that it is impossible to identify a subject from their data, whereas pseudonymization replaces identifying data with artificial IDs, for example, replacing a healthcare record ID with an internal participant ID known only to a named clinician working in the study.
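To make the pseudonymization idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a recommended implementation: the function name `pseudonymize`, the field names (`healthcare_record_id`, `name`, `address`, `diagnosis`), the `participant_id` column and the `P-` prefix are all hypothetical, and it assumes one record per subject. The identifying fields are moved into a separate key table, which would be held only by the nominated members of the study team.

```python
import secrets


def pseudonymize(records, identifying_fields):
    """Replace identifying fields with a random pseudonym.

    Returns (pseudonymized_records, key_table). The key table maps each
    pseudonym back to the identifying fields and must be stored separately,
    accessible only to nominated members of the study team.
    """
    pseudonymized = []
    key_table = {}
    for record in records:
        # Draw a random pseudonym; retry in the unlikely event of a clash.
        pseudonym = "P-" + secrets.token_hex(4)
        while pseudonym in key_table:
            pseudonym = "P-" + secrets.token_hex(4)

        # Keep the identifying values only in the key table.
        key_table[pseudonym] = {
            field: record[field] for field in identifying_fields if field in record
        }

        # Copy the remaining (non-identifying) fields and attach the pseudonym.
        cleaned = {k: v for k, v in record.items() if k not in identifying_fields}
        cleaned["participant_id"] = pseudonym  # hypothetical column name
        pseudonymized.append(cleaned)
    return pseudonymized, key_table


# Hypothetical example data, invented for illustration only.
records = [
    {"healthcare_record_id": "HR-001", "name": "Alice Example",
     "address": "1 Example Street", "diagnosis": "asthma"},
    {"healthcare_record_id": "HR-002", "name": "Bob Example",
     "address": "2 Example Street", "diagnosis": "diabetes"},
]
identifying = ["healthcare_record_id", "name", "address"]

shared, key_table = pseudonymize(records, identifying)
```

Here `shared` is what would be passed on for analysis, while `key_table` stays with the named clinician. Note that destroying the key table does not by itself make the data anonymous: the remaining fields may still identify a subject indirectly, so they would need to be assessed as well.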
Considerations
Solutions
Relevant tools and resources
Thanasis Vergoulis vergoulis@athenarc.gr Robert Andrews andrewsr9@cardiff.ac.uk