IQSS / dataverse

Open source research data repository software
http://dataverse.org

Identifying personally identifiable information in datasets upon upload/publish #6775

Open baobaofzhang opened 4 years ago

baobaofzhang commented 4 years ago

Some datasets uploaded to the Harvard Dataverse contain personally identifiable information (PII), particularly IP addresses and geolocation data (latitude and longitude). Internet survey software such as Qualtrics includes this PII in survey data by default. Researchers should delete it before they upload datasets to Harvard Dataverse. I did a quick search for "LocationLatitude" (the standard variable name for latitude data in Qualtrics), and it turned up 130 such datasets. Many of these datasets do not need geolocation data for their analysis. I suspect that researchers uploaded the PII inadvertently because they were not careful. @adam3smith suggested that Dataverse run an automated check and alert users when they upload datasets containing PII.
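A minimal sketch of the kind of automated check suggested above, assuming the uploaded file is a CSV whose first row holds variable names (as in a Qualtrics export). Only "LocationLatitude" comes from this issue; "IPAddress" and "LocationLongitude" are assumed Qualtrics defaults, and the function names `find_pii_columns` and `upload_warning` are hypothetical. This is not how Dataverse implements anything, just an illustration of header-based flagging.

```python
import csv
import io
from typing import List, Optional, Tuple

# Hypothetical column list. "LocationLatitude" is the Qualtrics variable
# named in this issue; the other two are assumed Qualtrics defaults and
# should be checked against a real export.
PII_COLUMNS = {
    "IPAddress": "IP address",
    "LocationLatitude": "latitude",
    "LocationLongitude": "longitude",
}


def find_pii_columns(csv_text: str) -> List[Tuple[str, str]]:
    """Return (column name, reason) pairs for header columns that look like PII."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # only the first row (variable names) is inspected
    hits = []
    for name in header:
        # Strip whitespace and a possible UTF-8 byte-order mark on the first column.
        cleaned = name.strip().lstrip("\ufeff")
        if cleaned in PII_COLUMNS:
            hits.append((cleaned, PII_COLUMNS[cleaned]))
    return hits


def upload_warning(filename: str, csv_text: str) -> Optional[str]:
    """Build an alert message for the depositor, or None if nothing was flagged."""
    hits = find_pii_columns(csv_text)
    if not hits:
        return None
    details = ", ".join(f"{col} ({reason})" for col, reason in hits)
    return f"{filename} may contain personally identifiable information: {details}"


if __name__ == "__main__":
    sample = (
        "ResponseId,IPAddress,LocationLatitude,LocationLongitude,Q1\n"
        "R_1,192.0.2.1,42.37,-71.12,5\n"
    )
    print(upload_warning("survey.csv", sample))
```

Matching on exact column names keeps false positives low but will miss renamed columns; a real check would likely also need to look at cell values (for example, IP-address patterns) and cover non-CSV formats.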

djbrooke commented 4 years ago

Thanks @baobaofzhang for creating this and for tagging @adam3smith. I may tweak the title a bit to reflect that we'd want some automated utility in the Dataverse platform itself (55 installations worldwide) instead of just in Harvard Dataverse.

Regarding the removal of the PII itself, the Harvard Dataverse Data Curation Team is contacting authors now.

cmbz commented 2 months ago

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

adam3smith commented 2 months ago

It's not my issue originally, but since I was tagged and involved: I think this is quite important (especially for self-publishing repositories like Harvard DV) and should be considered as a feature (it also has a label that starts with "Feature:"). I don't think this duplicates any existing issues.

cmbz commented 2 months ago

Reopening as per @adam3smith's comment: https://github.com/IQSS/dataverse/issues/6775#issuecomment-2299143304

cmbz commented 2 months ago

@adam3smith I closed issues labeled Type: Feature, this one just had the suggestion label. I'll update with the new label. Thanks.