gbif / doc-sensitive-species-best-practices

This document aims to describe current best practices for dealing with primary occurrence data for sensitive species, and to provide guidance on how to make data as freely available as possible and as protected as necessary.
https://doi.org/10.15468/doc-5jp4-5g10

Rework section 4.4: Duplicates #11

Closed: timrobertson100 closed this issue 4 years ago

timrobertson100 commented 4 years ago

I propose this be reworked to be just an information block explaining that duplicates can be a source of "leaking" information, even when one herbarium diligently obfuscates the location.

I'd recommend removing the paragraph starting "Perhaps the...": it gives no actionable guidance and recommends a single approach, FilteredPush, a project that didn't become mainstream. FilteredPush is not an infrastructure in mainstream use across GBIF, so I don't think it should be referenced. Really, the guidance should be to promote better cataloging, in collection management systems, of where duplicate specimens are held, so that the other records can also be modified. GUIDs could play a part here, as could data clustering, but I am not sure that is important to mention in this guide.
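To make both the leak and the proposed remedy concrete, here is a minimal Python sketch. It assumes each record carries a list of catalogued identifiers of its duplicates held elsewhere; the `duplicates` field, identifiers such as `HERB-A:123` and both helper functions are hypothetical. "Generalization" here is simply rounding coordinates to one decimal place, standing in for whatever obfuscation a publisher actually applies; it is not a GBIF or collection-management-system method.

```python
from collections import deque

def duplicate_group(records: dict, start: str) -> set:
    """Return every record ID reachable from `start` via catalogued duplicate links."""
    # Treat links as undirected: a duplicate catalogued in either collection counts
    adj = {rid: set() for rid in records}
    for rid, rec in records.items():
        for other in rec["duplicates"]:
            if other in records:
                adj[rid].add(other)
                adj[other].add(rid)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def generalize_with_duplicates(records: dict, sensitive_ids: set, places: int = 1) -> dict:
    """Copy `records`, rounding coordinates of sensitive records and all their duplicates."""
    targets = set()
    for rid in sensitive_ids:
        targets |= duplicate_group(records, rid)
    public = {}
    for rid, rec in records.items():
        rec = dict(rec)  # shallow copy so the input records are left untouched
        if rid in targets:
            rec["lat"] = round(rec["lat"], places)
            rec["lon"] = round(rec["lon"], places)
        public[rid] = rec
    return public

# Hypothetical example: herbarium A holds the sensitive specimen, herbarium B a duplicate
records = {
    "HERB-A:123": {"lat": -35.28346, "lon": 149.12807, "duplicates": ["HERB-B:987"]},
    "HERB-B:987": {"lat": -35.28346, "lon": 149.12807, "duplicates": []},
    "HERB-B:555": {"lat": -33.86785, "lon": 151.20732, "duplicates": []},
}
public = generalize_with_duplicates(records, {"HERB-A:123"})
print(public["HERB-B:987"]["lat"], public["HERB-B:987"]["lon"])  # -35.3 149.1
```

If the link from `HERB-A:123` to `HERB-B:987` were never catalogued (an empty `duplicates` list), herbarium B's copy would be published with its exact coordinates, which is the leak described above; the sketch only protects duplicates that someone has actually recorded.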

ArthurChapman commented 4 years ago

Thanks @timrobertson100. I think it is important here at least to mention GUIDs as a future direction. I suggest replacing the second paragraph with something like:

Identifying duplicates across institutions is not easy, especially for historic and legacy collections. Some institutions, such as CRIA in Brazil (in its speciesLink project) and the Atlas of Living Australia, match records across a number of fields such as collector number, date and locality, while GBIF is developing a data-clustering algorithm. Currently, however, there is no universal global system available. The use of unique, persistent and resolvable Globally Unique Identifiers (GUIDs) (Page 2009, Richards 2010, Richards et al. 2011) will aid these processes in the longer term, but unfortunately the implementation of specimen-level GUIDs still seems some way off. A recent paper by Nelson et al. (2018) makes a number of recommendations on minting, managing and sharing [GUIDs] for herbarium specimens, but until such techniques are universally adopted, identifying duplicates across institutions remains an issue.
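The field-matching approach mentioned in the paragraph above can be illustrated with a minimal Python sketch. It is not the speciesLink, ALA or GBIF algorithm. It assumes records keyed by the Darwin Core terms `recordedBy`, `recordNumber`, `eventDate` and `locality` (plus a hypothetical `id` field), adds the collector's name to the fields listed above because collector numbers are only meaningful per collector, and only catches duplicates whose normalized values match exactly.

```python
import re
from collections import defaultdict

def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def match_key(rec: dict) -> tuple:
    """Exact-match key over collector, collector number, date and locality."""
    return (
        normalize(rec["recordedBy"]),
        normalize(rec["recordNumber"]),
        rec["eventDate"],  # assumed to be already in ISO 8601 form
        normalize(rec["locality"]),
    )

def candidate_duplicates(records: list) -> list:
    """Group record IDs sharing a match key; only groups of two or more are returned."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical records from three collections
records = [
    {"id": "A:1", "recordedBy": "J. Smith", "recordNumber": "1234",
     "eventDate": "1987-10-02", "locality": "Mt. Kosciuszko, NSW"},
    {"id": "B:7", "recordedBy": "j smith", "recordNumber": "1234",
     "eventDate": "1987-10-02", "locality": "Mt Kosciuszko NSW"},
    {"id": "C:3", "recordedBy": "J. Smith", "recordNumber": "1235",
     "eventDate": "1987-10-02", "locality": "Mt. Kosciuszko, NSW"},
]
print(candidate_duplicates(records))  # [['A:1', 'B:7']]
```

A production matcher would need to tolerate typos, differing name formats (for example "Smith, J." versus "J. Smith", which this sketch treats as different) and partial dates, which is part of why duplicate detection across institutions remains hard without shared specimen-level GUIDs.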

timrobertson100 commented 4 years ago

Thanks Arthur, that looks much clearer.