Closed timrobertson100 closed 4 years ago
Thanks @timrobertson100. I think it is important here to at least mention GUIDs as a future option. I suggest replacing the second paragraph with something like:
Identifying duplicates across institutions is not easy, especially for historic and legacy collections, where it is often difficult to determine which specimens are duplicates. Some institutions, such as CRIA in Brazil (in its speciesLink project) and the Atlas of Living Australia, match across a number of fields such as collector number, date and locality, while GBIF is developing a data-clustering algorithm. Currently, however, there is no universal global system available. The use of unique, persistent and resolvable Globally Unique Identifiers (GUIDs) (Page 2009, Richards 2010, Richards et al. 2011) will aid these processes in the longer term, but unfortunately the implementation of specimen-level GUIDs still seems some way off. A recent paper by Nelson et al. (2018) makes a number of recommendations on minting, managing and sharing [GUIDs] for herbarium specimens, but until such techniques are universally adopted, identifying duplicates across institutions remains an issue.
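For readers unfamiliar with field-based matching, here is a minimal Python sketch of the general idea. It is not the method used by speciesLink, the Atlas of Living Australia or GBIF; the record fields, the `SpecimenRecord` type and the function names are all hypothetical. It assumes collector name is part of the key, and it leaves locality out of the key because one institution may have obfuscated it.

```python
from collections import defaultdict
from typing import NamedTuple


class SpecimenRecord(NamedTuple):
    """Hypothetical, simplified specimen record (not a real schema)."""
    institution: str
    catalog_number: str
    collector: str
    collector_number: str
    event_date: str  # assumed to be an ISO 8601 date string
    locality: str


def normalise(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def duplicate_key(record: SpecimenRecord) -> tuple:
    """Build the matching key: collector, collector number and date.

    Locality is deliberately excluded in this sketch (assumption), since
    one holder may have generalised or withheld it.
    """
    return (
        normalise(record.collector),
        normalise(record.collector_number),
        record.event_date.strip(),
    )


def find_candidate_duplicates(records):
    """Group records sharing a key and keep groups spanning more than one institution."""
    groups = defaultdict(list)
    for record in records:
        # Records without a collector number cannot be matched this way.
        if not record.collector_number.strip():
            continue
        groups[duplicate_key(record)].append(record)
    return [
        group
        for group in groups.values()
        if len({r.institution for r in group}) > 1
    ]
```

Any group returned is only a candidate set for human review: a sensitive record whose locality is obfuscated at one institution could then be checked against the matching records held elsewhere.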
Thanks Arthur, that looks much clearer.
I propose this be reworked into just an information block describing how duplicates can be a source of "leaking" information even when one herbarium diligently obfuscates the location.
I'd recommend removing the paragraph starting "Perhaps the..." as it offers no actionable guidance and recommends a single approach, FilteredPush, a project that didn't become mainstream. Really the guidance is to promote better cataloging, in collection management systems, of the location of duplicate specimens so that other records can also be modified. GUIDs could play a part here, as can data clustering, but I am not sure that is important to mention in this guide. FilteredPush is not an infrastructure in mainstream use across GBIF, so I don't think it should be referenced.