AAFC-BICoE / dina-planning

AAFC-DINA planning repository

Determination label - verbatim data #209

Open OwenLonsdale opened 3 years ago

OwenLonsdale commented 3 years ago

GIVEN I have accessed DINA as [a role]

WHEN I transcribe determination label data

THEN I need to record a faithful string of full verbatim data

Parsed, aggregated data are not always adequate, especially when the data are reproduced in published articles, such as type specimen catalogues. In these cases, full, ordered verbatim data are needed for every label on a specimen, sometimes including the colour and shape of each label. See the following examples:

https://doi.org/10.11646/zootaxa.4862.1.1 [*see Appendix A]

https://www.semanticscholar.org/paper/Name-bearing-type-specimens-of-Trichoptera-the-of-%26-Lonsdale/50700a2458b675b4bff6bb5bdc5ab9f429ed3dc9

http://www.nadsdiptera.org/Catalogs/CNCtypes/Suppl.htm

dshorthouse commented 3 years ago

@OwenLonsdale There's considerable discussion about this in the wider TDWG community. See https://github.com/tdwg/dwc/issues/32. The conflict in that debate has much to do with the expected content of such a field, largely because "label" varies considerably from one curatorial practice to another. An entomological label is vastly different from a botanical label: the former is unlikely to have a template and may consist entirely of hand-written text (though even a "verbatim" transcription must be interpreted by the transcriber or the OCR engine), whereas the latter might combine typed prefixes and hand-written content in a tabular layout (i.e. it consists of key:value pairs whose blank values might bear meaning). In fact, many botanists now use QR codes on their det. labels that point to their ORCID accounts. All this to say that "verbatim" is a highly variable concept. Indeed, expressions of colour and shape are not considered verbatim content in all circles. Clearly, an image is the gold standard for "verbatim", though it is impractical in every case.

I'm assuming your idea here is to have a verbatim det. label field for every single one of the 1:many stacked det. labels that might be on one specimen, correct? Might there also be individual images of these that should ideally be tightly associated with the verbatimLabel content? At present, there is no provision for attachments on individual determinations, though we have debated the mechanics and practicality of that internally. Instead, attachments are currently limited to the level of the material sample (= specimen) as a whole, and it would be up to the user to deduce which image attachment corresponds to which of the potentially many verbatim det. fields if we added them to the system.

OwenLonsdale commented 3 years ago

The comments in the TDWG forum indicate that there is no clear consensus. Without commenting on the requirements of the wider community, I can say that this function is needed by us, the people for whom the database is being created. It is impractical to take images of every det. label that gets added to a specimen, and even if it were more practical, people wouldn't do it. In our case, an accurate record of all the text on each label is useful, and often critical for differentiating specimens, some of which are mislabelled types, unlabelled types, specimens from which key misidentifications have been made, etc. How much would it affect your notion of what is proper if the verbatim field for the collection label served as a dumping ground for the data on ALL labels?

dshorthouse commented 3 years ago

@OwenLonsdale First, we have to determine whether this is a requirement for all or for some. In other words, do all collections want this? If so, then it becomes one or more additional fields available to all: for botany, for entomology, for living collections, etc. If there's no consensus, there are of course custom fields (= managed attributes) that can be created to store any additional content not shared by all. These can be created for any component, like Collecting Events or Material Samples, for use in any one or all collections.

That said, I suspect there are additional requirements here that may be expressed later and that we have not yet captured. These managed attributes are generic solutions without custom features; custom features here would include ordering of verbatim det. transcriptions per specimen and multiple transcriptions of verbatim collecting event labels (i.e. some data shared by many specimens but expressed uniquely per specimen). In principle, you could use these managed attributes for ALL labels, but because they are dispersed among and between components, we have to ask how you expect them to be viewed, used, and exported, and in what context. I am envisioning requirements like, "Show me all the verbatim transcriptions for this one specimen and additionally show them in the order in which they are present on the pin from bottom to top (or top to bottom)". That sort of granularity may be difficult to accommodate if the collecting event label transcriptions are stored in a Collecting Event (and there are umpteen dozen specimens that share the same linked Collecting Event, each with its own slightly varying collecting event label) while the det. label transcriptions are stored as an unordered group at the level of the Material Sample.
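A minimal sketch of the ordering requirement described above, assuming (hypothetically) that every transcription, whether from a collecting event label or a det. label, is stored on the specimen itself with an explicit pin position. `VerbatimLabel`, `MaterialSample`, `position`, `verbatim_export`, and the ` // ` separator are illustrative names only, not DINA's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VerbatimLabel:
    """One verbatim label transcription (hypothetical model, not DINA's schema)."""
    text: str                     # faithful transcription of the label text
    kind: str                     # hypothetical values: "collecting_event" or "determination"
    position: int                 # hypothetical: 0 = top label on the pin, increasing downward
    colour: Optional[str] = None  # optional physical attributes, where recorded
    shape: Optional[str] = None


@dataclass
class MaterialSample:
    """Hypothetical specimen record that owns its own ordered label transcriptions."""
    identifier: str
    labels: List[VerbatimLabel] = field(default_factory=list)

    def labels_in_order(self, top_to_bottom: bool = True) -> List[VerbatimLabel]:
        # Sort by pin position; reverse the order for a bottom-to-top view.
        return sorted(self.labels, key=lambda lab: lab.position, reverse=not top_to_bottom)

    def verbatim_export(self, sep: str = " // ", top_to_bottom: bool = True) -> str:
        # Join every label's text in pin order, e.g. for a type catalogue entry.
        return sep.join(lab.text for lab in self.labels_in_order(top_to_bottom))


if __name__ == "__main__":
    # Placeholder data only; not real specimen labels.
    sample = MaterialSample("EXAMPLE-0001")
    sample.labels.append(VerbatimLabel("det. label B", "determination", 2))
    sample.labels.append(VerbatimLabel("locality label", "collecting_event", 0))
    sample.labels.append(VerbatimLabel("det. label A", "determination", 1))
    print(sample.verbatim_export())  # locality label // det. label A // det. label B
```

Keeping the collecting event transcription on the specimen, rather than only on the shared Collecting Event, is the trade-off raised in the comment: it gives one ordered view per specimen at the cost of repeating text that many specimens share.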