NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Modify the App UI to have NHMA field appear to take taxonID #373

Closed jlegind closed 10 months ago

jlegind commented 1 year ago

Issue

This issue is related to #348, where it was decided that a taxonomic lookup based on a published taxonomy [1] should use alternative taxonomic IDs, rather than taxon names, as the look-up key.
The issue here is that some collections rely on numbers or other abstract values instead of taxonomic names on their boxes, drawers and other specimen containers.
The published taxonomy for this alternative taxonomic system is grouped by super-family, so that each group consists of super-family, family, genus and species. A family can contain multiple genera, so there can be several sub-trees, as demonstrated below:


| rank | name |
| -- | -- |
| supfam | Eriocranioidea |
| famil | Eriocraniidae |
| genus | Dyseriocrania |
| species | subpurpurella |
| genus | Paracrania |
| species | chrysolepidella |
| genus | Eriocrania |
| species | unimaculella |
| species | sparrmannella |
| species | salopiella |
| species | cicatricella |
| species | semipurpurella |
| species | sangii |

All this must be transformed into a structure in which each name has its own row, together with its full taxonomy up to and including super-family.
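The transformation could be sketched roughly as follows. This is only an illustration, not the code in Aarhus_taxonomy.py: the function name `flatten_taxonomy` and the row layout are assumptions, and the rank labels are taken as they appear in the listing above.

```python
# Rank labels as they appear in the listing, ordered from highest to lowest.
RANKS = ["supfam", "famil", "genus", "species"]


def flatten_taxonomy(entries):
    """Turn an ordered (rank, name) list into one row per name.

    Each row carries the name itself plus all of its parents up to and
    including the super-family. Hypothetical helper for illustration only.
    """
    current = {}  # most recently seen name for each rank
    rows = []
    for rank, name in entries:
        level = RANKS.index(rank)
        # A new taxon at this rank invalidates everything below it.
        for lower in RANKS[level + 1:]:
            current.pop(lower, None)
        current[rank] = name
        row = {"rank": rank, "name": name}
        for r in RANKS[: level + 1]:
            row[r] = current.get(r)
        rows.append(row)
    return rows


entries = [
    ("supfam", "Eriocranioidea"),
    ("famil", "Eriocraniidae"),
    ("genus", "Dyseriocrania"),
    ("species", "subpurpurella"),
    ("genus", "Paracrania"),
    ("species", "chrysolepidella"),
]
rows = flatten_taxonomy(entries)
for row in rows:
    print(row)
```

Because the list is read in order, each species row picks up the genus that most recently preceded it, which is what allows several genus sub-trees to sit under one family.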

Furthermore, this requires modifications to the Digi App UI, as well as a new column in the 'taxonname' table to store the alternative taxon identifier.

The published taxonomy implementation was decided in June 2023.

[1] Karsholt, O., & Stadel Nielsen, P. (2013). Revideret fortegnelse over Danmarks sommerfugle. Lepidopterologisk Forening, København. File: N:\SCI-SNM-DigitalCollections\DaSSCo\Digi App\NHMA Entomology\KArsholt & Nielsen 2013.pdf

Estimate level of effort required.

Medium/hard

What is the expected acceptable result?

A version of the Digi App in which a digitizer logged in under the collection in question is presented with an additional input field, 'Taxonomic ID'. The field takes the identifier and looks up the name behind it, which is then displayed in the 'taxonomic name' field. It was decided that the existing functionality of the 'taxonomic name' field should remain.
An email conversation with the representatives of the collection in question settled the question of super-families: they did not ask for super-families to be part of the alternative taxonomy (Thomas Simonsen, NHMA, email 18-08-2023).
In the conversation with NHMA, they also accepted that we employ utility fields in the Specify taxon table ("Text1") to store the alternative taxonomic identifiers.

The taxon.text2 field will be employed for the source "Checklist of Danish Lepidoptera".
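The expected lookup behaviour might look something like the sketch below. It uses an in-memory SQLite table as a stand-in for the App's 'taxonname' table. The column names `fullname`, `taxonid` and `source`, the function `lookup_name`, and the identifiers "A1" and "A2" are all placeholders made up for illustration; they are not the real schema or real NHMA identifiers.

```python
import sqlite3

# Stand-in for the App look-up table; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonname ("
    "id INTEGER PRIMARY KEY, fullname TEXT, taxonid TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO taxonname (fullname, taxonid, source) VALUES (?, ?, ?)",
    [
        # "A1" and "A2" are placeholder identifiers, not real NHMA values.
        ("Dyseriocrania subpurpurella", "A1", "Checklist of Danish Lepidoptera"),
        ("Paracrania chrysolepidella", "A2", "Checklist of Danish Lepidoptera"),
    ],
)


def lookup_name(conn, taxon_id):
    """Return the taxon name stored for an alternative taxon ID, or None."""
    row = conn.execute(
        "SELECT fullname FROM taxonname WHERE taxonid = ?",
        (taxon_id.strip(),),
    ).fetchone()
    return row[0] if row else None


print(lookup_name(conn, "A1"))
print(lookup_name(conn, "999"))
```

Returning `None` for an unknown identifier leaves it to the UI to decide how to react, for instance by leaving the 'taxonomic name' field empty so the digitizer can fall back to typing a name.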

How to approach the issue?

The NHMA taxonomy must be transformed into a format that has an atomic name for each taxonomic concept. For instance, family 'Eriocraniidae', genus 'Dyseriocrania' and species 'subpurpurella' must each have their own name and identifier. Each record should also contain the Specify ID and the taxonomic source name "Checklist of Danish Lepidoptera". The Specify ID needs to be looked up via a script. This will link the entered ID to the DaSSCo taxonname table.
Updating the taxonomy SQL insert files (for NHMA at least) would be needed.

What could be the challenges?

- [x] The NHMAjoin table needs to be easily created. (Now implemented in the Aarhus_taxonomy.py script.) This has since been superseded by incorporating the taxonomic identifiers into the Specify taxon table, which Fedor has already done.
There is also the issue that if collection staff update or add to the alternative taxonomy, a new taxonomic export would be warranted. But how would we know that such an update has happened?

There is the real possibility that a digitizer at the collection will input a number that does not correspond to any name in the App look-up table. This should be solved in the App training. (Pip Brewer, 31-08-2023)

What tests are required?

Sample name lookups, where known records from the original are compared to an SQL query on the look-up table. A direct comparison between the original taxonomy and the derived published taxonomy table is recorded in: N:\SCI-SNM-DigitalCollections\DaSSCo\Digi App\NHMA Entomology\Aarhus_Dk_lepidoptera2013_validation.xlsx
This test uncovered that the Specify taxonomy for that specific collection contained four names that were not present in the Digi App taxon table. These have been added to the Specify taxonomy, from which we derive the App taxonomy.
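A comparison of this kind could be sketched as below. This is not the procedure behind the validation spreadsheet, just one way of expressing the check: the function `compare_taxonomies` and the identifiers are hypothetical, and both inputs are assumed to be plain mappings from taxon ID to name.

```python
def compare_taxonomies(original, derived):
    """Compare ID-to-name mappings from the source list and the look-up table.

    Returns the IDs missing from the derived table and the IDs whose
    names differ. Hypothetical helper for illustration only.
    """
    missing = sorted(k for k in original if k not in derived)
    mismatched = sorted(
        k for k in original if k in derived and original[k] != derived[k]
    )
    return missing, mismatched


# Placeholder identifiers; the names come from the listing in this issue.
original = {
    "A1": "Dyseriocrania subpurpurella",
    "A2": "Paracrania chrysolepidella",
    "A3": "Eriocrania unimaculella",
}
derived = {
    "A1": "Dyseriocrania subpurpurella",
    "A2": "Paracrania chrysolepidela",  # deliberate typo to show a mismatch
}

missing, mismatched = compare_taxonomies(original, derived)
print("missing:", missing)
print("mismatched:", mismatched)
```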

What documentation is required?

The training manual for digitizers needs updating. Chelsea was contacted about updating the training material (email 07-09-2023).

FedorSteeman commented 1 year ago

The taxon keys have been inserted into the NHMA Specify database.

So a fresh taxon insert file, including those keys, can now be extracted from that source.

@jlegind Do you need help adjusting the SQL file to this end?

jlegind commented 1 year ago

When the new taxonomic spine for the Digi App is ready, I will modify the Python script to employ the alternative taxon ID field for NHMA Entomology users.

jlegind commented 10 months ago

Fedor completed this task with commit: https://github.com/NHMDenmark/Mass-Digitizer/commit/0d0beba958abcf787a769503c7168834380b5c3f