NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Issue with NHMA taxon numbers base document #416

Closed: PipBrewer closed this issue 1 year ago

PipBrewer commented 1 year ago

What is the issue?

NHMA have taxon numbers (rather than taxon names) on the front of their drawers, and these correspond to the numbers in Karsholt & Nielsen (2013). To make digitisation efficient (rather than having digitisers look up the reference to find the correct taxon name), it was decided that the DigiApp would let them type in the taxon number and automatically populate the taxon name. To facilitate this, both NHMA and Ole Karsholt provided the numbers and taxa in an Excel document, which was incorporated into the DigiApp. It has now been realised that the numbers in the Excel spreadsheet and those in the publication do not correspond. NHMA were unaware of this (email of 21/09/2023). PB has contacted Ole Karsholt (email sent 21/09/2023) to find out what the issue is and whether there is a correct Excel spreadsheet that can be used.
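The DigiApp feature described above amounts to a lookup from checklist number to taxon name. A minimal Python sketch of that lookup, assuming the table is available as CSV text with hypothetical `number` and `taxon` columns; `load_lookup` and `autofill_taxon` are illustrative names, not the DigiApp's actual code:

```python
import csv
import io
from typing import Optional


def load_lookup(csv_text: str) -> dict[str, str]:
    """Build a number -> taxon-name mapping from CSV text.

    Assumes hypothetical columns 'number' and 'taxon'; numbers are kept
    as strings so the mapping does not depend on their exact format.
    """
    lookup: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        lookup[row["number"].strip()] = row["taxon"].strip()
    return lookup


def autofill_taxon(lookup: dict[str, str], typed_number: str) -> Optional[str]:
    """Return the taxon name for the number a digitiser typed.

    Returns None for unknown numbers so the UI can fall back to manual
    entry instead of filling in a guess.
    """
    return lookup.get(typed_number.strip())
```

Returning `None` for unknown numbers keeps a mistyped number from being filled in with an arbitrary name. It cannot, of course, catch a table whose numbers are themselves wrong, which is the problem reported in this issue.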

Why is it needed/relevant?

At the moment, the numbers return the wrong taxon, so the feature is unusable for NHMA until this is fixed.

Estimate level of effort required.


Difficult

What could be the challenges?

Translating the published taxonomy (PDF) into a usable table.
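One way to approach the translation is to extract the PDF to plain text and keep only the lines that start with a checklist number. The sketch below assumes each species entry in the extracted text looks like `<number> <Genus> <species> <authorship>`; the real layout of Karsholt & Nielsen (2013) may differ (family headings, subspecies, subgenera, wrapped lines), so the regular expression is a hypothetical starting point, not the method actually used in this issue:

```python
import csv
import io
import re

# Hypothetical entry pattern: a number, then a Genus and a lower-case species
# epithet. Anything after the epithet (authorship, year) is ignored.
ENTRY = re.compile(r"^\s*(?P<number>\d+)\s+(?P<genus>[A-Z][a-z]+)\s+(?P<species>[a-z-]+)\b")


def parse_checklist(raw_text: str) -> list[tuple[str, str]]:
    """Return (number, 'Genus species') pairs for lines matching ENTRY.

    Family headings and any other non-matching lines are skipped.
    """
    rows = []
    for line in raw_text.splitlines():
        match = ENTRY.match(line)
        if match:
            rows.append((match["number"], f"{match['genus']} {match['species']}"))
    return rows


def to_csv(rows: list[tuple[str, str]]) -> str:
    """Serialise the pairs as CSV with a 'number,taxon' header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["number", "taxon"])
    writer.writerows(rows)
    return out.getvalue()
```

Lines the pattern skips should be reviewed by hand rather than assumed to be headings, since a silently dropped entry would shift nothing but would leave a gap in the lookup.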

What tests are required?

Manually looking up names to check that they are found and that the ID number corresponds to the submitted name.
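The manual spot checks could be complemented by an automated comparison between the table loaded into the DigiApp and a reference table built independently from the publication. A minimal sketch, assuming both are available as number-to-name dictionaries; `compare_tables` is a hypothetical helper:

```python
def compare_tables(app_table: dict[str, str], reference: dict[str, str]) -> dict[str, list]:
    """Compare the app's number -> name table against a reference table.

    Returns numbers missing from the app, numbers found only in the app,
    and (number, app_name, reference_name) triples where the names disagree.
    Numbers are sorted as strings, so '10' sorts before '2'.
    """
    missing = sorted(set(reference) - set(app_table))
    extra = sorted(set(app_table) - set(reference))
    mismatched = [
        (number, app_table[number], reference[number])
        for number in sorted(set(app_table) & set(reference))
        if app_table[number] != reference[number]
    ]
    return {"missing": missing, "extra": extra, "mismatched": mismatched}
```

A systematic mismatch like the one reported here would show up as many entries under `mismatched`, whereas isolated typos would show up as only a handful.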

What documentation is required?

A description of the steps taken to extract the correct taxonomy.

FedorSteeman commented 1 year ago

By converting the PDF to raw text and unleashing some OpenRefine magic, I managed to distil it into a spreadsheet that I can import into the app.
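The comment does not say how the OpenRefine output was checked before import. As an illustration only, assuming the sheet is exported to CSV with hypothetical `number` and `taxon` columns, it could be screened for duplicate numbers, non-numeric numbers and empty names before it replaces the old table; `validate_import` is a hypothetical helper:

```python
import csv
import io
from collections import Counter


def validate_import(csv_text: str) -> list[str]:
    """Return human-readable problems in a number,taxon CSV.

    An empty list means no problems were found. Line numbers assume one
    record per line, with the header on line 1.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []

    # Each checklist number should map to exactly one taxon.
    counts = Counter((row.get("number") or "").strip() for row in rows)
    for number, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"number {number} appears {count} times")

    for line_no, row in enumerate(rows, start=2):
        number = (row.get("number") or "").strip()
        taxon = (row.get("taxon") or "").strip()
        if not number.isdigit():
            problems.append(f"line {line_no}: number {number!r} is not numeric")
        if not taxon:
            problems.append(f"line {line_no}: empty taxon name")
    return problems
```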

Lep2013-CheckList-csv v2.xls