NHMDenmark / DanSpecify

Important files regarding the Danish instance of the Specify database system for collections digitisation and management, plus a placeholder for issue tracking. Guidelines, manuals and other kinds of documentation will be gathered on the wiki.

Augment NHMA taxon tree with sorting numbers #249

Closed: FedorSteeman closed this issue 11 months ago

FedorSteeman commented 11 months ago

NHMA uses a series of numbers ("sortnr") to identify taxa, as noted on their collection drawers. As detailed in issue https://github.com/NHMDenmark/Mass-Digitizer/issues/373, the plan is to have their app incorporate these numbers and serve them through its UI.

For this to happen, the NHMA sorting or taxon numbers should be added to a field in Specify, which can then be synchronized to the app.

FedorSteeman commented 11 months ago

@jlegind forwarded me the following spreadsheet, associating the taxon numbers ("sortnr") with the corresponding taxa.

Aarhus_Dk_lepidoptera2013.xlsx

As per e-mail communication with @hansviborg and a colleague, the superfamilies can be left out. This is convenient, as including them would require more manual restructuring of the taxon tree. The taxon number and its source can then be mapped to customized fields in Specify (text1 and text2 respectively).
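The mapping agreed above (superfamilies skipped, taxon number into text1, its source into text2) could look roughly like this. This is a minimal sketch: the row keys `rank`, `sortnr` and `source` and the function name `to_specify_fields` are hypothetical, not the actual column names in Aarhus_Dk_lepidoptera2013.xlsx, and the sample values are made up.

```python
# Hypothetical sketch: map one spreadsheet row to the Specify taxon fields
# chosen in this thread (text1 = taxon number, text2 = its source).
# The row keys "rank", "sortnr" and "source" are assumptions.
from typing import Optional


def to_specify_fields(row: dict) -> Optional[dict]:
    """Return text1/text2 values for a row, or None if the row is skipped."""
    # Superfamilies are left out, as agreed with NHMA.
    if str(row.get("rank", "")).strip().lower() == "superfamily":
        return None
    sortnr = str(row.get("sortnr", "")).strip()
    if not sortnr:
        return None  # nothing to store without a taxon number
    return {"text1": sortnr, "text2": str(row.get("source", "")).strip()}


# Made-up example rows, for illustration only.
print(to_specify_fields({"rank": "Species", "sortnr": 1234, "source": "example-source"}))
print(to_specify_fields({"rank": "Superfamily", "sortnr": 5}))
```

Storing the number as text keeps any leading zeros or non-numeric suffixes intact, which suits free-text fields such as text1.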

FedorSteeman commented 11 months ago

From the spreadsheet I surmised that not all taxa, especially species, have been added to the taxon tree yet. These will have to be inserted, together with their taxon keys (numbers) and source.
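Finding which spreadsheet taxa are missing from the tree amounts to a set comparison on names. The sketch below assumes simple case-insensitive matching with normalised whitespace; `partition_taxa` is a hypothetical helper, and real matching may also need author strings or ranks to tell homonyms apart.

```python
# Hypothetical sketch: split spreadsheet taxa into those already present in
# the Specify taxon tree and those that still have to be inserted.
# Matching rule (case-insensitive, whitespace-normalised) is an assumption.


def normalise(name: str) -> str:
    """Collapse runs of whitespace and lower-case the name."""
    return " ".join(name.split()).lower()


def partition_taxa(sheet_names, tree_names):
    """Return (present, missing) lists, preserving spreadsheet order."""
    known = {normalise(n) for n in tree_names}
    present, missing = [], []
    for name in sheet_names:
        (present if normalise(name) in known else missing).append(name)
    return present, missing


# Example input only; not taken from the NHMA data.
present, missing = partition_taxa(
    ["Pieris rapae", "Pieris  napi", "Aglais io"],
    ["pieris rapae", "Pieris napi"],
)
print(missing)
```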

However, in its current form the file is very hard to work with because of the way it is structured.

It would be a lot easier if the genus name and species were each in separate columns. Not even the “Rubin” column is helpful here, since there are many instances where the species has been assigned a different genus. It’s quite a lot of manual work to copy and paste the genus name over into a separate column.
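If each species cell holds the full name, the genus and species columns can be derived from the name itself rather than from the "Rubin" column. This is a minimal sketch under that assumption (a cell like "Genus epithet [author, year]"); `split_binomial` is a hypothetical helper, and the actual sheet layout may differ.

```python
# Hypothetical sketch: split a full species name into separate genus and
# species columns, assuming each cell holds "Genus epithet [author, year]".
from typing import Optional, Tuple


def split_binomial(name: str) -> Tuple[str, Optional[str]]:
    parts = name.split()
    if not parts:
        raise ValueError("empty taxon name")
    genus = parts[0]
    # A lower-case second token is taken as the species epithet; anything
    # else (e.g. a capitalised author name) means there is no epithet.
    epithet = parts[1] if len(parts) > 1 and parts[1].islower() else None
    return genus, epithet


print(split_binomial("Pieris rapae (Linnaeus, 1758)"))
print(split_binomial("Polygonia c-album"))
```

Because the genus is read from the name as written, species that have been assigned a different genus keep their current genus instead of inheriting the one in the "Rubin" column.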

I will attempt to find an automated way to transfer this information into a format that I can have parsed, but this is going to take some time.

jlegind commented 11 months ago

The original NHMA Lepidoptera taxonomic spreadsheet was translated into a format better suited to further processing, especially for adding the alternative taxon identifiers to the Specify taxon table. The 'translation' of that taxonomy can be found on the N drive: N:\SCI-SNM-DigitalCollections\DaSSCo\Digi App\NHMA Entomology

The Python code for this data wrangling is here: https://github.com/NHMDenmark/Mass-Digitizer/blob/main/MassDigitizer/Aarhus_taxonomy.py

FedorSteeman commented 11 months ago

The taxon keys have been inserted into the NHMA database.
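Writing the keys for taxa that already exist in the tree reduces to an UPDATE keyed on the taxon name. This sketch only illustrates that step: Specify itself runs on MySQL/MariaDB, so `sqlite3` and the minimal `taxon` table (taxonid, fullname, text1, text2) are stand-ins, `store_taxon_keys` is hypothetical, and inserting brand-new taxa into the tree is not covered.

```python
# Hypothetical sketch of writing taxon keys (text1) and their source (text2)
# into a taxon table. sqlite3 and this minimal schema are stand-ins for the
# real Specify database, for illustration only.
import sqlite3


def store_taxon_keys(conn, keys):
    """keys: iterable of (fullname, taxon_key, key_source). Returns rows updated."""
    cur = conn.executemany(
        "UPDATE taxon SET text1 = ?, text2 = ? WHERE fullname = ?",
        [(key, source, name) for name, key, source in keys],
    )
    conn.commit()
    return cur.rowcount  # summed over all statements for executemany


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon (taxonid INTEGER PRIMARY KEY, fullname TEXT, text1 TEXT, text2 TEXT)"
)
conn.executemany("INSERT INTO taxon (fullname) VALUES (?)", [("Pieris rapae",), ("Aglais io",)])
updated = store_taxon_keys(conn, [("Pieris rapae", "1234", "example-source")])
print(updated)
```

Parameterised queries (`?` placeholders) avoid quoting problems with author strings that contain commas or parentheses.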

What remains is adding the fields to the taxon editing forms to give NHMA control over these fields.

Sosannah commented 11 months ago

The final step is also done:

Taxon Key and Taxon Key Source fields have been added to the taxon editing forms to give NHMA control over these fields.