NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Adapt GREL script (+ guide) to take heed of novel taxa's inferred rank #327

Closed by FedorSteeman 1 year ago

FedorSteeman commented 1 year ago

Issue

Context: We need to infer taxon rank ID from the name structure because it isn't available from any other field.

Solution

The post-processing script should take heed of how the app currently infers the rank of novel taxa based on the following logic:

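# Count the space-separated elements of the name; a parenthesised subgenus adds one
elementCount = len(taxonNameEntry.strip().split(' '))
subgenusCount = 0
if '(' in taxonNameEntry: subgenusCount = 1

# Explicit infraspecific markers take precedence over the element count
if ' var. ' in taxonNameEntry: rankid = 240
elif ' subvar. ' in taxonNameEntry: rankid = 250
elif ' f. ' in taxonNameEntry: rankid = 260
elif ' subf. ' in taxonNameEntry: rankid = 270
# Otherwise the rank follows from the number of elements (subgenus excluded)
elif elementCount == 3 + subgenusCount: rankid = 230
elif elementCount == 2 + subgenusCount: rankid = 220
elif elementCount == 1 + subgenusCount: rankid = 180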
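
For checking the GREL port against known names, the logic above can be wrapped in a small function. This is only a sketch: the name `infer_rank_id`, the marker table and the `None` return for names that match no branch are assumptions made for illustration, not taken from the app. The marker strings are written with single spaces.

```python
from typing import Optional

# Infraspecific markers and their rank IDs, in the order the logic above checks them.
# The table itself is a hypothetical helper, not part of the app.
INFRASPECIFIC_MARKERS = (
    (" var. ", 240),
    (" subvar. ", 250),
    (" f. ", 260),
    (" subf. ", 270),
)


def infer_rank_id(taxon_name_entry: str) -> Optional[int]:
    """Infer a rank ID from the structure of a taxon name.

    Returns None when no branch matches (an assumption; the snippet
    above leaves rankid unset in that case).
    """
    # Explicit markers win over the element count.
    for marker, rank_id in INFRASPECIFIC_MARKERS:
        if marker in taxon_name_entry:
            return rank_id

    # Count space-separated elements; a parenthesised subgenus adds one.
    element_count = len(taxon_name_entry.strip().split(" "))
    subgenus_count = 1 if "(" in taxon_name_entry else 0

    if element_count == 3 + subgenus_count:
        return 230
    if element_count == 2 + subgenus_count:
        return 220
    if element_count == 1 + subgenus_count:
        return 180
    return None


if __name__ == "__main__":
    # Made-up placeholder names, used only to exercise each branch.
    for name in ("Aus", "Aus bus", "Aus (Xus) bus", "Aus bus cus", "Aus bus var. cus"):
        print(name, "->", infer_rank_id(name))
```

Because the element count uses `split(' ')`, a name with doubled spaces between elements yields extra empty elements and falls through to `None`, which is one reason the entry format matters.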

The guide should also be updated so that digitizers enter novel taxon names in the format this logic expects.

Tasks:

Acceptable solution

Solution has been implemented in https://github.com/NHMDenmark/Mass-Digitizer/blob/main/OpenRefine/post_processing.json

jlegind commented 11 months ago

Other GREL issue: #411