NHMDenmark / Mass-Digitizer

Common repo for the DaSSCo team
Apache License 2.0

Column 'newtaxonflag' is not properly set in post processing #431

Closed by jlegind 8 months ago

jlegind commented 9 months ago

What is the issue?

The column 'newtaxonflag' is not set correctly during post-processing.

Detailed description of the issue.

When the CSV file submitted by the digitizers is processed into a format that the Specify workbench can digest, the newtaxonflag column fails to populate with the value 'True'. This column depends on the "taxonspid" column for its boolean values. Newtaxonflag itself is used to determine the newgenus-, newspecies-, newsubspecies-, newvariety-, and newforma-flags.

Why is it needed/relevant?

The new[rank]flags are needed for the Specify workbench import. They help Specify assign these new names to the Specify taxonomic spine.

Effort required

easy

What is the expected acceptable result?

The newtaxonflag should follow the rule that if taxonspid is 0, NULL, or empty, then the new[rank]flag is set to 'True'.

How to approach it?

Analyze the OpenRefine GREL script.

Solution

if((value==null).or(value==0).or(value==''), 'True', 'False')
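
For checking expected values outside OpenRefine, the same rule can be sketched in Python. This is only an illustration, not the project's script: the function name new_taxon_flag is hypothetical, and treating the text "0" like the number 0 is an assumption, made because cells read from a CSV file usually arrive as strings.

```python
def new_taxon_flag(taxonspid):
    """Return 'True' if taxonspid is missing, empty, or zero, else 'False'.

    Hypothetical helper mirroring the rule in this issue; the strings
    'True' and 'False' match the values the GREL expression produces.
    """
    if taxonspid is None:
        return 'True'
    # Assumption: CSV cells are text, so compare the trimmed string form.
    text = str(taxonspid).strip()
    if text == '' or text == '0':
        return 'True'
    return 'False'


if __name__ == '__main__':
    for sample in (None, '', 0, '0', '12345'):
        print(repr(sample), new_taxon_flag(sample))
```

Comparing the trimmed string form covers both a numeric 0 and the text "0"; whether whitespace-only cells should also count as empty is a choice made here, not something the issue specifies.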

What tests are required?

Prepare a CSV file containing missing, null, and 0 taxonspid values and run it through the OpenRefine process.
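
A test file of this kind could be generated with a short Python sketch like the one below. Only the taxonspid column name comes from this issue; the "case" and "expected_newtaxonflag" columns, the sample ID 12345, and the function name build_test_csv are hypothetical and would need to be adapted to the real digitizer CSV layout. Note that a null and an empty value both end up as an empty cell in CSV.

```python
import csv
import io

# Hypothetical test cases: (label, taxonspid value as it appears in the CSV).
TEST_ROWS = [
    ("empty", ""),         # missing/null taxonspid is an empty cell in CSV
    ("zero", "0"),         # taxonspid of 0
    ("existing", "12345"), # made-up ID standing in for a known taxon
]


def build_test_csv():
    """Return CSV text with one row per test case and the expected flag."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["case", "taxonspid", "expected_newtaxonflag"])
    for label, spid in TEST_ROWS:
        expected = "True" if spid in ("", "0") else "False"
        writer.writerow([label, spid, expected])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_test_csv(), end="")
```

After running the file through the OpenRefine process, the resulting newtaxonflag column can be compared against the expected column row by row.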

FedorSteeman commented 9 months ago

Estimated to take an hour including testing. Two hours including cake.

FedorSteeman commented 8 months ago

Fixed and cleaned/re-ordered script.