What is the issue?
There is a risk that novel names end up duplicated in the Specify taxonomy after they are imported through the Specify Workbench.
We also need to check the names previously imported into the botany spine.
Estimate level of effort required.
Hard
What is the expected acceptable result?
That no duplicates are added to the Taxonomy.
Existing names are differentiated by author where appropriate.
Give a clear approach/potential solution on how to resolve it.
During the post-processing step, new names are identified. They show up in OpenRefine under the 'remarks' column.
The pattern is this: | Verbatim_taxon: [the taxon name itself].
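The flagged names could be pulled out of the 'remarks' values with a small helper before the file is imported. This is a minimal sketch, assuming a remarks cell is a set of pipe-separated segments and that the taxon name runs up to the next pipe or the end of the cell; the function name `extract_novel_name` is hypothetical.

```python
import re

# Assumed shape of a remarks cell: pipe-separated segments, one of which
# reads "Verbatim_taxon: <name>". The name is taken up to the next pipe.
VERBATIM_TAXON = re.compile(r"\|\s*Verbatim_taxon:\s*([^|]+)")


def extract_novel_name(remarks):
    """Return the flagged taxon name from a remarks value, or None if absent."""
    match = VERBATIM_TAXON.search(remarks or "")
    return match.group(1).strip() if match else None
```

Trailing periods are deliberately kept, since author abbreviations such as "Schltr." end in one.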
The post-processed file is then imported.
After the import, the novel names can be explored through the taxon tree, and duplicates can be merged according to this guide:
N:\SCI-SNM-DigitalCollections\DaSSCo\Specify taxonomy cleaning\Cleaning taxonomy using the taxon tree in Specify 7.docx
Updating authored names post hoc
A solution could be a spreadsheet with a drop-down list for each imported name that has more than one authored name. Each drop-down list contains the authors applicable to that name.
The process would be:
Query for the records that were already imported into Vascular plants in Specify.
Match those names to the new taxonomy where author names exist.
Separate the names that occur more than once with different author names into a file.
Create a spreadsheet with a drop-down list for each name containing the alternatives for that name (for example "Aa filamentosa M.L.Ortiz, 1937" or "Aa filamentosa Mansf.").
The solution will be developed in Python using the XlsxWriter package.
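The last two steps could look roughly like the sketch below. It assumes the matching step has already produced (name, author) pairs; the function names, the sheet names "Names" and "Options", and the column layout are all hypothetical. The alternatives are written to a second sheet and referenced as a cell range, because entries such as "Aa filamentosa M.L.Ortiz, 1937" contain commas, and Excel treats commas as item separators when the list is typed directly into the validation rule.

```python
from collections import defaultdict


def group_author_variants(authored_names):
    """Map each bare name to its distinct authored variants.

    `authored_names` is an iterable of (name, author) pairs. Only names
    with more than one distinct authored variant are returned.
    """
    variants = defaultdict(list)
    for name, author in authored_names:
        full = f"{name} {author}".strip()
        if full not in variants[name]:
            variants[name].append(full)
    return {name: fulls for name, fulls in variants.items() if len(fulls) > 1}


def write_dropdown_workbook(ambiguous, path):
    """Write one row per ambiguous name with a drop-down of its variants."""
    # Third-party package; imported here so the grouping helper above can
    # be used without it installed.
    import xlsxwriter
    from xlsxwriter.utility import xl_range_abs

    workbook = xlsxwriter.Workbook(path)
    names = workbook.add_worksheet("Names")      # hypothetical sheet name
    options = workbook.add_worksheet("Options")  # hypothetical sheet name
    names.write_row(0, 0, ["Name", "Authored name"])
    for row, (name, fulls) in enumerate(sorted(ambiguous.items()), start=1):
        names.write(row, 0, name)
        # Each name's alternatives occupy one row of the Options sheet.
        options.write_row(row, 0, fulls)
        source = "=Options!" + xl_range_abs(row, 0, row, len(fulls) - 1)
        names.data_validation(row, 1, row, 1,
                              {"validate": "list", "source": source})
    workbook.close()
```

A curator would then pick the correct authored name in the second column of each row, and the completed sheet could be read back to update the records.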
What could be the challenges?
Novel names can be very close to an existing name, for example:
Abacopteris menisciicarpa (Blume) Holttum
Abacopteris menisciicarpos (Blume) Holttum
or
Aa mathewsii (Rchb.f.) Schltr.
Aa matthewsii (Rchb.f.) Schltr.
What should the process be in these cases?
What tests are required?
A Levenshtein distance test could be created to check whether a name is very close to another.
Unfortunately, SQLite does not support this out of the box, but there are extensions that could be employed.
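As an alternative to a compiled extension, Python's built-in sqlite3 module can register a Python function so that it is callable from SQL. The sketch below takes that route; it is not one of the SQLite extensions mentioned above. The table `taxon`, its column `fullname`, and the threshold of 2 edits are assumptions for illustration only.

```python
import sqlite3


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_near_matches(conn, novel_name, max_distance=2):
    """Return existing names within `max_distance` edits of `novel_name`.

    Exact matches (distance 0) are excluded; they are plain duplicates.
    Assumes a table `taxon` with a text column `fullname` (hypothetical).
    """
    conn.create_function("levenshtein", 2, levenshtein)
    rows = conn.execute(
        "SELECT fullname FROM taxon "
        "WHERE levenshtein(fullname, ?) BETWEEN 1 AND ?",
        (novel_name, max_distance),
    )
    return [row[0] for row in rows]
```

With this function, "Aa mathewsii" and "Aa matthewsii" are 1 edit apart, and "Abacopteris menisciicarpa" and "Abacopteris menisciicarpos" are 2 edits apart, so both example pairs would be flagged at the assumed threshold. The comparison runs against every row, which may be slow on a large taxon table.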