Closed: bhsi-snm closed this issue 5 months ago.
Zsuzsanna Papp informs us that in botany, homonyms are common within the same family. The original idea was to show the digitiser a partial taxonomic hierarchy (similar to the storage field). However, because homonyms are common within families, this would not resolve the ambiguity. The solution is to create column(s) for author and year in the taxon table (remember that commas, brackets, etc. are significant here). These should be visible when selecting a taxon in the UI. I'm not sure how much of the Botany spine has author and year in Specify; we may need to get that using the GBIF API.
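A minimal sketch of how an authorship column could make homonyms distinguishable when selecting a taxon. The `Taxon` record, its field names and the label format are assumptions for illustration, not the app's actual schema. For simplicity, author and year are kept together in one verbatim string so that commas and brackets survive unchanged.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    # Hypothetical columns: the name without authorship, plus the authorship
    # (author and, where used, year) stored verbatim in its own field.
    fullname: str
    author: str = ""


def picker_label(t: Taxon) -> str:
    """Build the label shown when selecting a taxon, e.g. 'Genus species Author'."""
    # The author string is appended as-is so punctuation such as '(L.)' is kept.
    return f"{t.fullname} {t.author}" if t.author else t.fullname


def find_homonyms(taxa: list[Taxon]) -> dict[str, list[Taxon]]:
    """Group taxa by name and keep only names shared by more than one taxon."""
    groups: dict[str, list[Taxon]] = defaultdict(list)
    for t in taxa:
        groups[t.fullname].append(t)
    return {name: ts for name, ts in groups.items() if len(ts) > 1}
```

For homonyms, only the label differs between the entries, which is exactly what the digitiser needs to see in the picker.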
This may take a day or two to implement.
The major issue here is that, for botany, author information was not imported into Specify from the taxonomy source, so it will have to be added after the fact.
It is important to solve this, because the lack of authorship in the app's taxon spine appears to be creating duplicates in Specify on import wherever the same taxa do have authorship set in Specify. Workbench then fails to match those taxa and instead creates new ones without authorship. This is bad, because these duplicates then feed back into the taxon spine of the app db.
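A minimal sketch for spotting the duplicates described above, i.e. names that occur both with and without authorship. It assumes rows of `(fullname, author)` pairs, for example exported from the Specify taxon table or the app db; the function name is hypothetical and this does not model Workbench's own matching logic.

```python
from collections import defaultdict


def authorless_duplicates(rows):
    """Return names that occur both with and without authorship.

    `rows` is an iterable of (fullname, author) pairs; an empty or None
    author counts as missing authorship.
    """
    authors_by_name = defaultdict(set)
    for fullname, author in rows:
        # Normalise missing authorship to an empty string.
        authors_by_name[fullname.strip()].add((author or "").strip())
    return sorted(
        name
        for name, authors in authors_by_name.items()
        if "" in authors and len(authors) > 1
    )
```

Each name returned is a candidate for merging the authorless record into the authored one before it feeds back into the app's taxon spine.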
One problem with adding authorship to the full name is that it would break the algorithm for guessing taxon rank. Work in progress.
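The app's actual rank-guessing algorithm is not shown in this thread. The sketch below assumes a simple word-count heuristic, purely to illustrate why appending authorship to the full name is risky and why a separate author field avoids the problem.

```python
def guess_rank_by_word_count(fullname: str) -> str:
    """Hypothetical rank guess based only on the number of words in a name."""
    words = fullname.split()
    if len(words) == 1:
        return "genus"  # a uninomial; could also be a higher rank
    if len(words) == 2:
        return "species"
    return "infraspecific"
```

With this heuristic, "Delphinium bucharicum" is guessed as a species, but "Delphinium bucharicum Popov" becomes "infraspecific" and the genus "Legouixia Van Heurck & Müll.Arg." is misread the same way. Keeping the authorship out of the name string leaves the word count, and thus the guess, untouched.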
Fortunately, the taxon trees for the two Entomology collections already had authorship, which was easy to transfer to the app. For NHMD Vascular Plants, the authorship still needs to be added from a source dataset.
The latest taxonomy can be fetched here: https://www.checklistbank.org/dataset/53147/download?taxonID=7707728
Looking at the source of the taxon spine for vascular plants, I can see two issues:
When updating authorship, it will not be possible to distinguish homonyms, but perhaps these can be added later.
The plan forward is to use SQL to achieve the following:
Trying this out in the test db: updating taxa with authors works fine, but I found my first homonym that evades this approach:
So I need to find a way to add homonyms after the fact, but first I need to drop fossil taxa from the taxonomy...
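A name-based authorship update, as attempted in the test db, could look roughly like the minimal sketch below. It is not the actual SQL used; it assumes a hypothetical SQLite table `taxon(fullname, author)` and a list of `(name, authorship)` pairs from the source checklist. It also shows why homonyms evade such an update: a name that carries more than one authorship in the source cannot be matched on the name alone, so it is skipped and reported.

```python
import sqlite3
from collections import defaultdict


def update_authors(conn: sqlite3.Connection, source: list[tuple[str, str]]) -> list[str]:
    """Fill missing authorship in a hypothetical `taxon(fullname, author)` table.

    Names with more than one authorship in `source` are homonyms; they are
    left untouched and returned so they can be handled separately.
    """
    authors = defaultdict(set)
    for name, author in source:
        authors[name].add(author)

    homonyms = sorted(n for n, a in authors.items() if len(a) > 1)
    with conn:  # commit on success, roll back on error
        for name, a in authors.items():
            if len(a) == 1:
                conn.execute(
                    "UPDATE taxon SET author = ? "
                    "WHERE fullname = ? AND (author IS NULL OR author = '')",
                    (next(iter(a)), name),
                )
    return homonyms
```

Only rows with missing authorship are updated, so existing authorship (e.g. from the Entomology trees) is never overwritten.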
Unfortunately, GBIF does not mark taxa as extant or extinct in their taxon spine export products.
However, I have found a way to get OpenRefine to fetch data for each row from paleobiodb (the Paleobiology Database).
It's just really slow, but nevertheless progressing:
Once the paleobiodb data has been fetched for each taxon, I can parse the resulting JSON to mark each row as extant or not, so that fossil taxa can be left out. The GREL expression used for this is:
```
value.parseJson().records[0].get('ext')
```
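For reference, a Python equivalent of the GREL expression, plus a filter that drops fossil taxa. This is a minimal sketch assuming the paleobiodb response has a top-level `records` list and that `ext` is 1 for extant and 0 for extinct taxa; these value conventions should be checked against the paleobiodb API documentation. The `pbdb` row key holding the raw JSON is hypothetical.

```python
import json
from typing import Optional


def is_extant(pbdb_json: str) -> Optional[bool]:
    """Mirror value.parseJson().records[0].get('ext') from the GREL expression.

    Returns None when there is no record or no 'ext' value, instead of
    failing the way records[0] does on an empty list.
    """
    records = json.loads(pbdb_json).get("records") or []
    if not records or "ext" not in records[0]:
        return None
    # Assumed convention: 1 (or "1") means extant, anything else extinct.
    return str(records[0]["ext"]) == "1"


def drop_fossils(rows: list[dict]) -> list[dict]:
    """Keep rows not known to be extinct; unknown status is kept for review."""
    return [r for r in rows if is_extant(r["pbdb"]) is not False]
```

Rows with no paleobiodb match are kept rather than dropped, since many extant plant taxa will simply be absent from a paleontological database.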
Homonyms are now accommodated by the app, but we need to test whether they actually make it through post-processing and into Specify via Workbench. @jlegind is tasked with testing this.
A test dataset was created with taxon names plus author names:
Delphinium bucharicum Popov
Delphinium carela Buch.-Ham. ex D.Don
Legouixia Van Heurck & Müll.Arg.
Legousia snogerupii Biel & Kit Tan
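Because these test names combine the name and the authorship in one string, a post-processing step has to separate the two before they can be mapped. A minimal heuristic sketch, not the GREL script used in post-processing: it assumes the name is a capitalised genus followed by lowercase epithets (optionally with a rank marker such as `subsp.` or `var.`), and that the authorship starts at the first token breaking this pattern. Authorships starting with a lowercase particle not listed in the hypothetical `AUTHOR_PARTICLES` set will be misread; a dedicated name parser such as GBIF's would be more robust.

```python
RANK_MARKERS = {"subsp.", "ssp.", "var.", "f."}
# Lowercase words that begin an authorship rather than an epithet (not exhaustive).
AUTHOR_PARTICLES = {"de", "van", "von", "ex"}


def split_name_author(full: str) -> tuple[str, str]:
    """Split 'Genus epithet Author' into ('Genus epithet', 'Author')."""
    tokens = full.split()
    if not tokens:
        return "", ""
    name = [tokens[0]]  # the genus (or other uninomial) is always part of the name
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in RANK_MARKERS and i + 1 < len(tokens):
            # Infraspecific rank marker plus the epithet that follows it.
            name += tokens[i:i + 2]
            i += 2
        elif tok[0].islower() and "." not in tok and tok not in AUTHOR_PARTICLES:
            name.append(tok)  # a lowercase epithet
            i += 1
        else:
            break  # first token of the authorship
    # Authorship tokens are rejoined unchanged, so punctuation is preserved.
    return " ".join(name), " ".join(tokens[i:])
```

For "Delphinium carela Buch.-Ham. ex D.Don" this yields the name "Delphinium carela" and the authorship "Buch.-Ham. ex D.Don", and for the genus "Legouixia Van Heurck & Müll.Arg." it yields "Legouixia" and "Van Heurck & Müll.Arg.".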
The author name was not handled by the GREL script in post-processing, so only the binomial was transferred to the test Specify instance. Example: https://specify-test.science.ku.dk/specify/view/collectionobject/4369407/
A solution would be to add an 'Author' column to the specimen table, which would make it possible to map the author name in Workbench.
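A minimal sketch of the proposed change, assuming the app stores its data in SQLite with a `specimen` table that references `taxon` by id. The column name `taxonauthor` and the other identifiers are hypothetical. The function is safe to run more than once, since it only adds the column when it is missing.

```python
import sqlite3


def add_specimen_author_column(conn: sqlite3.Connection) -> None:
    """Add a hypothetical `taxonauthor` column to `specimen` and fill it from `taxon`.

    Assumes `specimen.taxonid` references `taxon.id` and that `taxon.author`
    holds the authorship; all table and column names are illustrative.
    """
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(specimen)")}
    with conn:
        if "taxonauthor" not in columns:
            conn.execute("ALTER TABLE specimen ADD COLUMN taxonauthor TEXT")
        # Copy the author string verbatim so punctuation survives into the export.
        conn.execute(
            "UPDATE specimen SET taxonauthor = "
            "(SELECT author FROM taxon WHERE taxon.id = specimen.taxonid)"
        )
```

Once the column is included in the post-processed file, it can be mapped to the taxon author field in Workbench alongside the name.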
@jlegind Can I see the post-processed file of this dataset?
And the pre-processed one too, so I can attempt to replicate this?
Although this is not related to the GREL script itself, it does make sense that we need the author field in the specimen table so that we can map that value in Workbench.
I had not noticed that @jlegind had created a new ticket, #476, for the specific issue that I just fixed within the scope of this ticket.
I will try to tie these tickets together somehow.
The tickets will be closed and can be reopened if necessary, depending on the test results.
To be tested with pre-release: https://github.com/NHMDenmark/Mass-Digitizer/releases/tag/v1.1.26
NOTE: We have not yet considered the author name of any new taxa...
Comments on "Author names not carried over in the post processed file (GREL)" https://github.com/NHMDenmark/Mass-Digitizer/issues/476#issuecomment-1931523421
Superseded by #476
Template for issues/tickets in DigiApp
What is the issue?
Why is it needed/relevant?
Estimated level of effort required.
What is the expected acceptable result?
What could the challenges be?
What tests are required?
New tests, or a reference to existing tests
What documentation is required?
Could refer to existing documentation and changes in relevant doc files.
Remarks