Closed Archilegt closed 2 years ago
Related: Names of subgenera don't get parsed if subgen. is included in the scientific name value (gnames/gnparser#232); recognizing "species group" or "species complex" suffixes as indicators of infrageneric groupings (gnames/gnparser#55)
Synergic with: Use "mihi" to enhance scientific name finding and parsing gnames/gnparser#230
Example Julus (Parastenophyllum) Verhoeff, 1899 [original name] https://myriatrix.myspecies.info/myriatrix/julus-parastenophyllum
Original string: Gatt. Julus, Untergatt. Parastenophyllum mihi
Source: https://www.biodiversitylibrary.org/page/15115029
Remarks: Name strings “Julus” and “Parastenophyllum” are recognized. The styling of the subgenus name in the paper is really bad when compared to that of subgenus Julus (Leptoiulus) on page 199.
Suggested recognition:
Gatt. acts as a starter #optional
Untergatt. acts as a starter and/or connector #could be read and used to generate a field subgenus: Parastenophyllum
mihi acts as terminator #recommended
Suggested result: Recognized name to be shown in "Scientific Names on this Page" box: Julus (Parastenophyllum)
Original string: Gatt. Julus, Untergatt. Parastenophyllum mihi
#similar to comment.
Normalization to canonical form:
short version: Parastenophyllum
full version: Julus (Parastenophyllum)
#Parentheses are important here as per article 6.1 of the ZooCode.
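The suggested recognition and normalization could be sketched roughly as follows. This is only an illustration in Python, not gnfinder or gnparser code: the pattern, the function name `parse_german_subgenus` and the result keys are all hypothetical, and the regular expression covers only the "Gatt. X, Untergatt. Y mihi" styling from the example above.

```python
import re

# Hypothetical pattern for the German styling in the example string.
# "Gatt." is an optional starter, "Untergatt." is the required connector,
# and "mihi" is an optional terminator.
GERMAN_SUBGENUS = re.compile(
    r"(?:Gatt(?:ung|\.)\s+)?"          # optional starter: "Gatt." or "Gattung"
    r"(?P<genus>[A-Z][a-z]+),?\s+"     # genus name, optional comma
    r"Untergatt(?:ung|\.)\s+"          # connector: "Untergatt." or "Untergattung"
    r"(?P<subgenus>[A-Z][a-z]+)"       # subgenus name
    r"(?:\s+(?P<mihi>mihi)\b)?"        # optional terminator
)


def parse_german_subgenus(text):
    """Return genus, subgenus and both canonical forms, or None if no match."""
    match = GERMAN_SUBGENUS.search(text)
    if match is None:
        return None
    genus = match.group("genus")
    subgenus = match.group("subgenus")
    return {
        "genus": genus,
        "subgenus": subgenus,
        "canonical_short": subgenus,
        # Parentheses around the subgenus, as in the full version above.
        "canonical_full": f"{genus} ({subgenus})",
        "has_mihi": match.group("mihi") is not None,
    }


result = parse_german_subgenus("Gatt. Julus, Untergatt. Parastenophyllum mihi")
print(result["canonical_full"])
```

For the example string this yields the full canonical form Julus (Parastenophyllum) and the short form Parastenophyllum.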
If this "German issue" is implemented, we can definitely include it in the Verhoeff paper GNA module.
I would say this is also closer to the gnfinder realm. I will move this issue there.
I ran a search for Untergattung through the whole BHL corpus and found that the word occurs quite rarely and quite often is not immediately connected to a scientific name. Checking for the word would significantly decrease the efficiency of the search. Such minor improvements, accumulating over time, would slow gnfinder down to a halt and make it useless for large data processing.
In the case of mihi, we would check for it only if we already know something is a scientific name, so it won't change performance significantly.
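A minimal sketch of why this check is cheap, assuming the text is already tokenized and a name has already been detected. The function name `extend_with_mihi` and the token representation are hypothetical and do not reflect gnfinder internals.

```python
def extend_with_mihi(tokens, name_end):
    """Extend an already-detected name span over a trailing "mihi".

    tokens:   list of whitespace-separated word tokens
    name_end: index one past the last token of the detected name

    Only the single token after the name is inspected, so the extra
    work per detected name is constant.
    """
    if name_end < len(tokens):
        candidate = tokens[name_end].rstrip(".,;:").lower()
        if candidate == "mihi":
            return name_end + 1
    return name_end


tokens = ["Gatt.", "Julus,", "Untergatt.", "Parastenophyllum", "mihi."]
print(extend_with_mihi(tokens, 4))
```

Because the lookup runs only after a name is found, text without names incurs no additional cost.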
Anchor words like Untergattung will be important for NLP analysis to weed out false positives when a scientific word is ambiguous, like Cancer or America.
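As an illustration of how an anchor word might serve as a disambiguation cue, here is a small sketch. The anchor list, the window size and the function name `has_anchor_before` are assumptions made for this example; how gnfinder's NLP step would actually use such a signal is not described in this thread.

```python
# Hypothetical set of German anchor words, lower-cased for comparison.
ANCHORS = {"gatt.", "gattung", "untergatt.", "untergattung"}


def has_anchor_before(tokens, idx, window=2):
    """Report whether an anchor word occurs within `window` tokens before idx.

    tokens: list of word tokens
    idx:    index of the ambiguous candidate (e.g. "Cancer")
    """
    start = max(0, idx - window)
    return any(token.lower() in ANCHORS for token in tokens[start:idx])


print(has_anchor_before("Die Untergattung Cancer ist".split(), 2))
print(has_anchor_before("In America und Europa".split(), 1))
```

A candidate preceded by an anchor word could then be treated as more likely to be a name, while one without it would need other evidence.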
Searching in BHL’s full text for “Untergattung” retrieves 8675 publications and searching for “Untergatt.” retrieves 541 publications [22.02.2022].
https://www.biodiversitylibrary.org/search?stype=F&searchTerm=Untergattung#/titles
https://www.biodiversitylibrary.org/search?stype=F&searchTerm=Untergatt.#/titles
I don't know how to visualize total hits in the corpus.