Closed Archilegt closed 2 years ago
Searching the gnverifier database returned 20 names with mihi:
Anisochaeta kiwi mihi Blakemore 2012
Aeolesthes inhirsutus mihi
Bruchus nongoniermani Mihi,
Anisochaeta kiwi mihi
Anisochaeta kiwi mihi Blakemore, 2013
Chyphononyx simulator mihi
Chimila tinguana mihi
Cobosidea mihi
Lithobius leostygis mihi
Conferva geminata var. mihi
Eucyclops serrulatus mihi
Hypomyces chrysospermus f. edulis-mihi K. Bitner 1953
Conferva geminata var. mihi Schwabe
Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966
Lithobius (Polybothrus) leostygis subsp. mihi
Quexua alinella mihi
Lithobius (Polyrbothrus) caesar subsp. mihi
Odonthophagus var. c mihi
Scutella agassizi mihi
Trochus patholatus mihi
Looks like the word mihi has several meanings:
Conferva geminata var. mihi
Conferva geminata var. mihi Schwabe
AlgaeBase
Eukaryota unassigned phylum|Eukaryota unassigned class|Eukaryota unassigned order||Conferva|Conferva geminata mihi
Conferva geminata var. mihi Schwabe
Conferva geminata var. mihi Schwabe
AlgaeBase
Eukaryota unassigned phylum|Eukaryota unassigned class|Eukaryota unassigned order|Conferva|Conferva geminata mihi
Eucyclops serrulatus mihi
Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966
Catalogue of Life
Biota|Animalia|Arthropoda|Hexanauplia|Copepoda|Neocopepoda|Podoplea|Cyclopoida|Cyclopida|Cyclopidae|Eucyclops|Eucyclops serrulatus serrulatus|Eucyclops serrulatus mihi
Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966
Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966
Catalogue of Life
Biota|Animalia|Arthropoda|Hexanauplia|Copepoda|Neocopepoda|Podoplea|Cyclopoida|Cyclopida|Cyclopidae|Eucyclops|Eucyclops serrulatus serrulatus|Eucyclops serrulatus mihi
Aeolesthes inhirsutus mihi
Aeolesthes inhirsutus mihi
EOL
Chyphononyx simulator mihi
Chyphononyx simulator mihi
EOL
Chimila tinguana mihi
Chimila tinguana mihi
EOL
Quexua alinella mihi
Quexua alinella mihi
EOL
Cobosidea mihi
Cobosidea mihi
ION
Odonthophagus var. c mihi
Odonthophagus
ION
Scutella agassizi mihi
Scutella agassizi mihi
ION
Trochus patholatus mihi
Trochus patholatus mihi
ION
Lithobius leostygis mihi
Lithobius (Polybothrus) leostygis subsp. mihi
Plazi
Lithobius (Polybothrus) leostygis subsp. mihi
Lithobius (Polybothrus) leostygis subsp. mihi
Plazi
Lithobius (Polyrbothrus) caesar subsp. mihi
Lithobius (Polyrbothrus) caesar subsp. mihi
Plazi
Hypomyces chrysospermus f. edulis-mihi K. Bitner 1953
Hypomyces chrysospermus f. edulis-mihi K. Bitner 1953
Union 4
|Cellular life|Eukaryota|Opisthokonts|Fungi|Fungi|Ascomycota|Sordariomycetes|Hypocreales|Hypocreaceae|Hypomyces|Hypomyces chrysospermus edulis-mihi
Anisochaeta kiwi mihi Blakemore 2012
Anisochaeta kiwi mihi Blakemore, 2013
WoRMS
Biota|Animalia|Annelida|Clitellata|Oligochaeta|Crassiclitellata|Megascolecida|Megascolecidae|Anisochaeta|Anisochaeta kiwi|Anisochaeta kiwi mihi
Anisochaeta kiwi mihi
Anisochaeta kiwi mihi Blakemore 2013
WoRMS
Biota|Animalia|Annelida|Clitellata|Oligochaeta|Crassiclitellata|Megascolecida|Megascolecidae|Anisochaeta|Anisochaeta kiwi|Anisochaeta kiwi mihi
Anisochaeta kiwi mihi Blakemore, 2013
Anisochaeta kiwi mihi Blakemore, 2013
WoRMS
Biota|Animalia|Annelida|Clitellata|Oligochaeta|Crassiclitellata|Megascolecida|Megascolecidae|Anisochaeta|Anisochaeta kiwi|Anisochaeta kiwi mihi
Bruchus nongoniermani Mihi
Bruchus nongoniermani Mihi
uBio NameBank
Bruchus nongoniermani
I don't worry about Union, uBio, ION, and EOL, since they are not human-curated, but AlgaeBase, CoL and WoRMS seem to have names with legitimate use of mihi as an epithet. So the parser should treat at least these names as exceptions to the rule.
Many thanks, Dima! Good to know that if "mihi" is applied, it may give "false positives" in a very small subset of names, compared to the "true positives" for which it does represent a terminal element.
Name deduplication: I believe that for the sake of counting potentially affected names, the 20 name instances that you found can be deduplicated down to 15, as follows:
Deduplicated list of names:
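A rough sketch of how the count of 15 can be reproduced from the 20 strings above. The normalization (cut after mihi, drop the subgenus in parentheses, drop rank markers, lowercase) is an assumption made for this illustration, and `dedup_key` is a hypothetical helper, not something gnverifier or gnparser provides.

```python
import re

# The 20 name strings returned by the search, as listed in this thread.
NAMES = [
    "Anisochaeta kiwi mihi Blakemore 2012",
    "Aeolesthes inhirsutus mihi",
    "Bruchus nongoniermani Mihi,",
    "Anisochaeta kiwi mihi",
    "Anisochaeta kiwi mihi Blakemore, 2013",
    "Chyphononyx simulator mihi",
    "Chimila tinguana mihi",
    "Cobosidea mihi",
    "Lithobius leostygis mihi",
    "Conferva geminata var. mihi",
    "Eucyclops serrulatus mihi",
    "Hypomyces chrysospermus f. edulis-mihi K. Bitner 1953",
    "Conferva geminata var. mihi Schwabe",
    "Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966",
    "Lithobius (Polybothrus) leostygis subsp. mihi",
    "Quexua alinella mihi",
    "Lithobius (Polyrbothrus) caesar subsp. mihi",
    "Odonthophagus var. c mihi",
    "Scutella agassizi mihi",
    "Trochus patholatus mihi",
]

MIHI = re.compile(r"\bmihi\b", re.IGNORECASE)
SUBGENUS = re.compile(r"\([^)]*\)\s*")
RANKS = re.compile(r"\b(?:var|subsp|f)\.\s*")


def dedup_key(name: str) -> str:
    """Hypothetical normalization used only to count distinct names."""
    match = MIHI.search(name)
    if match:
        name = name[: match.end()]      # drop authorship after mihi
    name = SUBGENUS.sub("", name)       # drop "(Polybothrus) " and similar
    name = RANKS.sub("", name)          # drop "var. ", "subsp. ", "f. "
    return " ".join(name.lower().split())


# dict.fromkeys keeps first-seen order while removing duplicates.
unique = list(dict.fromkeys(dedup_key(n) for n in NAMES))
for key in unique:
    print(key)
print(len(unique))
```

With this normalization the three Anisochaeta strings, the two Conferva strings, the two Eucyclops strings and the two Lithobius leostygis strings each collapse to one key.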
Names by Plazi:
Scientific name: Lithobius (Polyrbothrus) caesar mihi https://tb.plazi.org/GgServer/html/299583C14F747A72E86065049FDE3C22 A misspelling for Polybothrus, plus a digitization artifact which should not have included "mihi". Published string is spelled and styled correctly, as "4. Lithobius (Polybothrus) caesar mihi." See https://www.biodiversitylibrary.org/page/13294205
Scientific name: Lithobius (Polybothrus) leostygis subsp. mihi https://tb.plazi.org/GgServer/html/CCEB9C62C87766E980DD858BC13468C8 A digitization artifact which should not have included "mihi". Published string is styled correctly, as "1. Lithobius (Polyhothrus) leostygis mihi". See https://www.biodiversitylibrary.org/page/13294201
Scientific name: Lithobius leostygis mihi This instance points to the one above and I could not find a URL for it.
Result: The three (two when deduplicated) scientific name instances contributed by Plazi are false-positive digitization artifacts, including a misspelling.
Deduplicated list of names v.2 (Plazi names cleared):
Anomaly: The name "Odonthophagus var. c mihi", coming from ION, has so many anomalies that it seems irrelevant to GNA for name finding. Source anomalies: The generic name is given as both "Onthophagus" (https://www.biodiversitylibrary.org/page/8222096) and "Odonthophagus" (https://www.biodiversitylibrary.org/page/8221999) in the "Enumeratio Insectorum Norvegicorum. Fasciculus ii." which ION points to. Additionally, it is not a scientific name in itself, i.e., it is the name of a variety designated by a single letter. Digitization anomalies: Name digitized with the genus "Odonthophagus" instead of "Onthophagus". Name not including the specific epithet, supposedly "fracticornis", to which "var. c" is to be ascribed. The "mihi" seems to be a false positive, added by the recorder, as it is not a text string in the referred publication.
Overall, the name can be considered a false positive for mihi and can be deleted from the list.
Deduplicated list of names v.3:
@dimus, could someone check the "algal" and fungal names for you, so that we can know if they are true or false positives? A copy of the original publication would be desirable.
The word mihi occurs 192254 times in BHL.
Conferva geminata var. mihi Schwabe: https://verifier.globalnames.org/?capitalize=on&format=html&names=Conferva+geminata+var.+mihi+Schwabe https://www.algaebase.org/search/species/detail/?species_id=93703
edulis-mihi is not a problem, so I do not worry about it.
"Conferva geminata var. mihi Schwabe" may be hard to match. The combination is uncurated in AlgaeBase and there is no guarantee that it is an original combination. There are no recorded references for that combination. The original combination may be Oscillatoria geminata Schwabe. When searched for that combination and author, AlgaeBase returns "Oscillatoria geminata Schwabe ex Gomont 1892" (https://www.algaebase.org/search/species/detail/?species_id=51094), which is also not the original treatment. The original treatment for Oscillatoria geminata Schwabe can be found at: Linnaea 11 (1), year 1837 Page 118: https://www.biodiversitylibrary.org/page/35312749 Tab. 1, Fig. 7: https://www.biodiversitylibrary.org/page/35313360
Confirming whether these are two combinations of the same name and whether the "mihi" is an artifact would require consulting with specialists familiar with the historical literature on Conferva and Oscillatoria. However, that is likely the case, as the author matches and there are currently combinations under both genera for a few species.
So my understanding is that really we have only these known exceptions for the parsing rule:
Anisochaeta kiwi mihi Blakemore 2012
Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966
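To make the "rule plus known exceptions" idea concrete, here is a minimal sketch. It treats mihi as a terminator unless the words before it are on a short exception list, in which case mihi is kept as an epithet. The function name `canonical_with_mihi` and the exception set are assumptions for illustration only, and this is not how gnparser implements it.

```python
# Exceptions taken from the two names listed above; kept lowercase for lookup.
MIHI_EPITHET_EXCEPTIONS = {
    "anisochaeta kiwi",
    "eucyclops serrulatus",
}


def canonical_with_mihi(name: str) -> str:
    """Hypothetical helper: cut the name at mihi unless it is a known epithet."""
    tokens = name.split()
    for i, token in enumerate(tokens):
        if token.strip(".,").lower() == "mihi":
            head = " ".join(tokens[:i])
            if head.lower() in MIHI_EPITHET_EXCEPTIONS:
                return f"{head} mihi"   # mihi is part of the name
            return head                 # mihi only marks the end of the name
    return name                         # no mihi found, leave untouched


print(canonical_with_mihi("Anisochaeta kiwi mihi Blakemore 2012"))
print(canonical_with_mihi("Scutella agassizi mihi"))
```

The first call keeps mihi as an epithet, the second drops it as a terminator. Hyphenated epithets such as edulis-mihi are not affected, because the token is compared as a whole.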
Aeolesthes inhirsutus mihi seems another false positive. The name string "Aeolesthes inhirsutus subsp. mihi M.Matsushita, 1932" is deleted from GBIF (https://www.gbif.org/species/8885942). The string may have reached GBIF via JBIF (Japan). See entry for holotype of "Aeolesthes inhirsutus subsp. mihi M.Matsushita, 1932" at https://www.gbif.jp/gbif_search/detail?id=1_sehu-cole_urn:catalog:SEHU:COLE:0000000191
About "Eucyclops serrulatus mihi Dussart, Graf & Husson, 1966" Dussart, Bernard; François Graf; and Roger Husson. 1966. Les Crustacés du réservoir de la Fontaine des Suisses à Dijon. International Journal of Speleology, 2: 269-281. http://dx.doi.org/10.5038/1827-806X.2.3.2
The "author" is only Dussart, as he is the sole responsible for Copepoda in that publication. The name string "Eucyclops serrulatus var. mihi" is apparently styled correctly (pages 270 and 278). However, this is a printing artifact which became a database artifact. Dussart stated on pp. 270-271 (translated): "The differences existing between these two forms are not sufficient to give a name to the variety with the spine of P5 slender. I need only mention its existence...". Also, as per the first edition of the International Code of Zoological Nomenclature (1961), "Article 15. Names published after 1960. — After 1960, a new name proposed conditionally, or one proposed explicitly as the name of a "variety" or "form" [Art. 45e], is not available." (https://www.biodiversitylibrary.org/page/34584570). This further points at an unnamed form by Dussart (1966), the "mihi" in this case also being a false positive that does not need to be added to the exceptions, at least from the nomenclatural point of view.
Hmm, looks like the situation is even more interesting with mihi:
https://www.biodiversitylibrary.org/item/181042#page/535/mode/1up
Characium obovatum mihi. b. var. longipes mihi
I wonder if a better approach to mihi is to ignore it, instead of considering it the end of a name. But for gnfinder the use of mihi as a name terminator word might work.
Thank you @Archilegt for the interesting information about Eucyclops serrulatus mihi, I'll pass it along to the CoL guys. Do I understand correctly that in zoology old names with var. or f. are sometimes promoted to subspecies rank? I would still add Eucyclops serrulatus mihi as an exception, because the parser is not a nomenclatural authority and deals with data on a lexical level.
Hi @dimus I reported the issue with E. s. mihi to T. Chad Walter (https://www.marinespecies.org/copepoda/index.php) on 13.vi.2022 but I did not receive a reply. Maybe the COL will be able to reach him or someone else. Thanks!
Hi @dimus
The case of Characium obovatum mihi. b. var. longipes mihi (https://www.biodiversitylibrary.org/page/47100016) is interesting. There you don't have one name but two. The string would be parsed by a human reader as:
Tab. VII.
Fig. 3. Characium obovatum mihi
Fig. 3b. Characium obovatum var. longipes mihi
where "b" is not part of the name but the explanation of an illustration (https://www.biodiversitylibrary.org/page/47100082).
The two mihi are indeed to be parsed as terminators but the first one could be also recognized as a connector. Detecting and reconstructing two names and recognizing a "b" as a figure indication might be too much to ask from a parser and could be left to a layer of annotations.
For less complex strings (e.g., without the "b") containing two mihi, of the form Genus specificEpithet mihi [var., f.] subspecificEpithet mihi, the parsing would be:
if 2 mihi,
parse mihi 1,
connect specificEpithet to subspecificEpithet,
terminate before mihi 2
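The four steps above can be sketched in a few lines. This is only an illustration under simple assumptions (whitespace tokenization, a small hard-coded set of rank markers), and `parse_two_mihi` is a made-up name, not a gnparser function.

```python
RANK_MARKERS = {"var.", "f.", "subsp."}


def is_mihi(token: str) -> bool:
    # Accept "mihi", "mihi." and "Mihi," but not "edulis-mihi".
    return token.rstrip(".,").lower() == "mihi"


def parse_two_mihi(name: str) -> str:
    """Hypothetical sketch of the connector/terminator rule for mihi."""
    tokens = name.split()
    hits = [i for i, token in enumerate(tokens) if is_mihi(token)]
    if not hits:
        return name
    first = hits[0]
    if len(hits) == 2:
        second = hits[1]
        # mihi 1 is a connector only if a rank marker follows it directly.
        if first + 1 < second and tokens[first + 1] in RANK_MARKERS:
            return " ".join(tokens[:first] + tokens[first + 1:second])
    # Otherwise the first mihi terminates the name.
    return " ".join(tokens[:first])


print(parse_two_mihi("Characium obovatum mihi. var. longipes mihi"))
print(parse_two_mihi("Characium obovatum mihi. b. var. longipes mihi"))
```

The first string yields the infraspecific name, the second (with the figure letter "b.") falls back to the species name because no rank marker follows the first mihi.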
"...in zoology old names with var. or f. sometimes are promoted to subspecies rank?" Yes, you are correct. The ZooCode has article "45. The species group", where article "45.5. Infrasubspecific names." The references therein will guide you to other articles.
Hi @dimus Shall we keep this issue open for some preliminary reporting on improved parsing? Or shall we do that via email or GoogleDocs? It would be great to have some stats on the actual improvement of the parser! :D
I do not yet have b. var. as a possible rank (not yet sure how common it is, to justify adding it to parsing). The parsing of Characium obovatum mihi. var. longipes mihi is now Characium obovatum var. longipes:
https://github.com/gnames/gnparser/blob/master/testdata/test_data.md#names-with-mihi
I think it is reasonable enough to close the ticket for now, especially because the parser does not deal with names as they occur in biological texts, and it is extremely rare to have mihi in prepared lists of names.
If more concerns appear about mihi we can make a new ticket and link it with this one.
Dima, please note that b. var. is not a rank.
b refers to figure 3b
var. is a rank
Ah thank you for spotting it @Archilegt!
Making gnfinder ticket about it https://github.com/gnames/gnfinder/issues/125
Ok. If the parsing of Characium obovatum mihi. var. longipes mihi is now Characium obovatum var. longipes, we can mention it as a special case of limitation of the parser, in which one string representing two names (one species, one subspecies) is parsed only to the subspecific name. We don't have to solve all the parsing problems in this round. ;-)
@Archilegt, do you think it makes better sense to parse Characium obovatum mihi. var. longipes mihi as Characium obovatum with var. longipes mihi as an unparseable tail? The parser does assume that a string must have only one name.
I tend to think about this string as an indication of implicit authorship in two places, kind of similar to Aus bus L. cus K.
"do you think it makes better sense to parse Characium obovatum mihi. var. longipes mihi as Characium obovatum with var. longipes mihi as an unparseable tail? The parser does assume that a string must have only one name." No, I think that when choosing among two name strings, one should aim at retrieving the longest and most informative string along with the shortest unparseable tail. As it is now.
"I tend to think about this string as an indication of implicit authorship in two places, kind of similar to Aus bus L. cus K."
Yes, that would be the case for Characium obovatum mihi. var. longipes mihi.
However, here we have Fig 3. Characium obovatum mihi. b. var. longipes mihi
In an ideal world, the parser would:
Fig. (#langEn) or Abb. (#langDE) followed by Arabic or Roman numerals ranking higher than scientificName. If Fig. or Abb. and numerals are detected, parse accordingly and wrap the whole string or substrings as explanationOfFigure.
#ordered letters, where #a can be omitted, scoring letters higher if they are #letters enclosed by periods. Wrap the result as explanationOfSubfigure.
explanationOfSubfigure, with allowed values for single words specificEpithet and subspecificEpithet. Increase the posterior score for explanationOfSubfigure wrappers if mihi terminators or authorName co-occur with periods of #ordered letters.
explanationOfSubfigure values b to z if single-word values specificEpithet and subspecificEpithet exist. Match subspecificEpithet to the nearest anterior specificEpithet, and match both to the nearest anterior genus in order to assemble scientificName.
Example for Fig 3. Characium obovatum mihi. b. var. longipes mihi:
<explanationOfFigure>Fig 3. Characium obovatum mihi. b. var. longipes mihi</explanationOfFigure>
#langEn #numeralArabic
<explanationOfFigure>Fig 3.
<explanationOfsubfigure>Characium obovatum mihi.</explanationOfsubfigure>
#aOmmitted #wrapperScore = 0.25
<explanationOfsubfigure>b. var. longipes mihi</explanationOfsubfigure>
#bFirstLetter #wrapperScore = 0.25
</explanationOfFigure>
<explanationOfFigure>Fig 3.
<explanationOfsubfigure>Characium obovatum mihi.</explanationOfsubfigure>
#mihi #wrapperPostScore = 0.50
<explanationOfsubfigure>b. var. longipes mihi</explanationOfsubfigure>
#mihi #wrapperPostScore = 0.50 #subspecificEpithet = true
</explanationOfFigure>
<explanationOfFigure>Fig 3.
<explanationOfsubfigure>Characium obovatum mihi.</explanationOfsubfigure>
#scientificName = Characium obovatum
<explanationOfsubfigure>b. var. longipes mihi</explanationOfsubfigure>
#bFirstLetter #scientificNameAssembled = Characium obovatum var. longipes
</explanationOfFigure>
Does it make sense?
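A very reduced sketch of that idea, without any scoring: detect the figure prefix, split at single letters enclosed by periods, strip the mihi terminators, and attach each infraspecific chunk to the nearest preceding genus and specific epithet. All names here (`parse_caption`, the regular expressions, the rank set) are assumptions for illustration and not part of gnparser or gnfinder.

```python
import re

# "Fig" or "Abb", optional period, then Arabic or Roman numerals and a period.
FIGURE = re.compile(r"^(?:Fig|Abb)\.?\s*(?:\d+|[IVXLC]+)\.\s*")
# A single letter b to z enclosed by periods, e.g. "mihi. b. var."
SUBFIGURE = re.compile(r"(?<=\.)\s+([b-z])\.\s+")
MIHI = re.compile(r"\s*\bmihi\b\.?", re.IGNORECASE)
RANK_MARKERS = {"var.", "f.", "subsp."}


def parse_caption(caption: str) -> dict[str, str]:
    """Hypothetical helper: map subfigure letters to assembled names."""
    prefix = FIGURE.match(caption)
    if prefix is None:
        return {}
    # re.split with one capture group yields [text, letter, text, letter, ...]
    parts = SUBFIGURE.split(caption[prefix.end():])
    first = MIHI.sub("", parts[0]).strip()
    names = {"a": first}                 # subfigure "a" is implicit (omitted)
    base = first.split()[:2]             # genus + specific epithet
    for letter, chunk in zip(parts[1::2], parts[2::2]):
        words = MIHI.sub("", chunk).split()
        if words and words[0] in RANK_MARKERS:
            names[letter] = " ".join(base + words)
        else:
            names[letter] = " ".join(words)
    return names


print(parse_caption("Fig 3. Characium obovatum mihi. b. var. longipes mihi"))
```

Requiring a period before the subfigure letter keeps a rank marker such as "f." inside a name from being mistaken for subfigure f, although a real caption could still defeat this simple rule.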
I think what you say is more of a job for gnfinder, because gnparser is designed to work with lists of already processed scientific names like personal checklists, databases, already extracted names. Adding constraints on what gnparser can do helps decrease the number of false positives.
Let's say Characium obovatum mihi. b. var. longipes mihi is in a database. The parser would return the result below, with the lowest parsing quality 4 and 2 warnings: unparsed tail and ignored annotation, which would allow a database or checklist curator to detect a problem, look at it and fix it by hand:
{
"parsed": true,
"quality": 4,
"qualityWarnings": [
{
"quality": 4,
"warning": "Unparsed tail"
},
{
"quality": 3,
"warning": "Ignored annotation `mihi`"
}
],
"verbatim": "Characium obovatum mihi. b. var. longipes mihi",
"normalized": "Characium obovatum",
"canonical": {
"stemmed": "Characium obouat",
"simple": "Characium obovatum",
"full": "Characium obovatum"
},
"cardinality": 2,
"tail": " b. var. longipes mihi",
"details": {
"species": {
"genus": "Characium",
"species": "obovatum"
}
},
"words": [
{
"verbatim": "Characium",
"normalized": "Characium",
"wordType": "GENUS",
"start": 0,
"end": 9
},
{
"verbatim": "obovatum",
"normalized": "obovatum",
"wordType": "SPECIES",
"start": 10,
"end": 18
}
],
"id": "e65f7279-c3f1-5719-9058-a3c024719fde",
"parserVersion": "v1.6.7"
}
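For a curator working through many such records, the check could look roughly like this. The JSON below is a trimmed copy of the output above; the helper `review_reasons` and its threshold are assumptions for this sketch, relying only on the statement that quality 4 is the lowest parsing quality.

```python
import json

# Trimmed copy of the parser output shown above.
SAMPLE = """
{
  "parsed": true,
  "quality": 4,
  "qualityWarnings": [
    {"quality": 4, "warning": "Unparsed tail"},
    {"quality": 3, "warning": "Ignored annotation `mihi`"}
  ],
  "verbatim": "Characium obovatum mihi. b. var. longipes mihi",
  "normalized": "Characium obovatum",
  "tail": " b. var. longipes mihi"
}
"""


def review_reasons(record: dict, best_quality: int = 1) -> list[str]:
    """Hypothetical helper: collect reasons why a record needs a manual look."""
    reasons = [
        w["warning"]
        for w in record.get("qualityWarnings", [])
        if w["quality"] > best_quality      # higher number means worse quality
    ]
    tail = record.get("tail", "").strip()
    if tail:
        reasons.append(f"tail: {tail}")
    return reasons


record = json.loads(SAMPLE)
reasons = review_reasons(record)
if reasons:
    print(record["verbatim"], "->", record["normalized"])
    for reason in reasons:
        print("  ", reason)
```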
The Latin word "mihi" was used by authors when proposing new scientific names, with the meaning of "me". The word could be used as a marker for "scientific name ends here", and could enhance scientific name finding if coupled to "search for scientific name 1, 2, 3 words ahead". The word could also be used for adding "interpreted authorship" (author+date) to scientific names instances if coupled to the publication (book, article) metadata where the scientific name instance is matched, therefore potentially helping to disambiguate homonyms. A quick glance at the occurrence of the word in BHL: https://www.biodiversitylibrary.org/search?searchTerm=mihi&stype=F#/titles Maybe it would be worth trying at least the "scientific name ends here" suggestion? :)
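A minimal sketch of the "scientific name ends here" suggestion, assuming plain whitespace tokenization: when mihi is found, look back up to three words and keep the part that starts with a capitalized word. The function name and the capitalization heuristic are assumptions for illustration, and this is not how gnfinder works.

```python
def candidates_before_mihi(text: str, max_words: int = 3) -> list[str]:
    """Hypothetical helper: candidate names ending right before each mihi."""
    tokens = text.split()
    found = []
    for i, token in enumerate(tokens):
        if token.strip(".,;:").lower() != "mihi":
            continue
        window = tokens[max(0, i - max_words):i]
        # Drop leading words until one starts with a capital (possible genus).
        while window and not window[0][:1].isupper():
            window = window[1:]
        if window:
            found.append(" ".join(window))
    return found


print(candidates_before_mihi("4. Lithobius (Polybothrus) caesar mihi."))
print(candidates_before_mihi("Fig. 3. Characium obovatum mihi"))
```

Both test strings come from this thread. The heuristic is deliberately naive: any capitalized word inside the window, such as an abbreviation, would be taken as the start of the name, so the candidates would still need verification.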