Open KatjaSchulz opened 5 months ago
Thank you, @KatjaSchulz, for catching this. I am not sure yet how to fix it, because in many cases Aus (Bus)
does mean Aus subgen. Bus.
Do names like this occur specifically among botanical names?
Yes, this is a tricky one. All the examples I found were taxa under the botanical code, except for the Cyrtolophosidiidae (Schew.) example which is a really weird one that has since been removed from COL.
One approach to fix this could be a blacklist of strings that can never be interpreted as subgenus names. I think it's pretty safe to put the author strings above on that list. But after digging some more, I also found this name: Sigmoidotropis (Piper) A.Delgado. I don't think there are any subgenera named Piper, but I don't know if I would be comfortable putting that name on the blacklist.
Another approach would be to add processing of rank information to gnparser. I usually have that information for most names I am trying to parse, and I use it to double-check the gnparser results. I realize that would probably be quite a bit of work to implement.
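The rank double-check described above could look roughly like the following. This is only a sketch of a post-processing step on the caller's side, not something gnparser does itself; the function name, the set of "suspect" ranks, and the record layout are all assumptions made for illustration.

```python
# Ranks for which a "subgen." reading of the parenthesised word is treated as
# implausible in this sketch (an assumption, not a gnparser rule).
SUSPECT_RANKS = {"genus", "family"}


def flag_suspect_subgenus(rank, full_canonical):
    """Return True when the known rank contradicts a subgenus parse."""
    return rank.lower() in SUSPECT_RANKS and " subgen. " in full_canonical


# (rank from the source data set, full canonical form reported by the parser)
records = [
    ("genus", "Hexaphylla subgen. Klokov"),
    ("family", "Anaulaceae subgen. Schuett"),
    ("subgenus", "Aus subgen. Bus"),
]

# Keep only the results that need a second look.
flagged = [full for rank, full in records if flag_suspect_subgenus(rank, full)]
print(flagged)
```

Only names whose supplied rank is genus or family are flagged; a record that really is a subgenus passes through untouched.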
Anyway, here are a few more names I found in the COL 2024 annual archive:
Plant genera:
Hexaphylla (Klokov) P.Caputo & Del Guacchio – simple: Klokov – full: Hexaphylla subgen. Klokov
Parogonum (Haraldson) Desjardins & J. P. Bailey – simple: Haraldson – full: Parogonum subgen. Haraldson
Ericetorum (Jermy) Li Bing Zhang & X. M. Zhou – simple: Jermy – full: Ericetorum subgen. Jermy
Archidasyphyllum (Cabrera) P. L. Ferreira, Saavedra & Groppo – simple: Cabrera – full: Archidasyphyllum subgen. Cabrera
Lamyropsis (Kharadze) Dittrich – simple: Kharadze – full: Lamyropsis subgen. Kharadze
Sigmoidotropis (Piper) A.Delgado – simple: Piper – full: Sigmoidotropis subgen. Piper
Moquiniastrum (Cabrera) G. Sancho – simple: Cabrera – full: Moquiniastrum subgen. Cabrera
Chromista genera:
Hormosira (Endlichter) Meneghini, 1838 – simple: Endlichter – full: Hormosira subgen. Endlichter
Syracolithus (Kamptner) Deflandre in Grassé, 1952 – simple: Kamptner – full: Syracolithus subgen. Kamptner
I do have a list of botanical genus authors (https://github.com/gnames/gnparser/blob/master/io/dict/data/genera_auth_icn.txt), and, when they are not ambiguous, I treat author-matching text in parentheses after the genus in binomials and trinomials as authorship. I can expand this rule to uninomials as well.
This is pretty close to your suggestion, @KatjaSchulz, as I understood it.
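To make the proposed uninomial rule concrete, here is a minimal sketch of the idea. It is not gnparser's implementation (gnparser is a full grammar-based parser, not a regex). The small author set below is a stand-in using names from this thread, and whether each of them appears in genera_auth_icn.txt is not checked here. The function name and return labels are hypothetical.

```python
import re

# Stand-in for the botanical genus-author dictionary (assumed entries,
# taken from the examples in this thread).
ICN_AUTHORS = {"Klokov", "Haraldson", "Jermy", "Cabrera", "Kharadze"}

# A capitalised uninomial, one parenthesised chunk, then any trailing text.
UNINOMIAL_PAREN = re.compile(r"^([A-Z][a-z]+) \(([^()]+)\)\s*(.*)$")


def parenthetical_role(name, authors=ICN_AUTHORS):
    """Classify the parenthesised text that follows a uninomial."""
    match = UNINOMIAL_PAREN.match(name)
    if match is None:
        return "no-match"
    # Known, unambiguous author: read the parentheses as authorship.
    if match.group(2) in authors:
        return "authorship"
    # Anything else could still be a subgenus, so leave it undecided.
    return "ambiguous"


print(parenthetical_role("Hexaphylla (Klokov) P.Caputo & Del Guacchio"))
print(parenthetical_role("Sigmoidotropis (Piper) A.Delgado"))
```

A name like Sigmoidotropis (Piper) stays ambiguous in this sketch because Piper is deliberately left out of the author set, which mirrors the concern raised earlier about listing it.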
@KatjaSchulz, would implementing #267 help for your use case? If all names are botanical, there would be no ambiguity in parsing such names.
Yes, I think so. Since I am usually running comprehensive data sets through gnparser, it would be a little bit more work to separate names by code, but it would be feasible. There may be lingering problems with some microorganisms, but I think those would be negligible. Thanks!
Oops, I did not mean to close this one; reopening...
Some plant names are now recognized, some still have problems, and Chromista authors are not recognized yet.
There is a new option: code. It lets you force names to be parsed under the rules of a given nomenclatural code, for example ICN:
https://parser.globalnames.org/api/Hormosira%20(Endlichter)%20Meneghini,%201838?code=bot
Supported values: bact, bacterial, ICNP, bot, botanical, ICN, cult, cultivar, ICNCP, zoo, zoological, ICZN.
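For anyone scripting this, a request URL of the same shape as the example above can be assembled as follows. This is just a sketch: the base path and the accepted values are copied from this thread, the helper name is hypothetical, and matching the values exactly as listed (case-sensitive) is an assumption. It only builds the URL and does not call the service.

```python
from urllib.parse import quote

# Base path as it appears in the example URL in this thread.
BASE_URL = "https://parser.globalnames.org/api/"

# Values listed in this thread for the code option.
SUPPORTED_CODES = {
    "bact", "bacterial", "ICNP",
    "bot", "botanical", "ICN",
    "cult", "cultivar", "ICNCP",
    "zoo", "zoological", "ICZN",
}


def parser_url(name, code):
    """Build a parser URL for one name string and a nomenclatural code."""
    if code not in SUPPORTED_CODES:
        raise ValueError(f"unsupported code: {code!r}")
    # Leave parentheses and commas unescaped, as in the example URL.
    encoded = quote(name, safe="(),")
    return f"{BASE_URL}{encoded}?code={code}"


print(parser_url("Hormosira (Endlichter) Meneghini, 1838", "bot"))
```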
These are all valid/accepted names from the current version of the Catalogue of Life:
Plant genera:
Nassella (Trin.) É.Desv. – simple: Trin. – full: Nassella subgen. Trin.
Dacrycarpus (Endl.) de Laub. – simple: Endl. – full: Dacrycarpus subgen. Endl.
Lysiphyllum (Benth.) de Wit – simple: Benth. – full: Lysiphyllum subgen. Benth.
Tricholemma (Röser) Röser – simple: Roeser – full: Tricholemma subgen. Roeser
Isogonium (Kützing) de Bary – simple: Kuetzing – full: Isogonium subgen. Kuetzing
Euptilota (Kützing) Kützing, 1849 – simple: Kuetzing – full: Euptilota subgen. Kuetzing
Setiechinopsis (Backeb.) de Haas – simple: Backeb. – full: Setiechinopsis subgen. Backeb.
Chromista genera:
Cyclotella (Kützing) de Brebisson – simple: Kuetzing – full: Cyclotella subgen. Kuetzing
Tabularia (Kützing) Williams & Round – simple: Kuetzing – full: Tabularia subgen. Kuetzing
Cyrtolophosis (Schew.) – simple: Schew. – full: Cyrtolophosis subgen. Schew.
Pyrocystis (Schütt) Lemmermann, 1899 – simple: Schuett – full: Pyrocystis subgen. Schuett
Chromista families:
Anaulaceae (Schütt) Lemmermann – simple: Schuett – full: Anaulaceae subgen. Schuett
Triceratiaceae (Schütt) Lemmermann – simple: Schuett – full: Triceratiaceae subgen. Schuett
Pyxillaceae (Schütt) Simonsen – simple: Schuett – full: Pyxillaceae subgen. Schuett
Pyrocystaceae (Schütt) Lemmermann, 1899 – simple: Schuett – full: Pyrocystaceae subgen. Schuett
Aulacodiscaceae (Schütt) Lemmermann – simple: Schuett – full: Aulacodiscaceae subgen. Schuett
Stictodiscaceae (Schütt) Simonsen – simple: Schuett – full: Stictodiscaceae subgen. Schuett
Lauderiaceae (Schütt) Lemmermann – simple: Schuett – full: Lauderiaceae subgen. Schuett
Protozoa family:
Cyrtolophosidiidae (Schew.) – simple: Schew. – full: Cyrtolophosidiidae subgen. Schew.