Closed dimus closed 1 year ago
Hm, "Genus (Subgenus)" should work according to these tests:
https://github.com/gnames/gnparser/blob/master/testdata/test_data.md#combination-of-two-uninomials
Name: Aaleniella (Danocythere)
Canonical: Aaleniella subgen. Danocythere
Name: Cordia (Adans.) Kuntze sect. Salimori
Canonical: Cordia sect. Salimori
Name: Calathus (Lindrothius) KURNAKOV 1961
Canonical: Calathus subgen. Lindrothius
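As a rough illustration of the Name/Canonical pairs above, here is a minimal Python sketch of the simple "Genus (Subgenus)" case only. The function name `subgenus_canonical` and the regular expression are hypothetical and are not how gnparser itself works (gnparser uses a full grammar). In particular, the sketch cannot tell a subgenus from a parenthesised author without a trailing period, so it is only meant to show the shape of the expected canonical form.

```python
import re
from typing import Optional

# Hypothetical pattern: a capitalised genus followed by a capitalised
# word in parentheses. Anything after the closing parenthesis
# (authorship, year) is ignored.
_GENUS_SUBGENUS = re.compile(r"^([A-Z][a-z]+) \(([A-Z][a-z]+)\)")


def subgenus_canonical(name: str) -> Optional[str]:
    """Return 'Genus subgen. Subgenus', or None if the pattern does not match."""
    match = _GENUS_SUBGENUS.match(name.strip())
    if match is None:
        return None
    return f"{match.group(1)} subgen. {match.group(2)}"


if __name__ == "__main__":
    for name in (
        "Aaleniella (Danocythere)",
        "Calathus (Lindrothius) KURNAKOV 1961",
        "Cordia (Adans.) Kuntze sect. Salimori",  # abbreviated author, no match
    ):
        print(name, "->", subgenus_canonical(name))
```

The third name is deliberately left unmatched: "(Adans.)" is an abbreviated author, not a subgenus, and the trailing period keeps the pattern from firing.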
Can you add examples that show your cases?
Can you please show examples for worries about "Author in Author, Year"
Looks like I need to add "dem" as an author word: Von dem Busch. I'll check if "dem" ever happens as a specific epithet.
@dimus, sorry, I wasn't paying attention to this issue. The "Genus (Subgenus)" and "Author in Author, Year" cases I was thinking of can be found in https://github.com/gnames/gnames/files/12587991/regex_OK_gnparser_no.txt. Both forms throw up a quality rating of 2.
Please also note that in "Eutrochatella babei (Arango y Molina, 1876)", the "y" is part of the author's surname, so the quality 2 indicator "Spanish 'y' is used instead of '&'" does not apply.
Thank you @Mesibov for the explanation. I do think that "y" should decrease the quality, because there are many other languages that people can use for the "and" word, and doing so will create a mess. So I decided to limit "and" words to "and" and "&". I personally would prefer "et" though :)
I am not sure what to do if "y" is part of the author's name. I guess I do need to add exceptions and hardcode such authors into gnparser.
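The exception idea could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the set `KNOWN_COMPOUND_SURNAMES`, the function `y_is_separator`, and the input "Smith y Jones, 1900" are all hypothetical, and none of this reflects gnparser's actual code. It only shows how a hardcoded list would stop the "y" warning from firing for "Arango y Molina".

```python
# Hypothetical exception list: surnames in which "y" is part of the name.
KNOWN_COMPOUND_SURNAMES = {"Arango y Molina"}


def y_is_separator(authorship: str) -> bool:
    """Return True if a standalone "y" appears to join two authors.

    Crude by design: if any known compound surname occurs in the string,
    every "y" in it is treated as part of a name.
    """
    if "y" not in authorship.split():
        return False
    return not any(surname in authorship for surname in KNOWN_COMPOUND_SURNAMES)


if __name__ == "__main__":
    print(y_is_separator("Arango y Molina, 1876"))           # surname, no warning
    print(y_is_separator("Smith y Jones, 1900"))             # made-up input, warning
    print(y_is_separator("Rolán & Fernández-Garcés, 1995"))  # no "y" at all
```

A list like this has to be maintained by hand, which is the cost of the hardcoding approach.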
In the case of "Genus (Subgenus)" and "Author in Author", the quality is decreased after discussion with Paddy Patterson about these two issues. For botanical names "Author in Author" is actually valid, so I am on the fence about it. For "Genus (Subgenus)" I can double check with ICZN folks.
I did try to address most of the problems in v1.7.5
(1) One problem is that gnparser adds quotes when I use the TSV output option. Originals in the Naturalis Mollusca list, followed by the gnparser output:
"""Glyptothauma"" cf ankasana" | """""""Glyptothauma"""" cf ankasana"""
"""Glyptothauma"" cf. ankasana" | """""""Glyptothauma"""" cf. ankasana"""
"""Glyptothauma"" cf. ankasana de Winter, 1996" | """""""Glyptothauma"""" cf. ankasana de Winter, 1996"""
"""Glyptothauma"" sp. 2" | """""""Glyptothauma"""" sp. 2"""
"Sepietta oweniana (D""Orbigny, 1839-1841)" | """Sepietta oweniana (D""""Orbigny, 1839-1841)"""
"Sepiola atlantica D""Orbigny, 1839-1842" | """Sepiola atlantica D""""Orbigny, 1839-1842"""
"""Triphora"" osclausum Rolán & Fernández-Garcés, 1995" | """""""Triphora"""" osclausum Rolán & Fernández-Garcés, 1995"""
(2) Another issue is that "D'Orbigny" in the original is "D’Orbigny" in the gnparser output. Why change UTF-8 27 to e2 80 99?
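Points (1) and (2) can be reproduced in isolation with the Python sketch below. It uses Python's standard `csv` module, not gnparser. The helper names `tsv_quote`, `tsv_unquote` and `hex_bytes` are made up for this illustration. The assumption being tested is that the input cells were already CSV-quoted and were quoted a second time on output without being unquoted first. That reproduces the pattern in (1), but it is a guess at the cause, not a confirmed diagnosis.

```python
import csv
import io


def tsv_quote(field: str) -> str:
    """Encode one field the way a CSV/TSV writer does (embedded quotes doubled)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t").writerow([field])
    return buf.getvalue().rstrip("\r\n")


def tsv_unquote(cell: str) -> str:
    """Decode one already-quoted cell back to its plain value."""
    return next(csv.reader([cell], delimiter="\t"))[0]


def hex_bytes(text: str) -> str:
    """Show the UTF-8 bytes of a string as space-separated hex."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


if __name__ == "__main__":
    raw = '"""Glyptothauma"" cf ankasana"'  # cell as it appears in the list

    # Quoting the raw cell again yields the doubled form seen in the output.
    print(tsv_quote(raw))

    # Decoding first and then quoting gives back the original cell.
    print(tsv_unquote(raw))
    print(tsv_quote(tsv_unquote(raw)) == raw)

    # Point (2): ASCII apostrophe versus right single quotation mark (U+2019).
    print(hex_bytes("'"))
    print(hex_bytes("\u2019"))
```

If the assumption holds, unquoting each cell before parsing should avoid the extra quotes. The last two lines print the byte sequences mentioned in (2).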
(3) regex says reject, gnparser says OK (regex_yes_gnparser_no file)
Please see. A lot of these end with "cf/CF" or "ms/MS".
(4) regex says OK, gnparser rejects (regex_OK_gnparser_no file)
Please see. It looks like gnparser doesn't like "Genus (Subgenus)", which I would have thought OK, and worries about "Author in Author, Year". Note also that the Dutch-persons at Naturalis have used "Von dem Busch" rather than "von dem Busch".
regex_OK_gnparser_no.txt regex_yes_gnparser_OK.txt