mdoering closed this issue 5 years ago
@dimus could you explain to me why parsing Lühea vulgaris yields a quality warning "Non-standard characters in canonical", but Isoëtes vulgaris does not?
IMHO the warning (or not) should be the same.
Compare
ICZN 32.5.2.1. In the case of a diacritic or other mark, the mark concerned is deleted, except that in a name published before 1985 and based upon a German word, the umlaut sign is deleted from a vowel and the letter "e" is to be inserted after that vowel (if there is any doubt that the name is based upon a German word, it is to be so treated).
The reason ë does not generate a warning is that it is the only diacritic (sadly) not prohibited by the botanical code:
60.6. Diacritical signs are not used in scientific names. When names (either new or old) are drawn from words in which such signs appear, the signs are to be suppressed with the necessary transcription of the letters so modified; for example ä, ö, ü become, respectively, ae, oe, ue; é, è, ê become e; ñ becomes n; ø becomes oe; å becomes ao. The diaeresis, indicating that a vowel is to be pronounced separately from the preceding vowel (as in Cephaëlis, Isoëtes), is a phonetic device that is not considered to alter the spelling; as such, its use is optional. The ligatures -æ- and -œ-, indicating that the letters are pronounced together, are to be replaced by the separate letters -ae- and -oe-.
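The transcriptions listed in the article above can be sketched as a small lookup table. This is only an illustration, not the parser's actual code: the names `icnReplacements` and `transliterateICN` are hypothetical, the table covers only the lowercase examples quoted in 60.6, and it assumes precomposed (NFC) input. Dropping the optional diaeresis from ë is an assumption made here to get a plain-ASCII canonical form.

```go
package main

import (
	"fmt"
	"strings"
)

// icnReplacements is a hypothetical table built only from the examples
// quoted in article 60.6; it is not exhaustive (no uppercase letters,
// no combining marks).
var icnReplacements = strings.NewReplacer(
	"ä", "ae", "ö", "oe", "ü", "ue",
	"é", "e", "è", "e", "ê", "e",
	"ñ", "n", "ø", "oe", "å", "ao",
	"æ", "ae", "œ", "oe", // ligatures become separate letters
	"ë", "e", // assumption: the optional diaeresis is simply dropped
)

// transliterateICN (hypothetical name) applies the table to a name string.
func transliterateICN(name string) string {
	return icnReplacements.Replace(name)
}

func main() {
	for _, n := range []string{"Lühea vulgaris", "Isoëtes vulgaris"} {
		fmt.Printf("%s -> %s\n", n, transliterateICN(n))
	}
}
```

With this table, Lühea vulgaris becomes Luehea vulgaris and Isoëtes vulgaris becomes Isoetes vulgaris.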
There is no good solution for this, because to parse a particular name correctly the parser would need to know the year when the name was created, the code, the origin, etc. So the only way I see is to do it consistently, generating the least amount of errors. Therefore, in the Go parser I removed ë as a special case. It will break a very few botanical cases, but will fix many botanical and zoological cases.
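A consistent check of this kind might look like the sketch below. It is an assumption about the intended behavior, not the actual gnparser code: the helper name `hasNonStandardChars` is hypothetical, and it simply treats every non-ASCII character the same way, with no exemption for ë.

```go
package main

import (
	"fmt"
	"unicode"
)

// hasNonStandardChars (hypothetical helper) reports whether a name contains
// any character outside plain ASCII. No character, including ë, is exempt,
// so the warning is raised consistently.
func hasNonStandardChars(name string) bool {
	for _, r := range name {
		if r > unicode.MaxASCII {
			return true
		}
	}
	return false
}

func main() {
	for _, n := range []string{"Lühea vulgaris", "Isoëtes vulgaris", "Luehea vulgaris"} {
		fmt.Printf("%s: warning=%v\n", n, hasNonStandardChars(n))
	}
}
```

Under this rule both Lühea vulgaris and Isoëtes vulgaris trigger the warning, while the plain-ASCII Luehea vulgaris does not.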
Closing this ticket here, opening https://gitlab.com/gogna/gnparser/issues/48
Not all diacritic marks should simply be removed according to the codes. Some, most prominently the German umlauts, should be transliterated. See ICNafp article 60.
For example, the genus Lühea should be spelled Luehea in the canonical name.