gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

Improve parsing of name strings when author names have lowercase prefixes #183

Closed: KatjaSchulz closed 3 years ago

KatjaSchulz commented 3 years ago

You must have a list of such prefixes somewhere, because many author names with lowercase prefixes get parsed properly. But here are a few that currently lead to parsing failures:

delle Chiaje
dos Santos
ten Broeke
ten Hove

Example name strings and gnparser CanonicalFull results:

Cellaria caulini delle Chiaje, 1841 > Cellaria caulini delle
Cellepora imperati delle Chiaje, 1841 > Cellepora imperati delle
Tremoctopus violaceus delle Chiaje, 1830 > Tremoctopus violaceus delle
Physophora mirabilis delle Chiaje, 1841 > Physophora mirabilis delle
Laevapex vazi dos Santos, 1989 > Laevapex vazi dos
Periclimenaeus aurae dos Santos, Calado & Araújo, 2008 > Periclimenaeus aurae dos
Acanthagrion egleri dos Santos, 1961 > Acanthagrion egleri dos
Aspidosiphon (Paraspidosiphon) fischeri ten Broeke, 1925 > Aspidosiphon fischeri ten
Hydroides bulbosa ten Hove, 1990 > Hydroides bulbosa ten
Laminatubus ten Hove & Zibrowius, 1986 > Laminatubus ten
Laminatubus alvini ten Hove & Zibrowius, 1986 > Laminatubus alvini ten
Protis hydrothermica ten Hove & Zibrowius, 1986 > Protis hydrothermica ten
Pseudovermilia conchata ten Hove, 1975 > Pseudovermilia conchata ten
Pseudovermilia fuscostriata ten Hove, 1975 > Pseudovermilia fuscostriata ten
Pseudovermilia holcopleura ten Hove, 1975 > Pseudovermilia holcopleura ten
Pseudovermilia madracicola ten Hove, 1989 > Pseudovermilia madracicola ten
Semivermilia ten Hove, 1975 > Semivermilia ten
Spirobranchus polycerus augeneri ten Hove, 1970 > Spirobranchus polycerus augeneri ten

dimus commented 3 years ago

Thanks @KatjaSchulz, added them to the prefixes list
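
For readers wondering why a prefix list matters here, the following toy sketch may help. It is not gnparser's code: the function `canonical_sketch`, the set `AUTHOR_PREFIXES`, and the splitting rule are all hypothetical and made up for illustration. It only shows how a lowercase word such as "delle" or "ten" could be mistaken for an epithet when it is missing from a list of known author-name prefixes, and how adding it changes the result.

```python
# Hypothetical prefix list for illustration; the real list in gnparser
# is maintained separately and is not reproduced here.
AUTHOR_PREFIXES = {"delle", "dos", "ten"}


def canonical_sketch(name_string, prefixes=AUTHOR_PREFIXES):
    """Toy split of a name string into its canonical part.

    Words are collected until the first token that looks like the start
    of an authorship: a capitalised word (after the genus), a known
    lowercase author prefix, or anything non-alphabetic (year, '&').
    """
    canonical = []
    for i, token in enumerate(name_string.split()):
        # Drop a parenthesised subgenus right after the genus,
        # mirroring the canonical forms shown in the issue.
        if i == 1 and token.startswith("(") and token.endswith(")"):
            continue
        word = token.rstrip(",")
        if i > 0 and (word[:1].isupper() or word in prefixes):
            break  # authorship starts here
        if not word.isalpha():
            break  # year, ampersand, or other non-name token
        canonical.append(word)
    return " ".join(canonical)


# Without the prefix in the list, "delle" is kept as if it were an epithet:
print(canonical_sketch("Cellaria caulini delle Chiaje, 1841", prefixes=set()))
# -> Cellaria caulini delle

# With the prefix in the list, the authorship is cut off at "delle":
print(canonical_sketch("Cellaria caulini delle Chiaje, 1841"))
# -> Cellaria caulini
```

A plain lookup like this would misfire on a genuine epithet that happens to match a prefix, so a real parser needs more context than this sketch uses.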