gbif / name-parser

The core GBIF scientific name parser library
Apache License 2.0

add epithet blacklist #43

Closed: mdoering closed this issue 5 years ago

mdoering commented 5 years ago

Some English words get parsed as epithets even though they never occur as real epithets, e.g. "the". Add a blacklist to the parser that prevents such words from ever becoming epithets.

This is not as easy as one might think, because the main parsing is one big regex and its overall match depends on its submatches. So if "the" should not match as an epithet, the regex itself needs to know that. Otherwise other parts, such as the authors, are wrongly matched too.
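To illustrate the point about the regex having to know about the blacklist, here is a minimal sketch that puts a negative lookahead in front of the epithet group. It is not the parser's actual regex: the class name `EpithetBlacklistSketch`, the blacklist entries, the simplified genus/epithet/author patterns, and the made-up input strings are all assumptions for illustration only.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, heavily simplified sketch; the real parser regex is far larger.
public class EpithetBlacklistSketch {

    // Assumed blacklist entries, chosen only for this example.
    static final List<String> BLACKLIST = List.of("the", "and", "of", "with");

    // Negative lookahead: fail if the next token is exactly a blacklisted word.
    // The trailing \b keeps longer words that merely start with one (e.g. "theus") valid.
    static final String NOT_BLACKLISTED =
        "(?!(?:" + String.join("|", BLACKLIST) + ")\\b)";

    // Simplified building blocks (assumptions, not the library's patterns).
    static final String GENUS = "([A-Z][a-z]+)";
    static final String EPITHET = "([a-z]{2,})";
    static final String AUTHOR = "(.+)";

    // Without the guard, any lowercase word after the genus becomes the epithet.
    static final Pattern NAIVE = Pattern.compile(
        "^" + GENUS + "(?: " + EPITHET + ")?(?: " + AUTHOR + ")?$");

    // With the guard, the epithet group is skipped for blacklisted words,
    // so the remaining groups are matched against the rest of the string instead.
    static final Pattern GUARDED = Pattern.compile(
        "^" + GENUS + "(?: " + NOT_BLACKLISTED + EPITHET + ")?(?: " + AUTHOR + ")?$");

    // Returns the three captured groups as one string, or "no match".
    static String describe(Pattern pattern, String name) {
        Matcher m = pattern.matcher(name);
        if (!m.matches()) {
            return "no match";
        }
        return "genus=" + m.group(1) + ", epithet=" + m.group(2) + ", author=" + m.group(3);
    }

    public static void main(String[] args) {
        for (String name : List.of("Abies alba Mill.", "Abies the Great", "Aus theus Smith")) {
            System.out.println(name);
            System.out.println("  naive:   " + describe(NAIVE, name));
            System.out.println("  guarded: " + describe(GUARDED, name));
        }
    }
}
```

In this sketch the guarded pattern no longer captures "the" as an epithet, but the word then falls through to the author group. That shift is exactly the submatch dependency described above: excluding a word from one group changes what every later group sees. Filtering the epithet after matching would not have this effect, since the author group would already have been matched against the wrong remainder.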

mdoering commented 5 years ago

https://github.com/gbif/name-parser/commit/6ffd7e4a96ac0bd8e65fb45120fbdeada9e7831f