Closed tobymarsden closed 2 years ago
Good catch @tobymarsden. I would like this rule to be more strict.
I checked gnverifier names with ripgrep (`rg "^([\p{L}]+-[\p{L}]+){2,}.*?\b" all-names-2021-11-14.txt`), and it looks like there is nothing reasonable with more than 2 dashes, and only these genera seem to be 'real enough' (with various capitalizations):
Iulo-eido-coprolites
Johnson-sea-linkia
Para-bary-thelphusa
Para-lio-thelphusa
Para-peri-thelphusa
Prunus-lauro-cerasus
Tsugo-piceo-picea
I see nothing useful with 3 or more dashes.
Searching with `rg "\b[a-z]([a-z]*-[a-z]*){2,}.*?\b" all-names-2021-11-14.txt` gives quite a few 2-dash specific epithets, and there are even a few that seem to be real when I search for 3 dashes or more with `rg "\b[a-z]([a-z]*-[a-z]*){3,}.*?\b" all-names-2021-11-14.txt`.
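For anyone who wants to repeat this kind of tally without ripgrep, here is a rough Python sketch. It is not the regex used above: it just counts hyphens in the first word of each line (treated as the genus) and in any later all-lowercase word (treated as an epithet). The function name `hyphen_tally` is made up for illustration, and the epithets in the sample are invented placeholders, not names from the dump.

```python
from collections import Counter


def hyphen_tally(names):
    """Tally names by hyphen count in the genus and in the epithets.

    Assumption: the first whitespace-separated word is the genus and any
    later all-lowercase word is an epithet. Real name strings are messier.
    """
    genus_counts = Counter()
    epithet_counts = Counter()
    for name in names:
        words = name.split()
        if not words:
            continue  # skip blank lines
        genus_counts[words[0].count("-")] += 1
        for word in words[1:]:
            if word.islower():
                epithet_counts[word.count("-")] += 1
    return genus_counts, epithet_counts


# The two hyphenated genera come from the list above;
# "Aus bus..." entries are invented placeholders.
sample = [
    "Prunus-lauro-cerasus",
    "Para-bary-thelphusa",
    "Aus bus",
    "Aus bus-cus-dus",
    "Aus bus-cus-dus-eus",
]
genera, epithets = hyphen_tally(sample)
print(genera)
print(epithets)
```

Reading the names from a file instead of a list would only need `open(path)` in place of `sample`, since the function accepts any iterable of lines.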
~~So I am on the fence about this one. It seems that allowing up to 2 dashes would keep most of the false positives unparsed, but would also ignore 2 epithets that have more than 2 dashes. Let me talk to our botanists and zoologists on Monday.~~
I recalled that we already had this conversation about epithets with our taxonomists, and, as a result, multi-dashes are allowed. So I think for genera it makes sense to limit them to 2 dashes for now, and if the need arises, allow multi-dashes. What do you think @tobymarsden?
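To make the proposed rule concrete, here is a minimal Python sketch of it: at most 2 dashes in a genus, any number in an epithet. This is only an illustration of the rule as stated in this thread, not how gnparser implements it; the names `MAX_GENUS_HYPHENS`, `genus_hyphens_ok` and `epithet_hyphens_ok` are hypothetical.

```python
MAX_GENUS_HYPHENS = 2  # the limit proposed in this thread (assumed constant name)


def genus_hyphens_ok(genus, max_hyphens=MAX_GENUS_HYPHENS):
    """Return True if the genus has at most max_hyphens hyphens,
    each of them sitting between runs of letters."""
    parts = genus.split("-")
    if len(parts) - 1 > max_hyphens:
        return False
    # An empty part means a leading, trailing, or doubled hyphen.
    return all(part.isalpha() for part in parts)


def epithet_hyphens_ok(epithet):
    """Epithets may carry any number of hyphens, as long as each one
    separates runs of letters."""
    return all(part.isalpha() for part in epithet.split("-"))


print(genus_hyphens_ok("Prunus-lauro-cerasus"))  # 2 hyphens: accepted
print(genus_hyphens_ok("A-b-c-d"))               # 3 hyphens: rejected
print(epithet_hyphens_ok("a-b-c-d"))             # epithets are not limited
```

Keeping the limit as a parameter means that allowing multi-dashes for genera later would be a one-line change.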
@dimus Thanks for the explanation! And on your weekend, too. I've updated the PR to accept up to two dashes for genera.
Looks good, @tobymarsden. I am going to add a couple more tests after the merge.
@dimus Amazing - thanks! Now that Kew parses I'll check World Flora 😂
Parsing fails with the genus Prunus-lauro-cerasus. Though this is a synonym, it does appear in the literature so parsing would be helpful, and I can't see any prohibitions in the ICBN against more than one hyphen in a genus name.