gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

More "nudum" names which don't parse correctly #207

Closed: tobymarsden closed this issue 2 years ago

tobymarsden commented 2 years ago

Here is a list of 119 names from World Flora Online and Plants of the World Online containing the epithet "nudum" which don't parse correctly. I'm happy to submit a PR which adds them to the existing list of "nudum" exceptions, if that is the best approach?

dimus commented 2 years ago

Thanks @tobymarsden, great list. I see that some of the names are infraspecific, which means the solution I have needs to be extended to cover them; until now, all of the names were species. A PR would be fantastic, or I can get to it at some point this week.

tobymarsden commented 2 years ago

@dimus Sure thing; I'll work on a PR in the coming days.

I'm also seeing a bunch more names with non- epithets not parsing correctly; is there scope for changing the regex to only match non where it's not followed by a dash? Or is that likely to catch detritus that should be moved to the tail?

(I realize that golang's regex engine doesn't support negative lookaheads so this may not be sensibly possible anyway, though I'd look into it. Maybe even just non\s?)
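To make the lookahead question concrete, here is a minimal Go sketch of one way to get the same effect without `(?!-)`: require whitespace or end of input after `non` instead of forbidding a dash. The pattern and the helper name `hasStandaloneNon` are hypothetical and are not taken from gnparser's source; the second sample name is a made-up placeholder.

```go
package main

import (
	"fmt"
	"regexp"
)

// standaloneNon is an illustrative pattern, NOT gnparser's actual regex.
// Go's regexp package (RE2 syntax) has no lookahead, so rather than
// writing `non(?!-)` the pattern positively requires that "non" is
// preceded by start-of-input or whitespace and followed by whitespace
// or end-of-input.
var standaloneNon = regexp.MustCompile(`(?:^|\s)non(?:\s|$)`)

// hasStandaloneNon reports whether s contains "non" as a separate word,
// which is the case where it could be treated as an annotation rather
// than as the start of a hyphenated epithet.
func hasStandaloneNon(s string) bool {
	return standaloneNon.MatchString(s)
}

func main() {
	names := []string{
		"Hyacinthoides non-scripta (L.) Chouard ex Rothm.", // hyphenated epithet
		"Aus bus Smith non Jones",                          // placeholder name with a "non" annotation
	}
	for _, name := range names {
		fmt.Printf("%q -> %v\n", name, hasStandaloneNon(name))
	}
}
```

With this pattern the hyphenated epithet is left alone and only the free-standing `non` matches. Whether that catches unwanted detritus elsewhere would still need checking against the parser's test data.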

tobymarsden commented 2 years ago

@dimus sorry, one more thing. Would changing the nudum regex to only match nomen\s+nudum be too restrictive? I couldn't find any references to nudum without nomen in front but it wasn't an exhaustive search...
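As a rough illustration of the trade-off, the Go sketch below compares a bare `nudum` match with the stricter `nomen\s+nudum`. Both patterns are assumptions written for this example and are not copied from gnparser; the second sample name is a made-up placeholder.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are illustrative assumptions, NOT gnparser's actual rules.
var (
	// bareNudum matches the word "nudum" anywhere, so it also fires on
	// legitimate epithets such as the one in "Psilotum nudum".
	bareNudum = regexp.MustCompile(`\bnudum\b`)
	// nomenNudum only matches "nudum" when it directly follows "nomen",
	// separated by one or more whitespace characters.
	nomenNudum = regexp.MustCompile(`\bnomen\s+nudum\b`)
)

func main() {
	names := []string{
		"Psilotum nudum (L.) P.Beauv.", // "nudum" used as a species epithet
		"Aus bus Smith nomen nudum",    // placeholder name with the annotation
	}
	for _, name := range names {
		fmt.Printf("%q bare=%v strict=%v\n",
			name, bareNudum.MatchString(name), nomenNudum.MatchString(name))
	}
}
```

The stricter pattern leaves the epithet untouched but would miss any source that writes `nudum` on its own as an annotation, which is the open question in the comment above.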