Closed tobymarsden closed 2 years ago
I think you are totally right about `nudum`. Not sure why I picked `nudum` by itself as a terminator; it was a mistake. However, `nomen nudum` is not always in this form: in the wild there are also "nom. nudum" and "nom.nudum", though I suspect they would cause an unparsed tail anyway.
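To make the variants concrete, here is a minimal sketch of a pattern that would cover all three spellings. It is only an illustration: `NOMEN_NUDUM` and `find_nomen_nudum` are hypothetical names, "Aus bus" is a placeholder name, and this is not the pattern the parser actually uses.

```python
import re

# Hypothetical pattern: "nomen" followed by whitespace, or the abbreviation
# "nom." followed by optional whitespace, and then "nudum".
NOMEN_NUDUM = re.compile(r"\bnom(?:en\s+|\.\s*)nudum\b", re.IGNORECASE)


def find_nomen_nudum(name):
    """Return the matched annotation text, or None if there is none."""
    match = NOMEN_NUDUM.search(name)
    return match.group(0) if match else None


# Placeholder names, used only to exercise the pattern.
for example in ("Aus bus nomen nudum", "Aus bus nom. nudum",
                "Aus bus nom.nudum", "Stilifolium nudum"):
    print(example, "->", find_nomen_nudum(example))
```

A bare `nudum` used as a specific epithet, as in `Stilifolium nudum`, is left alone by this pattern.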
`non` is a more complicated case. I suspect it can actually be quite useful to parse, for example in cases like
Xiphipops fisheri (non Snyder, 1904)
Can you remove the `non` cases from the PR? I think they require their own issue and more thought.
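As a rough idea of what parsing such a case could yield, the sketch below separates a trailing `(non Author, year)` annotation from the name. It assumes the annotation always has exactly that parenthesised shape; `NON_AUTHORSHIP` and `split_non_authorship` are hypothetical names, not part of the parser.

```python
import re

# Assumed shape: "(non <author>, <4-digit year>)".
NON_AUTHORSHIP = re.compile(
    r"\(\s*non\s+(?P<author>[^,()]+?)\s*,\s*(?P<year>\d{4})\s*\)"
)


def split_non_authorship(name):
    """Return (name without the annotation, (author, year) or None)."""
    match = NON_AUTHORSHIP.search(name)
    if match is None:
        return name, None
    bare_name = name[:match.start()].rstrip()
    return bare_name, (match.group("author"), match.group("year"))


print(split_non_authorship("Xiphipops fisheri (non Snyder, 1904)"))
print(split_non_authorship("Hyacinthoides non-scripta"))
```

A hyphenated epithet such as `non-scripta` does not match, so the name comes back unchanged.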
@dimus Yes, you're right! In fact, swapping in `nomen\s+nudum` for `nudum` does nothing because `nomen` is a stopword anyway, so I've removed it entirely.
This PR is now `nudum` only, and I'll open an issue for `non`. Thanks!
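The redundancy can be shown with a toy model of stopword handling. This sketch assumes the name is cut at the earliest match of any stopword; the real preprocessing may work differently. `strip_tail`, `BASE` and `WITH_PHRASE` are hypothetical names, and "Aus bus" is a placeholder name.

```python
import re


def strip_tail(name, stopwords):
    """Cut the name at the earliest match of any stopword pattern."""
    cut = len(name)
    for pattern in stopwords:
        # Require leading whitespace so the stopword starts a new token.
        match = re.search(r"\s+" + pattern + r"\b", name)
        if match:
            cut = min(cut, match.start())
    return name[:cut]


BASE = [r"nomen"]                           # "nomen" is already a stopword
WITH_PHRASE = BASE + [r"nomen\s+nudum"]     # adding the phrase on top

print(strip_tail("Aus bus nomen nudum", BASE))
print(strip_tail("Aus bus nomen nudum", WITH_PHRASE))
print(strip_tail("Stilifolium nudum", WITH_PHRASE))
```

Under that assumption both lists cut at the same position, because `nomen` alone already matches wherever `nomen\s+nudum` would.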
I present this for discussion -- it may be hopelessly naïve, but restricting the preprocessing of `non` to instead `non`, and `nudum` to `nomen nudum`, allows us to remove the special casing of e.g. `Hyacinthoides non-scripta`, `Stilifolium nudum` etc. and not add well over a hundred more. I like the elegance here, but if it's going to also parse a huge chunk of junk I am of course very happy to add special cases instead...
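A small sketch of the trade-off, comparing broad and narrowed terminators. The exact narrowed `non` pattern is not recoverable from the text above, so the whitespace-delimited form used here is an assumption, as are the names `BROAD`, `NARROW` and `truncate` and the placeholder name "Aus bus".

```python
import re

# Broad terminators: any standalone "non" or "nudum" token ends the name.
BROAD = [r"\bnon\b", r"\bnudum\b"]

# Narrowed terminators (assumed forms): "non" only when surrounded by
# whitespace, and "nudum" only as part of "nomen nudum".
NARROW = [r"\snon\s", r"\bnomen\s+nudum\b"]


def truncate(name, patterns):
    """Cut the name at the earliest terminator match, if any."""
    cut = len(name)
    for pattern in patterns:
        match = re.search(pattern, name)
        if match:
            cut = min(cut, match.start())
    return name[:cut].rstrip()


for example in ("Hyacinthoides non-scripta", "Stilifolium nudum",
                "Aus bus nomen nudum"):
    print(example, "| broad:", truncate(example, BROAD),
          "| narrow:", truncate(example, NARROW))
```

With the broad list both real epithets are cut off and would need special cases; with the narrowed list they survive, while the `nomen nudum` annotation is still removed.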