gnames / gnparser

GNparser normalises scientific names and extracts their semantic elements.
MIT License

As a User, I prefer to have "f." parsed as "forma", rather than "filius" #154

dimus closed this issue 3 years ago

dimus commented 3 years ago

See https://github.com/gnames/gnparser/issues/147 for the discussion about this issue.

In ambiguous cases, "f." is much more likely to mean forma than filius. We should therefore keep issuing a warning, but parse names like the following, mentioned in the discussion with @havardo, as forma:

Sanguinaria canadensis L. f. multiplex (E.H.Wilson) Weath.
Rosa banksiae R.Br. f. lutescens Voss
Prunus cerasifera Ehrh. f. stipitata Bregadze
Cupressus obtusa (Siebold & Zucc.) F.Muell. f. formosana (Hayata) Clinton-Baker
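
To make the proposed rule concrete, here is a minimal Python sketch of the disambiguation. It is an illustration only, not gnparser's implementation (gnparser is written in Go and driven by a PEG grammar). The function name `classify_f`, the `CONNECTORS` set, and the whitespace-token heuristic are all assumptions made for this example, and `Aus bus` in the usage note is a made-up placeholder name.

```python
from typing import Optional, Tuple

# Hypothetical helper set: lowercase words that can follow an author's "f."
# without being an infraspecific epithet (e.g. "L. f. ex Aiton").
CONNECTORS = {"ex", "et", "in"}


def classify_f(name: str) -> Optional[Tuple[str, bool]]:
    """Classify the first standalone "f." token in a name string.

    Returns (meaning, ambiguous), or None when there is no "f." token.
    `ambiguous` is True when "forma" was chosen although "filius" was
    also plausible, so the caller can still issue a warning.
    """
    tokens = name.split()
    if "f." not in tokens:
        return None
    i = tokens.index("f.")
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""

    # Crude assumption: an author token starts uppercase or closes a bracket.
    follows_author = prev_tok[:1].isupper() or prev_tok.endswith(")")
    # Crude assumption: an epithet is a lowercase word that is not a connector.
    precedes_epithet = next_tok[:1].islower() and next_tok not in CONNECTORS

    if not precedes_epithet:
        # No epithet follows, so "f." can only belong to the authorship.
        return ("filius", False)
    # An epithet follows. After an author this is the ambiguous case:
    # prefer "forma" and flag it.
    return ("forma", follows_author)
```

Under these assumptions, `classify_f("Rosa banksiae R.Br. f. lutescens Voss")` gives `("forma", True)`, i.e. forma with a warning, while the placeholder `classify_f("Aus bus L. f.")` gives `("filius", False)`.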

dimus commented 3 years ago

Sadly, the code for this issue causes a ~15-20% slowdown, but I think the change is important enough to justify it.

havardo commented 3 years ago

Ouch. I assume/hope the slowdown only occurs when hitting "f." and is not an overall performance hit? Thank you for your efforts to resolve this :)

dimus commented 3 years ago

It all depends on the structure of https://github.com/gnames/gnparser/blob/master/ent/parser/grammar.peg

I suspect it is possible to decouple the treatment of a possible filius from the rest of the grammar. At some point I will try to optimize the file.