gnames / gnfinder

GNfinder finds scientific names in UTF8 texts, PDF files, MS Word/Excel documents, URLs etc.
MIT License

Problem with some capitalized epithets #123

Closed abubelinha closed 2 years ago

abubelinha commented 2 years ago

My source text contains some botanical names in which the epithets are capitalized when they derive from names of people or places, e.g. "Linaria Haenseleri Bss. et Reut." or "Euonymus Europaeus L."

Because of that, gnfinder detects these names only as genera: "Linaria", "Euonymus". Is there any way to improve this behaviour so that it tries lowercasing the word after the genus, in case it matches a known species (and, if not, falls back to the genus)?

dimus commented 2 years ago

Yes, capitalized epithets are not supported. I tried to take them into account, but they generated too many false positives.

abubelinha commented 2 years ago

Thanks.

So I will try lowercasing the first letter of the word after each cardinality-1 name returned, and then process the whole text again.
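A minimal sketch of that workaround in Python, in case it helps anyone else. It does not call gnfinder at all: it assumes the cardinality-1 (genus-only) names from a first run have already been collected into a plain list, and the function name `lowercase_epithets` is made up for illustration. As a rough heuristic, a capitalized word followed directly by a period is skipped, since it is more likely an author abbreviation such as "Bss." than an epithet; the catch is that an epithet ending a sentence is skipped too.

```python
import re


def lowercase_epithets(text, genera):
    """Lowercase the first letter of a capitalized word that follows a genus.

    `genera` is assumed to be the list of genus-only names reported by a
    first gnfinder run; how that list is obtained is outside this sketch.
    """
    if not genera:
        return text
    # Try longer genus names first so one name cannot shadow another.
    ordered = sorted(set(genera), key=len, reverse=True)
    alternation = "|".join(re.escape(g) for g in ordered)
    # genus, whitespace, one capital, lowercase letters, not followed by "."
    pattern = re.compile(
        r"\b(" + alternation + r")(\s+)([A-Z])([a-z]+)\b(?!\.)"
    )

    def fix(match):
        genus, space, first, rest = match.groups()
        return genus + space + first.lower() + rest

    return pattern.sub(fix, text)


if __name__ == "__main__":
    sample = "Linaria Haenseleri Bss. et Reut. and Euonymus Europaeus L."
    print(lowercase_epithets(sample, ["Linaria", "Euonymus"]))
```

The rewritten text can then be passed through gnfinder a second time. Any word lowercased by mistake should simply fail to match a known species, so the name would still come back as a genus.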

dimus commented 2 years ago

This issue might be fixed in some distant future, when we are able to perform a second run to enhance the generic names found and detect capitalized species epithets.