gbif / name-parser

The core GBIF scientific name parser library
Apache License 2.0

wilg smal #26

Closed damianooldoni closed 6 years ago

damianooldoni commented 6 years ago

It seems name-parser classifies wilg smal as type SCIENTIFIC. I find this quite strange, because the search API returns nothing for it, as expected: https://www.gbif.org/species/search?q=wilg%20smal. Could you please have a look? Am I missing something? Thanks for the support.

mdoering commented 6 years ago

With the latest change, which allows all-lower-case binomials, the parser should assume it's a scientific name "Wilg smal". See https://github.com/gbif/portal-feedback/issues/1379 and https://github.com/gbif/name-parser/commit/65597179dcb8c45d57dd7684e17b179b59bb0685

mdoering commented 6 years ago

The search API does a lookup in our index; the parser just parses according to syntactic rules.

damianooldoni commented 6 years ago

Thank you very much, @mdoering !

peterdesmet commented 6 years ago

@mdoering, I discussed this with @timrobertson100. Since this introduces looser rules, he suggested adding a flag, where one could e.g. include “Linnean” to use a more restrictive set of rules. In the future, the flag could be extended to parse according to ICZN, etc. Can you reopen this issue?

mdoering commented 6 years ago

Which API are we talking about, name matching or parsing? What exactly should a parsing flag Linnean=true do differently?

peterdesmet commented 6 years ago

Name parsing. The Linnean flag would expect a scientific name to start with a capital letter. I can't think of any other expectations for the moment.
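To make the proposal concrete, here is a rough, standalone Python sketch of how such a flag could behave. It is not the name-parser API: the pattern, the `parse_binomial` function, the `linnean` parameter and the `UnparsableNameError` class are all hypothetical names made up for illustration, and the regular expression is far simpler than real parsing rules.

```python
import re
from typing import Tuple

# Hypothetical, heavily simplified pattern: two alphabetic words, where the
# first may start with either case and the second is all lower case.
BINOMIAL = re.compile(r"^([A-Za-z][a-z]+) ([a-z]+)$")


class UnparsableNameError(ValueError):
    """Raised when a string does not satisfy the selected rule set (hypothetical)."""


def parse_binomial(name: str, linnean: bool = False) -> Tuple[str, str]:
    """Return (genus, epithet); with linnean=True, reject a lower-case genus."""
    match = BINOMIAL.match(name.strip())
    if match is None:
        raise UnparsableNameError(f"not a binomial: {name!r}")
    genus, epithet = match.groups()
    if linnean and not genus[0].isupper():
        # Restrictive mode: refuse instead of silently fixing the case.
        raise UnparsableNameError(f"genus must be capitalised: {name!r}")
    # Lenient mode: normalise the genus to an initial capital.
    return genus.capitalize(), epithet
```

With this sketch, `parse_binomial("wilg smal")` is accepted and normalised, while `parse_binomial("wilg smal", linnean=True)` raises an error.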

mdoering commented 6 years ago

Do I assume correctly that you would expect a name parsing exception if it's not capitalized? I am rather considering a "strict" flag that would exclude some of the rather strong cleanup/normalization procedures. Alternatively, we could indicate the various applied cleanup methods in the response, so you know if the case has been altered etc. I think I actually prefer that. We already have a warning set we could use: https://github.com/gbif/name-parser/blob/master/name-parser-api/src/main/java/org/gbif/nameparser/api/ParsedName.java#L152

Currently populated with these: https://github.com/gbif/name-parser/blob/master/name-parser/src/main/java/org/gbif/nameparser/Warnings.java#L6

peterdesmet commented 6 years ago

Indicating that the case has been altered in the cleanup would be nice indeed... and then no extra flag is needed.

When using the name parser as a kind of validation (and not cleaning) service, we were basically surprised to get SCIENTIFIC back for a name starting with a small letter. Reading from https://github.com/gbif/portal-feedback/issues/1379, such names are NOT scientific names, but the name parser is just kind enough to parse them anyway.
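As an illustration of the warning-based approach for the validation use case, here is a minimal standalone Python sketch. Everything in it is an assumption for the sake of the example: `ParseResult`, `parse`, `is_strictly_valid` and the `CASE_ALTERED` warning text are hypothetical and do not reflect the actual `ParsedName` class or the existing entries in `Warnings.java`.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical warning text; the real warning strings may differ.
CASE_ALTERED = "genus capitalised during cleanup"


@dataclass
class ParseResult:
    type: str
    canonical: str
    warnings: List[str] = field(default_factory=list)


def parse(name: str) -> ParseResult:
    """Leniently parse a two-word name and record any cleanup that was applied."""
    parts = name.split()
    if len(parts) != 2:
        raise ValueError(f"expected exactly two words: {name!r}")
    genus, epithet = parts
    warnings = []
    if not genus[0].isupper():
        # The parser still accepts the name, but reports that it changed the case.
        warnings.append(CASE_ALTERED)
    canonical = f"{genus[0].upper()}{genus[1:]} {epithet}"
    return ParseResult(type="SCIENTIFIC", canonical=canonical, warnings=warnings)


def is_strictly_valid(result: ParseResult) -> bool:
    """Validation-style check: scientific and parsed without any cleanup."""
    return result.type == "SCIENTIFIC" and not result.warnings
```

A client that only wants cleaning can ignore `warnings`, while a client using the parser for validation can call `is_strictly_valid` and reject "wilg smal" without needing a separate flag.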