GlobalNamesArchitecture / gni

Global Names Index
http://wiki.github.com/GlobalNamesArchitecture/gni

resolve_once behavior #32

Open sckott opened 10 years ago

sckott commented 10 years ago

Curious whether the resolve_once behavior is correct. For example, this call http://resolver.globalnames.org/name_resolvers.json?names=Plantago+major&resolve_once=true returns more than one match (all from different sources, I think, but still more than one match).

The API docs suggest setting resolve_once=TRUE should just get first match.

dimus commented 10 years ago

I guess the option name is a bit confusing. The idea behind it was to avoid name parsing and return only exact matches if possible. In your example only exact matches (from several data sources) are returned, so parsing and matching by canonical form did not happen. It is faster than running the name parser, but it also removes quite a few results. Because of that, resolve_once is disabled by default. Are you interested in getting one result only?

You will see the difference if you change resolve_once=true to resolve_once=false
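To make that comparison concrete, here is a minimal sketch that builds the two request URLs side by side. The helper name `resolver_url` is made up for illustration; only the endpoint and the `names` and `resolve_once` parameters come from this thread, and fetching the URLs is left out.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in this thread.
BASE_URL = "http://resolver.globalnames.org/name_resolvers.json"


def resolver_url(name, resolve_once=False):
    """Build a resolver URL for one name (hypothetical helper, not part of any library)."""
    query = {
        "names": name,
        # The thread writes the flag as lowercase true/false.
        "resolve_once": "true" if resolve_once else "false",
    }
    # urlencode uses quote_plus by default, so spaces become "+".
    return BASE_URL + "?" + urlencode(query)


exact_only = resolver_url("Plantago major", resolve_once=True)
with_parsing = resolver_url("Plantago major", resolve_once=False)
print(exact_only)
print(with_parsing)
```

Requesting both URLs and comparing the number of results per name should show how many matches depend on the parsing step.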

sckott commented 10 years ago

Thanks for your quick response! Hmm, I guess just getting one result doesn't make too much sense, so no, I don't think that's needed, and none of the users of my software ask for it.

I'll change the documentation in my software so that the resolve_once parameter is described more accurately.

sckott commented 10 years ago

Hi again @dimus - Actually, a user just asked about returning just one match for each queried taxon name. Is that possible? It seems a bit tricky, and may require offering a few choices. For example, if the parameter is called return_one, then it could pick at random from a set of equivalent names (return_one=random), or pick from a preferred data source (return_one=12, 12 for EOL), or other options?

We could do this on our side in R, but of course it would make for faster data return times if it were done on your side.
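For reference, the client-side version could look roughly like the sketch below (in Python rather than R, purely for illustration). Everything here is an assumption: the function name `return_one`, the sample data, and the response field names (`data`, `supplied_name_string`, `results`, `data_source_id`, `name_string`), which should be checked against a real response.

```python
import random

# Hypothetical, trimmed-down response. Field names and values are assumptions
# made for this sketch, not output copied from the resolver.
SAMPLE = {
    "data": [
        {
            "supplied_name_string": "Plantago major",
            "results": [
                {"data_source_id": 4, "name_string": "Plantago major"},
                {"data_source_id": 12, "name_string": "Plantago major L."},
                {"data_source_id": 11, "name_string": "Plantago major"},
            ],
        }
    ]
}


def return_one(response, strategy="first", rng=None):
    """Keep at most one result per supplied name.

    strategy is "first", "random", or a data source id such as 12.
    Returns a dict mapping each supplied name to one result or None.
    """
    rng = rng or random
    picked = {}
    for entry in response.get("data", []):
        results = entry.get("results") or []
        if not results:
            choice = None
        elif strategy == "first":
            choice = results[0]
        elif strategy == "random":
            choice = rng.choice(results)
        else:
            # Treat anything else as a preferred data source id.
            source_id = int(strategy)
            choice = next(
                (r for r in results if r.get("data_source_id") == source_id),
                None,
            )
        picked[entry["supplied_name_string"]] = choice
    return picked


print(return_one(SAMPLE, strategy=12))
```

The downside is the one already mentioned: the full result set still has to travel over the network before it is cut down.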

dimus commented 10 years ago

Oops, missed your new comment. There is a not-yet-documented way to do something like that:

http://resolver.globalnames.org/name_resolvers.json?names=Plantago+major&best_match_only=true&data_source_ids=12

If no data_source_ids are given, all of them will be used.

In addition, it is possible to add preferred_data_sources to best_match_only. If no data_source_ids are given, the 'best match' will come from any data source. In addition, if there is a match in a 'preferred data source', it will also be returned.

http://resolver.globalnames.org/name_resolvers.json?names=Plantago+major&best_match_only=true&preferred_data_sources=12|4

For now only BHL uses this functionality. If you start using it, I will give it 'official' status and document it on the API page.
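The two URLs above can be assembled programmatically. This is only a sketch: `best_match_url` is a hypothetical helper, and joining several data_source_ids with "|" is an assumption based on the preferred_data_sources=12|4 form shown above.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs quoted in this thread.
BASE_URL = "http://resolver.globalnames.org/name_resolvers.json"


def best_match_url(name, data_source_ids=None, preferred_data_sources=None):
    """Build a best_match_only URL (hypothetical helper, not part of any library)."""
    query = {"names": name, "best_match_only": "true"}
    if data_source_ids:
        # Assumption: multiple ids use the same "|" separator as preferred_data_sources.
        query["data_source_ids"] = "|".join(str(i) for i in data_source_ids)
    if preferred_data_sources:
        query["preferred_data_sources"] = "|".join(
            str(i) for i in preferred_data_sources
        )
    # safe="|" keeps the separator literal instead of encoding it as %7C,
    # matching the URLs written out in this thread.
    return BASE_URL + "?" + urlencode(query, safe="|")


print(best_match_url("Plantago major", data_source_ids=[12]))
print(best_match_url("Plantago major", preferred_data_sources=[12, 4]))
```

Leaving both lists empty produces a best_match_only request against all data sources, as described above.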

sckott commented 10 years ago

Great, thanks for this, I'll include these two parameters in my taxize R package. I don't know how much people will use them - I'm sure at least some will.

tucotuco commented 9 years ago

Looking forward to official status. This will be immensely useful for data improvement workflows in general. VertNet gives a "+1".