gbif / checklistbank

GBIF Checklist Bank
Apache License 2.0

species suggest #257

Open: MortenHofft opened this issue 1 year ago

MortenHofft commented 1 year ago

Prompted by https://github.com/gbif/hp-land/issues/5

We have occasionally been asked to provide occurrence search by vernacular names.

I recently added an option to do so, using species/search as a suggest service. Two things could improve the results, though they may or may not be simple to add:

mdoering commented 1 year ago

Adding a language filter is rather simple, but partial matches are basically a new suggest search that requires different indices.
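To make the difference concrete, here is a minimal sketch of an edge n-gram (prefix) index, a common structure for search-as-you-type. It is a hypothetical in-memory stand-in, not how ChecklistBank indexes names; the `PrefixIndex` class and the sample names are invented for illustration. Partial matching needs every word prefix indexed up front, whereas a language filter is just one more condition on records the search already returns.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class PrefixIndex:
    """Hypothetical edge n-gram index: each prefix of each word maps to the
    ids of the names containing that word."""

    def __init__(self, min_gram: int = 2):
        self.min_gram = min_gram
        self.grams: Dict[str, Set[int]] = defaultdict(set)
        self.names: Dict[int, Tuple[str, str]] = {}

    def add(self, name_id: int, name: str, language: str) -> None:
        self.names[name_id] = (name, language)
        for word in name.lower().split():
            # Index "se", "sea", ... so a partial word can match at query time.
            for end in range(self.min_gram, len(word) + 1):
                self.grams[word[:end]].add(name_id)

    def suggest(self, q: str, language: Optional[str] = None) -> List[str]:
        words = q.lower().split()
        if not words:
            return []
        # Every (possibly partial) query word must prefix some word of the name.
        hits = set.intersection(*(self.grams.get(w, set()) for w in words))
        # The language filter is a plain post-filter on the stored records.
        return sorted(
            self.names[i][0]
            for i in hits
            if language is None or self.names[i][1] == language
        )
```

For example, after adding "Sea horse" (eng), "Seepferdchen" (deu) and "Sea lion" (eng), `suggest("sea ho")` returns `['Sea horse']` and `suggest("se", language="eng")` returns `['Sea horse', 'Sea lion']`.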

MortenHofft commented 1 year ago

I suspected as much. Perhaps that is the better option: a suggest that supports vernacular names, with an option to prefer accepted names. Let me try to specify it and come back; then we can evaluate whether it is worth the effort.

MortenHofft commented 3 months ago

q: animalia => returns a doubtful genus first. Perhaps accepted names should be prioritized? https://api.gbif.org/v1/species/suggest?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&q=Animalia

q: fungi => the kingdom Fungi is returned as result 41. I would think perfect matches should go first. https://api.gbif.org/v1/species/suggest?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&q=fungi&limit=41

q: sea horse => vernacular names aren't searchable. Ideally a match would be attempted, even if it is sometimes incorrect; we could reference the source, and vernacular matches could have lower priority than scientific names. https://api.gbif.org/v1/species/suggest?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&q=sea%20horse

q: Ginkgophyta => it is a synonym, but the response doesn't tell me what the accepted name is. I would like an option, such as a preferAccepted=true parameter, to get back the acceptedKey, the accepted name and its classification, along with the synonym name. https://api.gbif.org/v1/species/suggest?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&q=Ginkgophyta
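The orderings proposed above can be expressed as a single sort key: exact matches first, then accepted names before doubtful names and synonyms, then scientific names before vernacular ones. The sketch also shows a hypothetical `prefer_accepted` step that swaps a synonym for its accepted usage while remembering which synonym matched. The `Suggestion` fields, the simplified three-value status ranking and the function names are assumptions for illustration, not the GBIF API response schema.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

# Hypothetical status ordering; lower values sort first, unknown statuses last.
STATUS_RANK = {"ACCEPTED": 0, "DOUBTFUL": 1, "SYNONYM": 2}


@dataclass(frozen=True)
class Suggestion:
    key: int
    name: str
    status: str
    vernacular: bool = False               # matched on a vernacular name
    accepted_key: Optional[int] = None     # set for synonyms
    matched_synonym: Optional[str] = None  # filled in by prefer_accepted


def rank(suggestions: List[Suggestion], q: str) -> List[Suggestion]:
    """Order by exact name match, then status, then scientific before vernacular."""
    q_norm = q.strip().lower()

    def sort_key(s: Suggestion):
        return (
            s.name.lower() != q_norm,                     # False (exact) sorts first
            STATUS_RANK.get(s.status, len(STATUS_RANK)),  # accepted before doubtful/synonym
            s.vernacular,                                 # scientific before vernacular
        )

    # sorted() is stable, so ties keep the order the search engine returned.
    return sorted(suggestions, key=sort_key)


def prefer_accepted(suggestions: List[Suggestion],
                    by_key: Dict[int, Suggestion]) -> List[Suggestion]:
    """Swap each synonym for its accepted usage, keeping the matched synonym name."""
    out = []
    for s in suggestions:
        if s.accepted_key is not None and s.accepted_key in by_key:
            out.append(replace(by_key[s.accepted_key], matched_synonym=s.name))
        else:
            out.append(s)
    return out
```

Because the key is a tuple, each criterion only breaks ties left by the previous one, so trying a different ranking policy is a matter of reordering the tuple.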

I've made an experimental suggest service in GraphQL, but it is slow because it has to make a ton of requests to get accepted names, search vernacular names, filter them by language, etc.

It can be tried here: https://graphql.gbif-staging.org/graphql?query=%0A%20%20query%7B%0A%20%20%20%20taxonSuggestions%28%20q%3A%20%22Ginkgophyta%22%2C%20language%3A%20eng%29%20%7B%0A%20%20%20%20%20%20key%0A%20%20%20%20%20%20scientificName%0A%20%20%20%20%20%20vernacularName%0A%20%20%20%20%20%20taxonomicStatus%0A%20%20%20%20%20%20acceptedNameOf%0A%20%20%20%20%20%20classification%20%7B%0A%20%20%20%20%20%20%20%20name%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%7D%20%20%20%20%20%20%20%20%0A%20%20
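The GraphQL query in the staging link is URL-encoded. This small standard-library sketch decodes it, which makes the requested fields (`key`, `scientificName`, `vernacularName`, `taxonomicStatus`, `acceptedNameOf`, `classification { name }`) easy to read. `taxonSuggestions` is the experimental staging field described above; the helper name `decode_graphql_query` is hypothetical.

```python
import textwrap
from urllib.parse import parse_qs, urlsplit

# The staging link from the comment above, split across lines for readability.
STAGING_URL = (
    "https://graphql.gbif-staging.org/graphql?query="
    "%0A%20%20query%7B%0A%20%20%20%20taxonSuggestions%28%20q%3A%20%22Ginkgophyta%22"
    "%2C%20language%3A%20eng%29%20%7B%0A%20%20%20%20%20%20key"
    "%0A%20%20%20%20%20%20scientificName%0A%20%20%20%20%20%20vernacularName"
    "%0A%20%20%20%20%20%20taxonomicStatus%0A%20%20%20%20%20%20acceptedNameOf"
    "%0A%20%20%20%20%20%20classification%20%7B%0A%20%20%20%20%20%20%20%20name"
    "%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%7D%20%20%20%20%20%20%20%20%0A%20%20"
)


def decode_graphql_query(url: str) -> str:
    """Return the percent-decoded `query` parameter, with indentation tidied."""
    raw = parse_qs(urlsplit(url).query)["query"][0]
    # Remove the common leading indentation, then surrounding blank lines.
    return textwrap.dedent(raw).strip()


if __name__ == "__main__":
    print(decode_graphql_query(STAGING_URL))
```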

mdoering commented 3 months ago

All good ideas, and great that you have analyzed usage patterns. I am just unclear how the integration with ChecklistBank will work, and I would not want to maintain two code bases.

Maybe we can already use the CLB API for some things? https://api.checklistbank.org/dataset/53147/nameusage/suggest?q=Animalia

The usageId values are nub keys, just given as strings because identifiers across all datasets can be of any kind.
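A client of the CLB suggest endpoint would need to build the URL and turn the string `usageId` values back into integer nub keys. The sketch below assumes, without having checked the live API, that the response is a list of JSON objects each carrying a `usageId` field; the function names are hypothetical, and non-numeric ids are skipped rather than guessed at.

```python
from typing import Any, Dict, Iterable, List
from urllib.parse import urlencode

CLB_API = "https://api.checklistbank.org"


def clb_suggest_url(dataset_key: int, q: str) -> str:
    """Build a ChecklistBank name usage suggest URL for one dataset."""
    return f"{CLB_API}/dataset/{dataset_key}/nameusage/suggest?{urlencode({'q': q})}"


def nub_keys(results: Iterable[Dict[str, Any]]) -> List[int]:
    """Convert string usageId values into integer GBIF backbone (nub) keys.

    Assumes each result has a "usageId" field; ids that are not purely
    numeric are skipped.
    """
    keys = []
    for r in results:
        uid = str(r.get("usageId", ""))
        if uid.isdigit():
            keys.append(int(uid))
    return keys
```

With network access, `json.load(urllib.request.urlopen(clb_suggest_url(53147, "Animalia")))` would fetch the results to pass to `nub_keys`, provided the response really is a list as assumed.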

mdoering commented 3 months ago

Vernacular name search in CLB has not received much attention yet, but I am happy to change that. There is a separate API for it right now: https://api.checklistbank.org/dataset/53147/vernacular?q=Animal

Here taxonID is the nubKey.
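Since `taxonID` is the nub key, vernacular hits can be grouped per backbone taxon and then merged with scientific-name suggestions at a lower priority, as proposed for the "sea horse" case. This is a sketch only: it assumes a paged response shaped like `{"result": [{"taxonID": ..., "name": ..., "language": ...}]}`, which is an unverified guess apart from `taxonID`, and it applies the language filter client-side. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

CLB_API = "https://api.checklistbank.org"


def clb_vernacular_url(dataset_key: int, q: str) -> str:
    """Build the ChecklistBank vernacular name search URL for one dataset."""
    return f"{CLB_API}/dataset/{dataset_key}/vernacular?{urlencode({'q': q})}"


def vernacular_by_nub_key(page: Dict[str, Any],
                          language: Optional[str] = None) -> Dict[int, List[str]]:
    """Group vernacular names by GBIF backbone key (taxonID).

    Assumes (unverified) entries under page["result"] with "taxonID",
    "name" and "language" fields; optionally keeps one language only.
    """
    grouped: Dict[int, List[str]] = defaultdict(list)
    for v in page.get("result", []):
        tid = str(v.get("taxonID", ""))
        if not tid.isdigit():
            continue  # not a nub key; skip rather than guess
        if language is not None and v.get("language") != language:
            continue
        grouped[int(tid)].append(v.get("name", ""))
    return dict(grouped)
```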