gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

search gives unpredictable results #104

Closed MortenHofft closed 2 years ago

MortenHofft commented 2 years ago

Is a minimum of 3 letters required?

https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts?limit=100&q=in&locale=en

returns no results, although a search for int does. Since the vocabularies are relatively small, at least in this case, I suggest we make the search less strict and have it return a lot of values even if users enter only a single letter.

NB: we currently use search because suggest doesn't really work: https://github.com/gbif/vocabulary/issues/102

MortenHofft commented 2 years ago

I just realised that it isn't a minimum-number-of-letters issue. A search for n, for example, returns introduced, but a search for in does not.

https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts?limit=100&q=n&locale=en
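To compare the three queries side by side, a small helper can build the request URLs. This is only a sketch: the function name `concept_search_url` is made up for illustration, the parameters are the ones visible in the URLs above, and no request is actually sent.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs quoted in this issue.
BASE = "https://api.gbif.org/v1/vocabularies/EstablishmentMeans/concepts"


def concept_search_url(q, limit=100, locale="en"):
    """Build a concept search URL (hypothetical helper, not part of any GBIF client)."""
    params = {"limit": limit, "q": q, "locale": locale}
    return f"{BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Print one URL per query so the responses can be compared by hand.
    for q in ("n", "in", "int"):
        print(concept_search_url(q))
```

Opening the three printed URLs in a browser (or with any HTTP client) makes it easy to see which queries return introduced.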

MortenHofft commented 2 years ago

From a private chat I understand that the issue here is that 'in' is a stop word. Since stop words are language specific, and since these vocabularies are relatively small, would it make sense not to use stop words? Or would that produce really bad results? I'm afraid I don't have the experience to tell how it would work.
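As a rough illustration of why a stop word can make a query come back empty, here is a toy model. It is not the actual search implementation: the stop word list is a small illustrative subset, the labels are sample values, and the substring matching is an assumption chosen only to mimic the behaviour described above.

```python
# Illustrative subset of English stop words (assumption, not the service's real list).
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})

# Sample labels for illustration only.
LABELS = ["native", "introduced", "vagrant", "uncertain"]


def analyze(query, stop_words):
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [t for t in query.lower().split() if t not in stop_words]


def search(query, labels, stop_words=frozenset()):
    """Return labels containing every surviving query token as a substring."""
    tokens = analyze(query, stop_words)
    if not tokens:
        # Every token was a stop word, so there is nothing left to match on.
        return []
    return [label for label in labels if all(t in label.lower() for t in tokens)]


if __name__ == "__main__":
    for q in ("n", "in", "int"):
        print(q, "with stop words:", search(q, LABELS, STOP_WORDS))
        print(q, "without stop words:", search(q, LABELS))
```

In this toy model the query in is removed entirely by the stop word filter, so nothing matches, while n and int pass through untouched. Dropping the stop word list lets in match again.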

marcos-lg commented 2 years ago

@MortenHofft I can try a version without stop words and we can test it. I think it will mostly affect phrase queries, but those are probably not very common in the vocabularies.

marcos-lg commented 2 years ago

I've deployed a version without stop words to DEV. I'm still testing it, but it looks better:

https://api.gbif-dev.org/v1/vocabularies/EstablishmentMeans/concepts?limit=100&q=in

marcos-lg commented 2 years ago

Deployed to PROD.