gbif / vocabulary

A simple registry of controlled vocabularies used for terms found in GBIF mediated data.
Apache License 2.0

Concept suggest #140

Open MortenHofft opened 1 month ago

MortenHofft commented 1 month ago

The current suggest isn't useful: it doesn't include parents and it doesn't take language into consideration. Related: https://github.com/gbif/vocabulary/issues/135 and https://github.com/gbif/vocabulary/issues/113

Imagine these concepts:

[
{name: XY1, labels: {en: Mammals, da: Pattedyr, es: Mamífero}},
{name: XY2, labels: {en: Monkey, da: Abe, es: Mono}},
{name: ASD, labels: {en: Snake, es: Serpiente}},
{name: ERT},
]

I now have a site with Danish as the language. I open my concept search to select a concept. What labels do I see in the dropdown?

Suggestion

I'm not sure what the best experience would be, but here is a suggestion for something we could try.

The full list on the Danish site would show (ordering and filtering aside):

Pattedyr # Danish label
Pattedyr => Abe # Danish label, preceded by the parent labels in Danish (falling back to English, then to the concept name)
Snake # English is the predefined fallback when the Danish label does not exist
ERT # we have to fall back to the raw concept name since there is neither a Danish nor an English label
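The fallback chain above can be sketched in a few lines of Python. This is only an illustration, not the registry's implementation: the `CONCEPTS` dict, the `parent` field (XY2 is assumed to be a child of XY1) and the function names are all hypothetical.

```python
# Hypothetical in-memory copy of the example concepts.
# Assumption: XY2 (Monkey) has XY1 (Mammals) as its parent.
CONCEPTS = {
    "XY1": {"labels": {"en": "Mammals", "da": "Pattedyr", "es": "Mamífero"}, "parent": None},
    "XY2": {"labels": {"en": "Monkey", "da": "Abe", "es": "Mono"}, "parent": "XY1"},
    "ASD": {"labels": {"en": "Snake", "es": "Serpiente"}, "parent": None},
    "ERT": {"labels": {}, "parent": None},
}


def display_label(name, locale, fallback="en"):
    """Requested locale first, then the fallback locale, then the raw concept name."""
    labels = CONCEPTS[name]["labels"]
    return labels.get(locale) or labels.get(fallback) or name


def display_path(name, locale, fallback="en"):
    """Label of the concept, preceded by the labels of its ancestors."""
    parts = []
    current = name
    while current is not None:
        parts.append(display_label(current, locale, fallback))
        current = CONCEPTS[current]["parent"]
    return " => ".join(reversed(parts))


for concept_name in CONCEPTS:
    print(display_path(concept_name, "da"))
```

With `locale="da"` this prints the four lines listed above, one per concept.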

How would suggest filtering work? I suggest that we only filter on values the user would actually see.

That means that if I search for q=mon&locale=da, I do not get the result for Monkey, since the label the user will see is Abe. If I search for q=ab&locale=da, I will get the result for Monkey.

If I search for q=er&locale=da I will get the result for ERT but not for Serpiente.

If I search for q=XY1&locale=da I will get no results.

I believe it would be confusing to show the result Monkey when the user searches for mo on a Danish site if the label the user will see is Abe.
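A minimal sketch of "filter only on what the user sees", assuming a case-insensitive substring match and leaving parent labels out for brevity. The `suggest` function and the data layout are hypothetical and only mirror the `q` and `locale` parameters used in the examples.

```python
# Hypothetical example data, copied from the concepts listed earlier in this issue.
CONCEPTS = [
    {"name": "XY1", "labels": {"en": "Mammals", "da": "Pattedyr", "es": "Mamífero"}},
    {"name": "XY2", "labels": {"en": "Monkey", "da": "Abe", "es": "Mono"}},
    {"name": "ASD", "labels": {"en": "Snake", "es": "Serpiente"}},
    {"name": "ERT", "labels": {}},
]


def visible_label(concept, locale, fallback="en"):
    """The single string the user would see for this concept."""
    labels = concept["labels"]
    return labels.get(locale) or labels.get(fallback) or concept["name"]


def suggest(q, locale, fallback="en"):
    """Return the names of concepts whose visible label contains q."""
    needle = q.casefold()
    return [
        c["name"]
        for c in CONCEPTS
        if needle in visible_label(c, locale, fallback).casefold()
    ]


print(suggest("mon", "da"))  # Monkey is not matched: the visible label is Abe
print(suggest("ab", "da"))   # matches XY2 through its Danish label
print(suggest("er", "da"))   # matches ERT only; Serpiente is never shown in Danish
print(suggest("XY1", "da"))  # no match: the raw name is hidden behind a label
```

Because only the visible label is searched, the raw name and the labels in other languages never produce a hit unless they are what the user ends up seeing.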

So essentially one could think of it this way: generate the display string for each concept as Danish label | English label | concept name (same for parents). That would yield:

Pattedyr
Pattedyr Abe
Snake
ERT

And the filtering for any string (e.g. er) would be on that list, with no other information. Similarly, if we decide to include descriptions in the search, then it would be on the value the user sees. On a Danish site it would never match against English or Spanish if there is a Danish label/description.

If I add multiple locales, then it will search on the labels for each of them. E.g. q=er&locale=da&locale=es would return XY1 (mamífERo), ASD (sERpiente) and ERT.
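The multi-locale case could be sketched like this, assuming each requested locale is resolved independently (with the same English and name fallbacks) and a concept matches if any of its resolved labels contains the query. Parent labels are again left out. All names here are hypothetical.

```python
# Hypothetical example data, copied from the concepts listed earlier in this issue.
CONCEPTS = [
    {"name": "XY1", "labels": {"en": "Mammals", "da": "Pattedyr", "es": "Mamífero"}},
    {"name": "XY2", "labels": {"en": "Monkey", "da": "Abe", "es": "Mono"}},
    {"name": "ASD", "labels": {"en": "Snake", "es": "Serpiente"}},
    {"name": "ERT", "labels": {}},
]


def visible_labels(concept, locales, fallback="en"):
    """One resolved label per requested locale (locale, then fallback, then name)."""
    labels = concept["labels"]
    return [labels.get(loc) or labels.get(fallback) or concept["name"] for loc in locales]


def suggest(q, locales, fallback="en"):
    """Names of concepts where any of the resolved labels contains q."""
    needle = q.casefold()
    return [
        c["name"]
        for c in CONCEPTS
        if any(needle in label.casefold() for label in visible_labels(c, locales, fallback))
    ]


print(suggest("er", ["da", "es"]))  # ['XY1', 'ASD', 'ERT']
```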

Another approach

We could also say that you always search in the English labels and concept names in addition to whatever language you specify. That makes it more powerful, since you can search raw values and English. But my gut feeling is that it would be more confusing for most users. And the above suggestion allows us to try both versions by providing multiple locales.

MortenHofft commented 1 month ago

For the response format I would like it to be as small as possible so we do not send any unnecessary information over the wire; we want suggest to be fast. That means not including languages that won't be displayed, and not including e.g. keys that cannot be used (see https://github.com/gbif/vocabulary/issues/136).
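To make the idea concrete, here is a rough sketch of trimming a concept down to what a dropdown needs. The item shape (`name` plus a single resolved `label`) is purely an assumption for illustration; it is not the actual response format of the API.

```python
def to_suggest_item(concept, locale, fallback="en"):
    """Keep only the concept name and the one label that would be displayed.

    Hypothetical shape: labels in other languages and any other fields are dropped.
    """
    labels = concept.get("labels", {})
    label = labels.get(locale) or labels.get(fallback) or concept["name"]
    return {"name": concept["name"], "label": label}


full = {"name": "XY2", "labels": {"en": "Monkey", "da": "Abe", "es": "Mono"}}
print(to_suggest_item(full, "da"))  # {'name': 'XY2', 'label': 'Abe'}
```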

ManonGros commented 1 month ago

Thanks Morten! I think that makes sense. I agree that it would be confusing to have in the response terms that seem unrelated to what you are searching in your language.

marcos-lg commented 1 month ago

> Snake # english is the predefined fallback when the danish label does not exists

This means that this search q=sna&locale=da should return the ASD concept?

MortenHofft commented 1 month ago

> This means that this search q=sna&locale=da should return the ASD concept?

Yes, since that is the label the user will see. For all the user knows, Snake is the only name.

MortenHofft commented 1 month ago

q=Mamífero could return two things.

Either only the Mamífero result, or it also searches parent values. That was my initial suggestion (in conversations). I'm honestly not entirely sure what provides the best experience. I tend to think that parent labels should be included in the search, but with a lower ranking.

so

q= Mamífero
[
{name: XY1, labels: {en: Mammals, da: Pattedyr, es: Mamífero}},
{name: XY2, labels: {en: Monkey, da: Abe, es: Mono}, parents: [...Mamífero...]},
]
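
One way to picture "parent labels match, but rank lower" is a two-tier result list: direct label hits first, then hits that only come from an ancestor's label. This is a sketch under assumptions, not the deployed behaviour; the `parent` field, the tiering and the function names are hypothetical.

```python
# Hypothetical example data; XY2 is assumed to be a child of XY1.
CONCEPTS = {
    "XY1": {"labels": {"en": "Mammals", "da": "Pattedyr", "es": "Mamífero"}, "parent": None},
    "XY2": {"labels": {"en": "Monkey", "da": "Abe", "es": "Mono"}, "parent": "XY1"},
    "ASD": {"labels": {"en": "Snake", "es": "Serpiente"}, "parent": None},
    "ERT": {"labels": {}, "parent": None},
}


def label(name, locale, fallback="en"):
    labels = CONCEPTS[name]["labels"]
    return labels.get(locale) or labels.get(fallback) or name


def ancestors(name):
    """Yield the names of all ancestors, nearest first."""
    current = CONCEPTS[name]["parent"]
    while current is not None:
        yield current
        current = CONCEPTS[current]["parent"]


def suggest(q, locale, fallback="en"):
    """Direct label matches first, then matches found only via a parent label."""
    needle = q.casefold()
    direct, via_parent = [], []
    for name in CONCEPTS:
        if needle in label(name, locale, fallback).casefold():
            direct.append(name)
        elif any(needle in label(a, locale, fallback).casefold() for a in ancestors(name)):
            via_parent.append(name)
    return direct + via_parent


print(suggest("Mamífero", "es"))  # ['XY1', 'XY2']
```
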

marcos-lg commented 4 weeks ago

Deployed to production the first version that includes the fallbackLocale and limit params.

The latestRelease endpoint is cached.

CecSve commented 4 weeks ago

It is my understanding this has been fixed and implemented so will close the issue.

marcos-lg commented 4 weeks ago

It was actually a first version that didn't include the latest suggestion included in this comment https://github.com/gbif/vocabulary/issues/140#issuecomment-2173250169 so that's why I left it open.

But we might not even need it, it depends on the user experience we get. We can reopen it if needed.

CecSve commented 4 weeks ago

Ooops. Okay sorry - I'll open it again and see if there is any user feedback during this year. Otherwise I propose we close it again.