gbif / hosted-portals

Support material for establishing the GBIF Hosted Portals
Apache License 2.0

Only suggest taxa with occurrences #158

Open jholetschek opened 3 years ago

jholetschek commented 3 years ago

The suggestion lists (e.g. for scientific name search or recordedBy) seem to be based on the whole GBIF index and not on the subset of records available in the portal, so selecting a value from the list often gives 0 results. This is very confusing for users; some even considered it a bug.

I guess this is a difficult issue - but is there a way to only list values for the occurrences to be found in the portal?

MortenHofft commented 3 years ago

RecordedBy, waterbody etc

For some values I believe doing so would be the better solution. Such as recordedBy. This discussion is ongoing in https://github.com/gbif/hosted-portals/issues/136

Scientific name

For scientific name, I'm not sure I agree. For what it is worth, GBIF.org suggests all known taxa in the backbone, even though there might be zero observations. It could be just as confusing not to see the taxa you know to exist: "Did I spell it wrong? Do they not know it? Has it been renamed?" It is also possible to search for countries in which there is no data. If my site were scoped by countries, it would be meaningful to restrict the options to that list, though (not possible currently, btw).

1 Taxonomically scoped

I can see it would be valuable to taxonomically scope it, though. Say, only search for taxa within plants, or within Fabaceae. I'm not sure how to do so well, but I can imagine both proper solutions and semi-functional hacks that would work okay (e.g. ask for many results, say 200, and filter them client side).

2 Use another checklist

Another option could be to use a checklist other than the backbone. But I haven't thought through what that would mean for occurrences. The results would need to be mapped to the backbone, as that is how they are organised in search. That would require the site owner to find or publish a checklist with the data they are interested in.

3 Use list of distinct taxa in the index

This would be the same as the solution discussed in the wildcard issue, but it would likely perform worse, and I'm not sure it is what a user expects either. It might be worth exploring as well, though.

That wasn't conclusive, but just to say that it isn't obvious to me that it is a better solution to only show taxa that have data.

jholetschek commented 3 years ago

I can see your point. I think both things could happen to a user - they might be confused because a scientific name doesn't show up in the suggestion list, or they might find it strange to be allowed to select a name that doesn't yield any results.

I guess this should be discussed in a wider circle, either within GBIF - or maybe in the next online meeting? Really not sure.

MortenHofft commented 3 years ago

It is tricky. I think this is also a case of familiarity. I assume the user testing base is used to this site http://vh.gbif.de/vh/search/units/advancedSearch/

And that site provides counts in some cases. But if I for example search for Denmark and then do a taxon search then I will get suggestions for taxa without results. So in some sense they are willing to accept that a suggestion yields no results.

That said, I think it would be great if we could have counts on all suggestions, so that users know up front whether selecting an option would yield results. But how to do so in an intuitive way that also performs isn't trivial to me. Note that the behaviour your user group knows from http://vh.gbif.de/vh/search/units/advancedSearch/ could be achieved with option 2 above: using another checklist (which you would maintain and which covers all taxa in the network).

MortenHofft commented 3 years ago

I can think of 3 approaches

  1. Show all names known in the backbone (or some other chosen checklist)
  2. Only show taxa with data in the portal if no other filters are applied
  3. Only show taxa with data within the current filter (If I have added country filter for Sweden, then only show taxa with occurrences in Sweden). And ignore whatever filters are already in place for this field (else selecting fungi first will mean that you cannot then select plants as there are no plants in the result set of fungi).
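Approach 3 can be sketched roughly as follows. This is only an illustration against a hypothetical in-memory list of occurrences (`OCCURRENCES`, `suggest_values` and the field names are made up for the example, not part of the portal code): values for a field are counted among the records that match every other active filter, while the filter on the field itself is ignored.

```python
from collections import Counter

# Hypothetical stand-in for the portal's occurrence index.
OCCURRENCES = [
    {"taxon": "Fungi", "country": "SE"},
    {"taxon": "Plantae", "country": "SE"},
    {"taxon": "Plantae", "country": "DK"},
    {"taxon": "Animalia", "country": "DK"},
]


def suggest_values(field, active_filters, records=OCCURRENCES):
    """Count the values of `field` among records matching all *other* filters."""
    # Ignore the filter already set on the field we are suggesting for,
    # otherwise picking fungi first would hide plants from the list.
    other = {k: v for k, v in active_filters.items() if k != field}
    counts = Counter()
    for rec in records:
        if all(rec[k] in allowed for k, allowed in other.items()):
            counts[rec[field]] += 1
    return dict(counts)


# Country filter on Sweden, and fungi already selected as taxon.
swedish_taxa = suggest_values("taxon", {"country": {"SE"}, "taxon": {"Fungi"}})
```

With Sweden and fungi selected, `swedish_taxa` still contains plants, because only the country filter is applied when computing taxon suggestions.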

On GBIF.org we typically show all meaningful values even if there is no data. E.g. we show a list of all countries, though there might not be any data for some of them.

I'm not sure how to do it. I do not like to present users with options that are dead ends, but it also feels wrong to show only 180 countries because the rest have no data. If I am interested in one of those "empty" countries, then seeing the name in the list gives me confidence that it hasn't just been forgotten. The same goes for taxa.

Perhaps what we need is a "restrict suggestions to current result set" button (so combining 1+3)

tucotuco commented 3 years ago

My feeling is that it is important NOT to limit the options by any criterion whatsoever. It is important to allow people to get zero results on queries. That is meaningful in itself and I expect people would be less confused by that than by limited options. Not limiting options is what I expect people would expect.

jholetschek commented 3 years ago

Concerning http://vh.gbif.de/vh/search/units/advancedSearch: The suggestion lists are based on the specimens in the portal, e.g. for scientific name. If you have several filters set, the counts will disappear in the lists, but the items stay the same. And I wouldn't expect that in the hosted portals - for the lists to sort of "preview" the results.

I can see your points! I think it really depends on the filter type. I absolutely agree when it's the country - missing countries would be confusing. But when you have a herbarium portal, getting zoological names could be confusing all the same.

Is there a way to filter the used checklist on the fly? I.e. only showing taxa that are children of one or several taxon-IDs (like plants and algae, for example)?

MortenHofft commented 3 years ago

I agree @jholetschek that it makes sense to at least scope the result set for your portal, as well as for taxonomically scoped portals. It isn't possible now, but I would like to allow selecting a different checklist or adding a filter to the results. And the standard suggest endpoint might suffice for your use case, as that allows kingdom to be set. So you could perhaps (once possible) set that to Plantae and Fungi?

MortenHofft commented 3 years ago

The problem with filtering on the fly (in the browser or some intermediate layer) is that you might not get any results. E.g. I search for a plant name starting with abc against a general endpoint that includes animals. I get the results abc1, abc2, abc3, abc4, abc5, ... abc100, but abc1-99 are animals, so I will never see my Plantae suggestion unless I fetch hundreds of suggestions. And then the number of suggestions shown to the user will appear random, and at worst even be empty. So post-filtering is not ideal, and not performant either. But it might be the best we can do in some cases.
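To make the abc example concrete, here is a small sketch with made-up data. `suggest` is a hypothetical stand-in for a general suggest endpoint that returns the first `limit` prefix matches, and `post_filter` is the on-the-fly filtering step; neither is real portal code.

```python
def suggest(prefix, names, limit):
    """Hypothetical general suggest: the first `limit` prefix matches, alphabetically."""
    matches = sorted(n for n in names if n.startswith(prefix))
    return matches[:limit]


def post_filter(suggestions, kingdom_of, wanted):
    """Filter suggestions after the fact, e.g. in the browser."""
    return [s for s in suggestions if kingdom_of[s] == wanted]


# Made-up names: 99 animals followed by a single plant.
kingdom_of = {f"abc{i:03d}": "Animalia" for i in range(1, 100)}
kingdom_of["abc100"] = "Plantae"

# Asking for 10 suggestions and filtering leaves nothing to show.
small = post_filter(suggest("abc", kingdom_of, 10), kingdom_of, "Plantae")
# Only by over-fetching (here 200) does the plant name survive the filter.
large = post_filter(suggest("abc", kingdom_of, 200), kingdom_of, "Plantae")
```

Here `small` is empty while `large` holds the one plant name, so the number of suggestions the user sees depends on how far down the unfiltered list the matches happen to sit.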

That is why I'm dull and suggest checklists or dedicated endpoints instead.

MortenHofft commented 3 years ago

But let me see what I can do with a suggest driven from the index. It might be better than expected, both in terms of performance and experience. Either way, it needs to be configurable, as the needs are clearly different.

jholetschek commented 3 years ago

> I agree @jholetschek that it makes sense to at least scope the result set for your portal as well as taxonomically scoped portals. It isn't possible now, but I would like to allow selecting a different checklist or adding a filter to the results. And the standard suggest endpoint might suffice for your usecase as that allows kingdom to be set. So you could perhaps (once possible) set that to plantae and fungi?

Yes, restricting to one or two kingdoms would go a long way!

jholetschek commented 3 years ago

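I raised the issue of suggestion lists that only show values from the portal vs. full lists in our team, and there was a lively discussion about this, covering the whole spectrum. To combine both worlds, three suggestions have been made (even though I'm not sure they're feasible):

  • Allow switching between both (would affect all suggestion lists).
  • Always show the full list, but grey out the items with no occurrences in the portal.
  • Show the full list and append the number of records in parentheses.

Personally, I'd vote for the second or third option.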

tucotuco commented 3 years ago

Any of those options would be frosting on the cake for us.

On Tue, Jun 8, 2021 at 2:13 PM Jörg Holetschek @.***> wrote:

I raised the issue of suggestions lists that only show values from the portal vs. full lists in our team, and there was a vivid discussion about this, covering the whole spectrum. To combine both worlds, three suggestions have been made (even though I'm not sure they're feasible):

  • Allow to switch between both (would affect all suggestion lists)
  • Always show the full list, but grey out the items with no occurrences in the portal.
  • Show the full list and append the number of records in parentheses.

Personally, I'd vote for the second or third option.


MortenHofft commented 3 years ago

An autocomplete/suggest like we have now needs to be fast to be useful. To gray out empty items or add counts, we would need an additional query per suggestion (a search for results with that taxon within the portal scope), and I'm sceptical of how that would perform, I'm afraid. Those count queries would have to be fired in succession, with the counts showing up one after another with a second's delay.
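The cost can be sketched like this. It is only an illustration: `annotate_with_counts`, `fake_count` and the counts in `PORTAL_COUNTS` are invented for the example, with `fake_count` standing in for one portal-scoped occurrence count query.

```python
def annotate_with_counts(suggestions, count_in_portal):
    """Attach a portal-scoped count to each suggestion, one lookup at a time."""
    annotated = []
    for name in suggestions:
        count = count_in_portal(name)  # one extra query per suggestion
        annotated.append({"name": name, "count": count, "greyed_out": count == 0})
    return annotated


# Made-up counts standing in for the portal's occurrence index.
PORTAL_COUNTS = {"Abies alba": 12, "Abies grandis": 3}
queries = []


def fake_count(name):
    """Hypothetical count query; records each call so the cost is visible."""
    queries.append(name)
    return PORTAL_COUNTS.get(name, 0)


result = annotate_with_counts(
    ["Abies alba", "Abies grandis", "Abies pinsapo"], fake_count
)
```

Three suggestions mean three extra count lookups on top of the suggest call itself, and a name without records in the portal ends up flagged as `greyed_out`.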

Perhaps the solution is to do something akin to what we have for identifiedBy and other wildcard-type searches. The big difference is that it does a search and then lets the user choose. It is slower both to navigate and to calculate the suggestions, but it might be a better option in this case, at least for some users.

jholetschek commented 3 years ago

I'm well aware that this is difficult and maybe not feasible. But it struck me how different user expectations can be about this behaviour, and users might get frustrated easily and not accept the portal if they don't understand what the lists show.