Make sure that we fetch the entire vocabulary

MadCatX commented 10 months ago

A request for a vocabulary with no additional parameters may not return the entire vocabulary. To fix the problem, split the vocabulary fetch into two passes. First determine the number of items in the vocabulary and then fetch the entire vocabulary, using the number of items as the request parameter.
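
A rough sketch of the two-pass idea, for illustration only. `VocabularyBackend`, `countItems` and `fetchItems` are hypothetical names standing in for the real backend calls, which are not shown in this thread. The sketch also reports whether the second pass actually returned everything, since that is not guaranteed.

```typescript
// Hypothetical backend interface; the real API is an assumption here.
interface VocabularyBackend {
  countItems(): Promise<number>;
  fetchItems(size: number): Promise<string[]>;
}

interface FetchResult {
  items: string[];
  complete: boolean;
}

async function fetchWholeVocabulary(backend: VocabularyBackend): Promise<FetchResult> {
  // Pass 1: ask how many items the vocabulary holds.
  const total = await backend.countItems();
  // Pass 2: request exactly that many items.
  const items = await backend.fetchItems(total);
  // Fewer items than reported means the backend truncated the response.
  return { items, complete: items.length >= total };
}
```

If `complete` comes back `false`, the caller knows that a single large request was not enough.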

@mesemus I don't know whether the backend is guaranteed to always return as many items as it is asked for, or whether there is a hard cap. If there is a cap, the fetch would have to be done in stages.

mesemus commented 10 months ago

There is a hard cap: OpenSearch will not return more than 10000 items, not even if you fetch them incrementally via the pagination API.

The correct solution for this use case is imho to search for vocabulary entries via the suggestion API (?suggest=abc) after the user has typed a couple of characters (or, in the case of hierarchies, to let the user navigate through a hierarchical tree, with the suggestion API as an option).
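
To make the suggestion approach concrete, here is a minimal sketch of building the request only once enough characters have been typed. Only the `?suggest=` query parameter comes from the comment above; the function name, the minimum length of 2 and the example path are assumptions.

```typescript
// Assumed threshold for "a couple of characters".
const MIN_SUGGEST_CHARS = 2;

// Returns the suggestion URL, or null if the input is still too short to query.
function buildSuggestUrl(
  vocabularyUrl: string,
  userInput: string,
  minChars: number = MIN_SUGGEST_CHARS,
): string | null {
  const query = userInput.trim();
  if (query.length < minChars) {
    return null;
  }
  // Encode the input so spaces and special characters survive in the query string.
  return `${vocabularyUrl}?suggest=${encodeURIComponent(query)}`;
}
```

For a hypothetical path, `buildSuggestUrl("/api/vocabularies/organisms", "abc")` would yield `/api/vocabularies/organisms?suggest=abc`, while a single character yields `null` and no request is made.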

We also need to consider the use case where the term sought is not in the local vocabulary and we need to fetch it from a remote service.

@edager we need to sketch what the user interface and UX for this should look like and then decide on the implementation.

MadCatX commented 10 months ago

I've modified the code to make it able to deal with very large vocabularies:

- The remote vocabulary is prefetched, up to a predefined cap, when the corresponding UI element is displayed.
- If the vocabulary contains fewer hits than the cap, it is considered complete and the UI will not query the backend for any additional hits.
- If the vocabulary is incomplete, the UI will query the backend for more hits when the current user input does not match anything.
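
The three rules above can be sketched roughly as follows. This is not the actual code from the PR; `VocabularyCache`, `makeCache` and `planLookup` are made-up names, and case-insensitive substring matching is an assumption about how input is matched.

```typescript
interface VocabularyCache {
  entries: string[];
  complete: boolean;
}

type LookupPlan =
  | { kind: "local"; hits: string[] }
  | { kind: "remote" }
  | { kind: "no-match" };

// A prefetch that returned fewer hits than the cap is treated as complete.
function makeCache(prefetched: string[], cap: number): VocabularyCache {
  return { entries: prefetched, complete: prefetched.length < cap };
}

// Decides whether the local cache is enough or the backend must be queried.
function planLookup(cache: VocabularyCache, userInput: string): LookupPlan {
  const needle = userInput.trim().toLowerCase();
  const hits = cache.entries.filter((entry) => entry.toLowerCase().includes(needle));
  if (hits.length > 0) {
    return { kind: "local", hits };
  }
  // Nothing matched locally: only an incomplete cache justifies a remote query.
  return cache.complete ? { kind: "no-match" } : { kind: "remote" };
}
```

Keeping the decision separate from the network call makes the complete/incomplete rule easy to test on its own.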

Things left to be done:

- Whenever user input triggers a remote lookup, the user cannot type any further until the lookup completes. This might turn out to be annoying in practice. To fix that, the vocabulary lookups would need to be made cancellable. This is doable but somewhat non-trivial.
- User input validation code currently cannot contain any async functions. This could result in vocabulary inputs getting flagged as invalid, because the validator can only check against entries that are present in the local cache.
- The Search component from Semantic UI is not particularly good.
- The complete/incomplete logic will have to be revised if the remote vocabulary itself queries other remote vocabularies for additional hits.
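
For the first two items, one possible direction is sketched below. It is only an illustration under assumptions: `LatestLookup` and `validateAgainstCache` are hypothetical names, and the three-state result is a suggestion rather than what the current validator does. `AbortController` is the standard web API for cancelling requests.

```typescript
// Keeps at most one lookup alive: starting a new one cancels the previous one.
class LatestLookup {
  private controller: AbortController | null = null;

  // Returns a signal for the new lookup and aborts the one in flight, if any.
  begin(): AbortSignal {
    if (this.controller !== null) {
      this.controller.abort();
    }
    this.controller = new AbortController();
    return this.controller.signal;
  }
}

type Validity = "valid" | "invalid" | "unknown";

// Synchronous check against the local cache only. A miss in an incomplete
// cache is reported as "unknown" instead of "invalid".
function validateAgainstCache(entries: string[], complete: boolean, value: string): Validity {
  if (entries.includes(value)) {
    return "valid";
  }
  return complete ? "invalid" : "unknown";
}
```

The signal returned by `begin()` could be passed to `fetch(url, { signal })`, so that typing another character aborts the stale request instead of blocking the input.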

edager commented 9 months ago

The new version works as expected. However, I'm beginning to wonder if the simplest solution wouldn't be to search online every time, and then store the vocabulary entry upon deposition (after checking whether it is already present). It does make us more dependent on the foreign APIs, but it allows for a single workflow rather than the users having to try two things. Switching between the local and the external API should be our problem, and by implementing a system where the users have to understand this concept as well as try both external and local, we're making it the users' problem. @mesemus @MadCatX what do you think?
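
To illustrate the "store upon deposition, checking if already present" step, a minimal sketch follows. The `VocabularyTerm` shape and the `storeOnDeposition` name are assumptions, and an in-memory `Map` stands in for whatever storage the backend really uses.

```typescript
// Assumed minimal shape of a vocabulary term returned by an external API.
interface VocabularyTerm {
  id: string;
  title: string;
}

// Adds the term to the local vocabulary unless it is already there.
// Returns true if the term was newly stored.
function storeOnDeposition(localVocabulary: Map<string, VocabularyTerm>, term: VocabularyTerm): boolean {
  if (localVocabulary.has(term.id)) {
    return false;
  }
  localVocabulary.set(term.id, term);
  return true;
}
```

With this, the user would only ever search one source, and the local copy would fill up as a side effect of depositions.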

edager commented 9 months ago

I hit the wrong button...

Molecular-Biophysics-Database / mbdb-input-ui

Make sure that we fetch the entire vocabulary #95