gbv / jskos-server

Web service to access JSKOS data
https://coli-conc.gbv.de/api/
MIT License

Support search in vocabulary metadata #121

Closed (nichtich closed 3 years ago)

nichtich commented 3 years ago

Both the /suggest and /search endpoints search in concepts, but for BARTOC we want to search in vocabularies. How about:

stefandesu commented 3 years ago

As long as this doesn't cause problems for the already existing type parameter for /suggest and /search (used for example in DANTE and the GND API), we could implement this.

stefandesu commented 3 years ago

Alternatively, we could add /voc/search and /voc/suggest instead.

stefandesu commented 3 years ago

@nichtich Which fields should be searchable? Because of the way JSKOS fields are structured (one key per language), we can't simply add an index on those fields. Instead, every time a vocabulary is added or changed, we have to create an extra field and put an index on that field (like we do for concepts).
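To illustrate the "extra field" idea, here is a minimal sketch, not the actual jskos-server code. It flattens the language maps of a scheme into one array of lowercase keywords that could then be indexed as a single field. The helper names `buildKeywords` and `withKeywords`, the field name `_keywords`, and the choice of `prefLabel`, `altLabel` and `definition` as input fields are all assumptions made for this example.

```javascript
// Hypothetical helper: collect lowercase keywords from JSKOS language maps.
// Values may be a single string (prefLabel) or an array of strings (altLabel, definition).
function buildKeywords(scheme) {
  const keywords = new Set()
  for (const field of ["prefLabel", "altLabel", "definition"]) {
    const languageMap = scheme[field] || {}
    for (const value of Object.values(languageMap)) {
      const texts = Array.isArray(value) ? value : [value]
      for (const text of texts) {
        if (typeof text !== "string") continue
        // Split on whitespace and skip empty fragments
        for (const word of text.toLowerCase().split(/\s+/)) {
          if (word) keywords.add(word)
        }
      }
    }
  }
  return [...keywords]
}

// Would have to run every time a scheme is added or changed.
// `_keywords` is an assumed field name, not necessarily what the server uses.
function withKeywords(scheme) {
  return { ...scheme, _keywords: buildKeywords(scheme) }
}
```

With a field like this in place, a single MongoDB index on `_keywords` would cover all languages at once, at the cost of recomputing the field on every write.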

If possible, we should generalize the way we do it for concepts and apply it to schemes as well. What we're currently doing there is:

Then, once we have a set of results, they go through a custom scoring algorithm to determine the best order. This works very well in my opinion, but gets slow if the Mongo result has too many matches (this can happen in RVK, for example). If we only have a few thousand schemes anyway, this shouldn't be an issue though.
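A rough sketch of what such a post-query scoring step could look like is shown below. This is not the scoring algorithm jskos-server actually uses. The function names `scoreLabel` and `rankResults` and the weights (exact match over prefix match over substring match) are assumptions chosen only to show the idea.

```javascript
// Hypothetical scoring: a higher score means a better match.
function scoreLabel(label, query) {
  const l = label.toLowerCase()
  const q = query.toLowerCase()
  if (l === q) return 3 // exact match
  if (l.startsWith(q)) return 2 // label starts with the query
  if (l.includes(q)) return 1 // query appears somewhere in the label
  return 0
}

// Score each result by its best-matching prefLabel, drop non-matches,
// and sort by descending score.
function rankResults(results, query) {
  return results
    .map((item) => {
      const labels = Object.values(item.prefLabel || {})
      const score = Math.max(0, ...labels.map((label) => scoreLabel(label, query)))
      return { item, score }
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.item)
}
```

Because every result is scored in application code, the cost grows with the number of matches returned by the database, which is consistent with the slowdown described for very large result sets.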

Should I try to apply the same search implementation from concepts to schemes as well?

nichtich commented 3 years ago

This sounds good. Most important is to be able to find a concept scheme by words in its name or abstract. Ranking will not be perfect, but that's OK; anything better would require a text retrieval engine with support for more sophisticated search features such as drilldown.

stefandesu commented 3 years ago

I added a first implementation and some tests. Since a lot of files changed, it would be good if you could take a look, @nichtich. If you're trying it out with BARTOC, don't forget to reimport the schemes and rebuild the indexes (./bin/import.js --indexes). The indexes need to be part of the new import script (#101), and maybe we should have endpoints to create indexes as well. 🤔

stefandesu commented 3 years ago

Also, is there a need to indicate via /status that /voc/search and /voc/suggest exist? DANTE does not have these endpoints, so maybe having a way to determine this would be good.

nichtich commented 3 years ago

> Also, is there a need to indicate via /status that /voc/search and /voc/suggest exist?

Yes, all endpoints are explicitly listed via /status (which might later be extended to Swagger #23)

stefandesu commented 3 years ago

> Yes, all endpoints are explicitly listed via /status (which might later be extended to Swagger #23)

Question: How should these endpoints be listed there? Currently, the properties do not necessarily represent the endpoint path (e.g. property schemes -> /voc, property top -> /voc/top).

stefandesu commented 3 years ago

> Question: How should these endpoints be listed there? Currently, the properties do not necessarily represent the endpoint path (e.g. property schemes -> /voc, property top -> /voc/top).

Still an open question @nichtich.

nichtich commented 3 years ago

How about voc-search and voc-suggest?
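For illustration, a client could use such property names to detect whether a server offers vocabulary search. This is only a sketch: the shape of the /status response shown here (property name mapped to an endpoint path as a string) is an assumption, `exampleStatus` and `supportsEndpoint` are hypothetical names, and only the `schemes`/`top` mappings and the proposed names `voc-search`/`voc-suggest` come from this thread.

```javascript
// Hypothetical excerpt of a /status response using the proposed property names.
// The value format (a plain path string) is an assumption.
const exampleStatus = {
  schemes: "/voc",
  top: "/voc/top",
  "voc-search": "/voc/search",
  "voc-suggest": "/voc/suggest",
}

// Returns true if the status object lists a non-empty path for the given property.
function supportsEndpoint(status, key) {
  return typeof status[key] === "string" && status[key].length > 0
}
```

A server without these endpoints (DANTE was mentioned as one) would simply omit the properties, so `supportsEndpoint(status, "voc-search")` would return false.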