Closed nichtich closed 3 years ago
As long as this doesn't cause problems for the already existing type parameter for /suggest and /search (used for example in DANTE and the GND API), we could implement this. Alternatively, we could add /voc/search and /voc/suggest instead.
@nichtich Which fields should be searchable? Due to the nature of JSKOS fields (i.e. one key per language), we can't simply add an index for those fields. Instead, every time a vocabulary is added or changed, we have to create an extra field and keep an index on that field (like we do for concepts).
If possible, we should generalize the way we do it for concepts and apply it to schemes as well. What we're currently doing there is:
- notation is searchable by prefixes (i.e. when searching for "123", it'll show anything starting with "123")
- prefLabel and altLabel are searchable by prefixes and suffixes (i.e. searching for "Pädag" will return both "Pädagogische Soziologie" and "Sozialpädagogik")
- creator, definition, scopeNote, and editorialNote are combined in one array, put into a normal MongoDB text index, and searched by exact matches (i.e. if a concept has an editorialNote with content "Bankbetriebslehre s. QK 300 Kapitalflussrechnung s. QP 828 Liquiditätstheorie s. QC 320", searching for "Kapitalfluss" will not return it, but searching for "Kapitalflussrechnung" will)

Then, after having a set of results, those results will go through a custom scoring algorithm to determine the best order. This works very well in my opinion, but gets slow if the Mongo result has too many matches (can happen in RVK for example). If we only have a few thousand schemes anyway, this shouldn't be an issue though.
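To make the prefix and suffix matching described above concrete, here is a minimal sketch of how such an extra search field could be derived from multi-language JSKOS labels. It is not the actual implementation: the function names (languageMapValues, suffixKeys, buildSearchFields, matches) and the field names (notationKeys, labelKeys) are hypothetical, and it assumes that storing every suffix of every word and then matching by prefix is an acceptable way to get the "Pädag" behaviour. The MongoDB text index and the scoring step are left out.

```javascript
// Hypothetical helpers for illustration only; names are not from the code base.

// Collect all strings from a JSKOS language map, e.g.
// { de: "Sozialpädagogik" } or { de: ["Pädagogische Soziologie"] }.
function languageMapValues(map = {}) {
  return Object.values(map).flat().filter((value) => typeof value === "string")
}

// Return every suffix of every word in the text, lowercased.
// A prefix match against these keys finds the query at the start of a word
// ("Pädagogische") as well as inside a compound ("Sozialpädagogik").
function suffixKeys(text) {
  const keys = new Set()
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    for (let i = 0; i < word.length; i += 1) {
      keys.add(word.slice(i))
    }
  }
  return [...keys]
}

// Build the extra fields that would be stored (and indexed) with the item.
function buildSearchFields(item) {
  const labels = [
    ...languageMapValues(item.prefLabel),
    ...languageMapValues(item.altLabel),
  ]
  return {
    // Notations are matched by prefix only, so no suffixes are generated.
    notationKeys: (item.notation || []).map((n) => n.toLowerCase()),
    labelKeys: [...new Set(labels.flatMap(suffixKeys))],
  }
}

// In-memory stand-in for an anchored (prefix) query on the indexed fields.
function matches(fields, query) {
  const q = query.toLowerCase()
  return (
    fields.notationKeys.some((key) => key.startsWith(q)) ||
    fields.labelKeys.some((key) => key.startsWith(q))
  )
}
```

The trade-off in this sketch is index size: every word contributes one key per character, which is probably fine for a few thousand schemes but grows quickly for large concept collections.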
Should I try to apply the same search implementation from concepts to schemes as well?
This sounds good. Most important is being able to find a concept scheme by words in its name or abstract. Ranking will not be perfect, but that is OK; better ranking would require a text retrieval engine with support for more sophisticated search features such as drilldown.
I added a first implementation and some tests. Since a lot of files changed, it would be good if you could take a look, @nichtich. If you're trying it out with BARTOC, don't forget to reimport the schemes and rebuild the indexes (./bin/import.js --indexes). The indexes need to be part of the new import script (#101), and maybe we should have endpoints to create indexes as well. 🤔
Also, is there a need to indicate via /status that /voc/search and /voc/suggest exist? DANTE does not have these endpoints, so maybe having a way to determine this would be good.
Yes, all endpoints are explicitly listed via /status (which might later be extended to Swagger #23)
Question: How should these endpoints be listed there? Currently, the properties do not necessarily represent the endpoint path (i.e. property schemes -> /voc, property top -> /voc/top).
Still an open question @nichtich.
How about voc-search and voc-suggest?
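As a rough illustration of how a client could use such properties to detect the new endpoints, here is a small sketch. The shape of the /status response is an assumption: the property names voc-search and voc-suggest are only the proposal from this thread, and the idea that each property holds the endpoint path as a string is made up for the example.

```javascript
// Assumed, illustrative /status response; real responses may look different.
const exampleStatus = {
  schemes: "/voc",
  top: "/voc/top",
  "voc-search": "/voc/search",
  "voc-suggest": "/voc/suggest",
}

// Hypothetical client-side check: a server that lacks the endpoints
// (as DANTE does, per this thread) would simply not list the two properties.
function supportsSchemeSearch(status) {
  return (
    typeof status["voc-search"] === "string" &&
    typeof status["voc-suggest"] === "string"
  )
}
```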
Both the /suggest and /search endpoints search in concepts, but for BARTOC we want to search in vocabularies. How about:
- adding a type parameter to /suggest and /search with default value http://www.w3.org/2004/02/skos/core#Concept to select which item types to search in (concepts or vocabularies, the latter being http://www.w3.org/2004/02/skos/core#ConceptScheme)
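A minimal sketch of how the proposed type parameter could be resolved on the server side, assuming the query arrives as a plain object. The function name searchTarget and the return values "concepts" and "schemes" are hypothetical; only the two SKOS URIs and the default come from the proposal.

```javascript
const CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
const CONCEPT_SCHEME = "http://www.w3.org/2004/02/skos/core#ConceptScheme"

// Decide which kind of item to search in, based on the type query parameter.
// Falls back to concepts when no type is given, as proposed.
function searchTarget(query = {}) {
  const type = query.type || CONCEPT
  if (type === CONCEPT) return "concepts"
  if (type === CONCEPT_SCHEME) return "schemes"
  // Rejecting unknown values is an assumption, not part of the proposal.
  throw new Error(`Unsupported type: ${type}`)
}
```

With this, a request such as /search?search=...&type=http://www.w3.org/2004/02/skos/core%23ConceptScheme would search vocabularies, while existing requests without type keep their current behaviour.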