gbv / jskos-server

Web service to access JSKOS data
https://coli-conc.gbv.de/api/
MIT License

Consider Typesense as a search/suggest backend #188

Open stefandesu opened 1 year ago

stefandesu commented 1 year ago

I finally spent some time looking into improvements to the search backend (see #43). Among the options I tried, Typesense did exceptionally well: it has native support for infix search (enabled on a per-field basis), it is very fast, and the search results in my tests were consistently good. Importing the whole RVK with almost 800k concepts takes about a minute on my local machine and results in a memory usage of only ~300 MB, with full infix-enabled search on all identifiers, labels, and notes. (Note that only ~150k of those concepts are fully indexed for search; the others are combined concepts for which I only indexed the identifier and notation.) Most search queries take less than 100 ms, which could probably be improved with better configuration and by not enabling infix search on notes.
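
To make the indexing setup above more concrete, here is a minimal sketch of how a JSKOS concept could be flattened into a Typesense document, with infix search enabled per field. It is not taken from the linked test script: the collection name, the field names, the `conceptToDocument` helper, and the choice of the concept URI as document `id` are all assumptions for illustration. Infix is left off for notes here, as suggested above.

```javascript
// Hypothetical collection schema; names are illustrative only.
// `infix: true` enables infix search for that field.
const conceptSchema = {
  name: "concepts",
  fields: [
    { name: "uri", type: "string", infix: true },
    { name: "notation", type: "string[]", infix: true },
    { name: "labels", type: "string[]", optional: true, infix: true },
    { name: "notes", type: "string[]", optional: true },
  ],
}

// Collects all string values of a JSKOS language map,
// e.g. { de: "A", en: ["B", "C"] } becomes ["A", "B", "C"].
function languageMapValues(map = {}) {
  return Object.values(map)
    .flat()
    .filter((value) => typeof value === "string" && value !== "")
}

// Flattens a JSKOS concept into a document for the schema above.
// With `full: false`, only identifier and notation are indexed
// (the approach described above for combined concepts).
function conceptToDocument(concept, { full = true } = {}) {
  const doc = {
    id: concept.uri, // assumption: the URI doubles as document id
    uri: concept.uri,
    notation: concept.notation || [],
  }
  if (full) {
    doc.labels = [
      ...languageMapValues(concept.prefLabel),
      ...languageMapValues(concept.altLabel),
    ]
    doc.notes = languageMapValues(concept.scopeNote)
  }
  return doc
}

// Creates the collection and imports concepts in bulk.
// `client` is expected to be a configured client from the `typesense` package.
async function importConcepts(client, concepts) {
  await client.collections().create(conceptSchema)
  const documents = concepts.map((concept) => conceptToDocument(concept))
  return client
    .collections(conceptSchema.name)
    .documents()
    .import(documents, { action: "upsert" })
}
```

`importConcepts` needs a running Typesense server; `conceptToDocument` is a pure function and can be tried on its own.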

There is a test script here: https://github.com/stefandesu/minisearch-test/blob/typesense/typesense.js

We could also use this for efficient creator search in mappings (#170) and even for combining search with filtering (#144).
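
As a rough illustration of combining search with filtering (#144), the following sketch builds Typesense search parameters that restrict an infix query to one concept scheme. The `buildSearchParams` helper and the `inScheme` field (assumed to be an indexed `string[]` of scheme URIs) are hypothetical. Wrapping the filter value in backticks to escape special characters reflects my understanding of the `filter_by` syntax and should be checked against the Typesense documentation.

```javascript
// Builds search parameters for a Typesense search request.
// `scheme` is optional; if given, results are limited to that scheme.
function buildSearchParams({ query, scheme = null, limit = 10 }) {
  const params = {
    q: query,
    query_by: "notation,labels",
    // One infix setting per field in `query_by`.
    infix: "always,always",
    per_page: limit,
  }
  if (scheme) {
    // Assumption: `inScheme` is an indexed field holding scheme URIs;
    // backticks escape special characters in the filter value.
    params.filter_by = "inScheme:=`" + scheme + "`"
  }
  return params
}

// Usage with a configured client from the `typesense` package:
// const result = await client
//   .collections("concepts")
//   .documents()
//   .search(buildSearchParams({ query: "phys", scheme: "http://example.org/scheme" }))
```

The same pattern could apply to creator search in mappings (#170), with the creator name in `query_by` instead of labels.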

Right now, I'm not sure how best to integrate Typesense with jskos-server. Ideally, I would offer it as an optional alternative search backend that is not required for full functionality, but that would increase complexity a lot. If we instead replaced our current search backend (queries to MongoDB) with Typesense, we would reduce code complexity but add a dependency without which jskos-server would not work. I guess we need to evaluate the pros and cons.
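
To give an idea of what the "optional backend" variant might look like, here is a minimal sketch in which Typesense is used when configured and the existing MongoDB search remains the fallback. `createSearch` and both backend functions are hypothetical and not part of jskos-server; each backend is simply an async function from a query string to a list of results.

```javascript
// Returns a search function that prefers `primary` (e.g. Typesense)
// and falls back to `fallback` (e.g. the existing MongoDB queries)
// when no primary backend is configured or when it fails.
function createSearch({ primary = null, fallback }) {
  return async function search(query) {
    if (primary) {
      try {
        return await primary(query)
      } catch (error) {
        console.warn(`Primary search backend failed, falling back: ${error.message}`)
      }
    }
    return fallback(query)
  }
}

// Usage (both backends are placeholders):
// const search = createSearch({
//   primary: config.typesense ? typesenseSearch : null,
//   fallback: mongoSearch,
// })
// const results = await search("phys")
```

Even in this small form, the cost is visible: both code paths have to be kept and tested, which is the added complexity mentioned above.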

stefandesu commented 1 year ago

Some helpful links:

stefandesu commented 1 year ago

I think the main issue is keeping the Typesense collection in sync with the data in MongoDB (if we only use Typesense to enhance search). There is a tool that makes this easier, but to use it we would need nested indexed fields, which are currently only available in the Typesense release candidate (see https://github.com/typesense/typesense/issues/227).
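
For comparison, a hand-rolled sync (not the tool mentioned above) could be built on MongoDB change streams. The sketch below translates a change event into an upsert or delete for Typesense. It assumes flat documents, which sidesteps the nested-field requirement; the function names and the document shape are hypothetical. Note that change streams require MongoDB to run as a replica set, and update events only carry `fullDocument` when the stream is opened with `fullDocument: "updateLookup"`.

```javascript
// Minimal flat document; a real version would also include labels etc.
function toDocument(mongoDoc) {
  return {
    id: String(mongoDoc._id), // assumption: _id maps to the Typesense id
    uri: mongoDoc.uri,
    notation: mongoDoc.notation || [],
  }
}

// Translates a MongoDB change event into a sync operation,
// or returns null for events that need no action.
function changeToOperation(event) {
  switch (event.operationType) {
    case "insert":
    case "replace":
    case "update":
      // Without the full document we cannot rebuild the index entry.
      if (!event.fullDocument) return null
      return { action: "upsert", document: toDocument(event.fullDocument) }
    case "delete":
      return { action: "delete", id: String(event.documentKey._id) }
    default:
      return null
  }
}

// Applies a sync operation using a configured `typesense` client.
async function applyOperation(client, collectionName, operation) {
  if (!operation) return
  const collection = client.collections(collectionName)
  if (operation.action === "upsert") {
    await collection.documents().upsert(operation.document)
  } else if (operation.action === "delete") {
    await collection.documents(operation.id).delete()
  }
}

// Usage with the MongoDB Node.js driver (names are placeholders):
// const stream = db.collection("concepts").watch([], { fullDocument: "updateLookup" })
// stream.on("change", (event) =>
//   applyOperation(client, "concepts", changeToOperation(event)).catch(console.error))
```

This only covers changes made while the stream is open; an initial full import and a way to recover after downtime would still be needed.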