Fulltext support - Githubissues

novoj commented 1 year ago

Fulltext search is one of the key requirements for e-commerce catalogs, but it is also super-hard to implement. Currently our string-based search is sub-optimal and doesn't use any specialized index. We have some spike tests using a radix tree, which give quite good results that we could start with. But the path to a full-text engine is very hard and demanding.
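For illustration only, a minimal sketch of what a radix-tree prefix lookup could look like. This is not the code from our spike tests; the class name `RadixTreeSketch` and its methods are hypothetical, and it covers only prefix matching, with no ranking, tokenization, or typo tolerance:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a radix tree (compressed trie) supporting prefix lookup.
final class RadixTreeSketch {

    private static final class Node {
        String label;      // characters on the edge leading into this node
        boolean terminal;  // true when a stored word ends exactly here
        final Map<Character, Node> children = new TreeMap<>();

        Node(String label, boolean terminal) {
            this.label = label;
            this.terminal = terminal;
        }
    }

    private final Node root = new Node("", false);

    void insert(String word) {
        Node node = root;
        String rest = word;
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null) {
                // no edge starts with this character: attach the remainder as a new leaf
                node.children.put(rest.charAt(0), new Node(rest, true));
                return;
            }
            int common = commonPrefixLength(rest, child.label);
            if (common < child.label.length()) {
                // the word diverges inside the edge: split the edge at the common prefix
                Node tail = new Node(child.label.substring(common), child.terminal);
                tail.children.putAll(child.children);
                child.label = child.label.substring(0, common);
                child.terminal = false;
                child.children.clear();
                child.children.put(tail.label.charAt(0), tail);
            }
            rest = rest.substring(common);
            node = child;
        }
        node.terminal = true;
    }

    List<String> findByPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        String rest = prefix;
        StringBuilder path = new StringBuilder();
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null) {
                return result;
            }
            int common = commonPrefixLength(rest, child.label);
            if (common < rest.length() && common < child.label.length()) {
                return result; // prefix and edge label disagree
            }
            path.append(child.label);
            rest = rest.substring(common);
            node = child;
        }
        collect(node, path.toString(), result);
        return result;
    }

    // depth-first walk that gathers every stored word below the given node
    private static void collect(Node node, String word, List<String> out) {
        if (node.terminal) {
            out.add(word);
        }
        for (Node child : node.children.values()) {
            collect(child, word + child.label, out);
        }
    }

    private static int commonPrefixLength(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }
}
```

The lookup cost depends on the length of the prefix plus the number of matches, not on the total number of indexed strings, which is the main gain over scanning every value.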

We should also investigate the path of embedding the Lucene engine, although this would complicate our transaction handling and also the planned replication across the cluster.

Another path which could be investigated is semantic search using LLM embeddings, which might cover the gaps in our keyword fulltext search capabilities. But this is also very hard to implement (most current implementations use an HNSW index for this kind of search).
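To make the embedding idea concrete, here is a rough sketch of the exact (brute-force) nearest-neighbour scan that an approximate vector index tries to avoid. The class `BruteForceVectorSearch` and its methods are hypothetical, and the vectors are assumed to come from some embedding model that is not shown here:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: exact top-k search by cosine similarity over all vectors.
final class BruteForceVectorSearch {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // similarity with a zero vector is treated as 0 in this sketch
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // returns the indexes of the k vectors most similar to the query, best first
    static List<Integer> topK(float[][] vectors, float[] query, int k) {
        double[] scores = new double[vectors.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            scores[i] = cosine(vectors[i], query);
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> -scores[i]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }
}
```

Every query touches every stored vector, so the cost grows linearly with the catalog size. A graph index trades a little recall for not having to do that, which is where most of the implementation difficulty lies.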

novoj commented 1 year ago

Tip for exploration: https://github.com/benldr/JPruningRadixTrie

novoj commented 1 year ago

To evaluate: https://foojay.io/today/jvector-1-0/#VectorSearch

novoj commented 11 months ago

To read: https://db.in.tum.de/~leis/papers/ART.pdf

novoj commented 5 months ago

Again: https://github.com/jbellis/jvector

FgForrest / evitaDB

Fulltext support #258