Performance Optimizations

do-me commented 7 months ago

This issue groups a few things that could be optimized to improve performance:

A faster dot product is possible; see: https://github.com/xenova/transformers.js/pull/667
Add binary quantization for large performance gains: https://github.com/xenova/transformers.js/releases/tag/2.17.0. This should make it possible to store up to 3000 books in one index. In my last tests, it crashed with 1000 books but worked perfectly with 100. Context1, context2, example
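A rough TypeScript sketch of both ideas, assuming embeddings arrive as `Float32Array`s. I haven't checked what pull request 667 actually changes, so this is not its code, and all function names are hypothetical. `dot` is a plain indexed loop, which usually avoids the per-element callback overhead of `reduce`; if the embeddings are L2-normalized, the dot product equals cosine similarity, so the norms can be skipped. `binarize` keeps one bit per dimension (set when the value is positive), and `hamming` compares two binary codes with XOR plus a byte popcount table.

```typescript
// Plain indexed loop over typed arrays (equals cosine similarity for L2-normalized vectors)
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Binary quantization: one bit per dimension, set when the value is positive
function binarize(v: Float32Array): Uint8Array {
  const bits = new Uint8Array(Math.ceil(v.length / 8));
  for (let i = 0; i < v.length; i++) {
    if (v[i] > 0) bits[i >> 3] |= 1 << (i & 7);
  }
  return bits;
}

// Lookup table: number of set bits for every byte value 0..255
const POPCOUNT = new Uint8Array(256);
for (let i = 1; i < 256; i++) POPCOUNT[i] = (i & 1) + POPCOUNT[i >> 1];

// Hamming distance between two binary codes of equal length
function hamming(a: Uint8Array, b: Uint8Array): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) d += POPCOUNT[a[i] ^ b[i]];
  return d;
}

// Indices of the k codes closest to the query (smallest Hamming distance first)
function topKByHamming(query: Uint8Array, index: Uint8Array[], k: number): number[] {
  return index
    .map((code, i) => ({ i, d: hamming(query, code) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, k)
    .map((r) => r.i);
}
```

For 384-dimensional embeddings that is 48 bytes per vector instead of 1,536 (32× less). A common pattern is to shortlist candidates by Hamming distance and then rescore only the shortlist with the full-precision `dot`.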

(to be continued)

varunneal commented 6 months ago

I'll take a look at the dot product today. Have you seen the FAISS library at all? Some of its algorithms might offer excellent implementations for clustering.

https://github.com/facebookresearch/faiss

do-me commented 6 months ago

Sure! Which algorithm are you referring to in particular?

For dimensionality reduction, I was considering t-SNE, PCA and UMAP. They all have pros and cons: PCA is a great, widely used and efficient algorithm (it just projects the points, no expensive iterations required), UMAP is computation-intensive, and t-SNE is, as far as I know, the most computation-heavy algorithm but usually generates "visually pleasing" results. From what I can see, t-SNE is the most promising algorithm at the moment. Just yesterday, the author of https://github.com/Lv-291/wasm-bhtsne released a new version that supports multithreading, so once we update, that should lead to a huge speedup.
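To illustrate why PCA is the cheap option, here is a minimal from-scratch sketch (not SemanticFinder code; the function name `pca` and the deterministic start vector are my own choices): center the data, build the covariance matrix, extract the top `k` eigenvectors with power iteration plus deflation, and project each point onto them. Building the covariance is O(n·d²), so for real embedding sizes an optimized library would be the better choice; the point is that the final step is just `k` dot products per point, with no layout-optimization loop as in t-SNE.

```typescript
type Matrix = number[][];

// Project the rows of `data` onto their top `k` principal components
function pca(data: Matrix, k: number, iterations = 100): Matrix {
  const n = data.length;
  const d = data[0].length;

  // 1. Center each column on its mean
  const mean = new Array<number>(d).fill(0);
  for (const row of data) for (let j = 0; j < d; j++) mean[j] += row[j] / n;
  const X = data.map((row) => row.map((v, j) => v - mean[j]));

  // 2. Sample covariance matrix (d x d); requires n >= 2
  const C: Matrix = Array.from({ length: d }, () => new Array<number>(d).fill(0));
  for (const row of X) {
    for (let a = 0; a < d; a++) {
      for (let b = 0; b < d; b++) C[a][b] += (row[a] * row[b]) / (n - 1);
    }
  }

  const matVec = (M: Matrix, v: number[]) => M.map((r) => r.reduce((s, x, j) => s + x * v[j], 0));
  const dotp = (u: number[], v: number[]) => u.reduce((s, x, j) => s + x * v[j], 0);

  // 3. Top-k eigenvectors via power iteration, deflating C after each one
  const components: number[][] = [];
  for (let c = 0; c < k; c++) {
    let v = Array.from({ length: d }, (_, j) => 1 + j / d); // deterministic start vector
    const n0 = Math.hypot(...v);
    v = v.map((x) => x / n0);
    for (let it = 0; it < iterations; it++) {
      const w = matVec(C, v);
      const norm = Math.hypot(...w);
      if (norm === 0) break;
      v = w.map((x) => x / norm);
    }
    const lambda = dotp(v, matVec(C, v)); // Rayleigh quotient = eigenvalue of v
    for (let a = 0; a < d; a++) {
      for (let b = 0; b < d; b++) C[a][b] -= lambda * v[a] * v[b]; // remove found component
    }
    components.push(v);
  }

  // 4. Project the centered data onto the components
  return X.map((row) => components.map((v) => dotp(row, v)));
}
```

A random start vector is more common in practice; a fixed one keeps the output reproducible but could, in rare cases, be orthogonal to the eigenvector it is meant to find.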

If you're referring to the general logic, or to the DB-like JSON-object backbone holding the chunks and embeddings: I have already looked a little into JS vector DBs like Orama. However, since I haven't received an answer to my question about performance yet, I haven't run any tests. From an application perspective, it might make much more sense to separate the app and the DB/data more cleanly, which would also make imports/exports easier, but I definitely do not want to compromise on performance. There are also other (vector DB) projects like DuckDB (https://github.com/duckdb/duckdb-wasm), but it might not be mature enough yet. If you find anything that looks worth trying, we could give it a go!

Also, for a while I've had the idea of allowing connections to external/local vector DBs like Qdrant. That way, the web app would be the inference interface and the memory-intensive processes would run elsewhere, allowing for really scalable apps! The setup would of course be optional and work like the Ollama connection.
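One way to keep such a connection optional is to put search behind a small interface, so the app can switch between a local in-browser backend and a remote one. A sketch under that assumption: `VectorBackend`, `InMemoryBackend`, `QdrantBackend` and `FetchLike` are hypothetical names, and the remote call targets Qdrant's REST search endpoint (`POST /collections/{name}/points/search` with `vector`, `limit` and `with_payload`). Please check the Qdrant API reference for your server version, since newer releases also offer a separate Query API. `fetch` is injected so the class can be tested without a running server.

```typescript
interface SearchHit {
  id: string | number;
  score: number;
  payload?: Record<string, unknown>;
}

// The app only talks to this interface, never to a concrete store
interface VectorBackend {
  search(vector: number[], limit: number): Promise<SearchHit[]>;
}

interface StoredItem {
  id: number;
  vector: number[];
  payload?: Record<string, unknown>;
}

// Default: everything stays in the browser, brute-force dot-product search
class InMemoryBackend implements VectorBackend {
  private readonly items: StoredItem[];
  constructor(items: StoredItem[]) {
    this.items = items;
  }
  async search(vector: number[], limit: number): Promise<SearchHit[]> {
    return this.items
      .map((it) => ({
        id: it.id,
        score: it.vector.reduce((s, x, j) => s + x * vector[j], 0),
        payload: it.payload,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}

// Minimal fetch signature, so the global fetch or a test double can be passed in
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Optional: delegate search to a Qdrant server via its REST API
class QdrantBackend implements VectorBackend {
  private readonly baseUrl: string;
  private readonly collection: string;
  private readonly fetchFn: FetchLike;
  constructor(baseUrl: string, collection: string, fetchFn: FetchLike) {
    this.baseUrl = baseUrl;
    this.collection = collection;
    this.fetchFn = fetchFn;
  }
  async search(vector: number[], limit: number): Promise<SearchHit[]> {
    const fetchFn = this.fetchFn; // call unbound, not as a method of this class
    const res = await fetchFn(
      `${this.baseUrl}/collections/${encodeURIComponent(this.collection)}/points/search`,
      {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ vector, limit, with_payload: true }),
      },
    );
    if (!res.ok) throw new Error(`Qdrant search failed with HTTP ${res.status}`);
    const data = (await res.json()) as { result: SearchHit[] };
    return data.result;
  }
}
```

In the browser this could be wired up as `new QdrantBackend("http://localhost:6333", "chunks", (url, init) => fetch(url, init))`, where 6333 is Qdrant's default REST port and `"chunks"` is a placeholder collection name.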

do-me commented 6 months ago

FYI: there is also https://github.com/tantaraio/voy, a Rust-based WASM DB, as an alternative to Orama. However, it seems a little dead?

Seeing all these projects, I think we've built a pretty solid "vector DB" ourselves as part of SemanticFinder. It makes me wonder whether it might be worth extracting the logic... like a lean, no-fuss, JS-native, JSON-based DB.
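If the logic were extracted, the core could be as small as the following sketch: chunks with their text and embedding in one array, brute-force top-k search by dot product, and export/import as a single JSON string. `LeanVectorStore` and its methods are hypothetical names, not the current SemanticFinder code. Embeddings are kept as plain `number[]` because `JSON.stringify` serializes a `Float32Array` as an object with index keys rather than as an array.

```typescript
interface Chunk {
  id: number;
  text: string;
  embedding: number[];
}

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

class LeanVectorStore {
  private chunks: Chunk[] = [];
  private nextId = 0;

  // Stores a chunk and returns its id
  add(text: string, embedding: number[]): number {
    const id = this.nextId++;
    this.chunks.push({ id, text, embedding });
    return id;
  }

  // Brute-force top-k by dot product (cosine similarity for L2-normalized embeddings)
  search(query: number[], k: number): { id: number; text: string; score: number }[] {
    return this.chunks
      .map((c) => ({ id: c.id, text: c.text, score: dot(c.embedding, query) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }

  // Whole index as one JSON string, e.g. for download or IndexedDB
  serialize(): string {
    return JSON.stringify({ chunks: this.chunks });
  }

  static deserialize(json: string): LeanVectorStore {
    const parsed = JSON.parse(json) as { chunks: Chunk[] };
    const store = new LeanVectorStore();
    store.chunks = parsed.chunks;
    // Continue numbering after the highest existing id
    store.nextId = parsed.chunks.reduce((m, c) => Math.max(m, c.id + 1), 0);
    return store;
  }
}
```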

varunneal commented 6 months ago

I agree, a lean database seems quite suitable. It would be nice to build a local-storage JS library with "guarantees", such as limits on the total memory it can use.
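A sketch of what such a guarantee could look like, assuming a fixed byte budget for the stored embeddings (`BudgetedEmbeddingStore` and its methods are hypothetical names). JS gives no exact per-object heap size, so this only budgets the embedding bytes we store ourselves, not object overhead, chunk text, or the rest of the app.

```typescript
// Keeps stored embeddings under a fixed byte budget and refuses anything beyond it
class BudgetedEmbeddingStore {
  private readonly maxBytes: number;
  private usedBytes = 0;
  private readonly vectors: Float32Array[] = [];

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Returns false and stores nothing if the vector would exceed the budget
  tryAdd(vector: Float32Array): boolean {
    const bytes = vector.length * Float32Array.BYTES_PER_ELEMENT;
    if (this.usedBytes + bytes > this.maxBytes) return false;
    // Copy, so we never retain a larger shared ArrayBuffer the view may point into
    this.vectors.push(Float32Array.from(vector));
    this.usedBytes += bytes;
    return true;
  }

  get bytesUsed(): number {
    return this.usedBytes;
  }

  get size(): number {
    return this.vectors.length;
  }
}
```

With 384-dimensional float32 embeddings (1,536 bytes each), a 1 MiB budget admits 682 vectors; the caller can react to a `false` by warning the user or switching to binary codes.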

FAISS has HNSW (a very fast approximate nearest-neighbor algorithm) and also newer, even faster approximate nearest-neighbor algorithms. Here is an overview.
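A full HNSW implementation doesn't fit in a short comment, so here is a sketch of a different FAISS index family instead: IVF-Flat, the idea behind FAISS's `IndexIVFFlat`. It clusters the vectors with k-means (which also touches on the clustering point above), keeps one inverted list per cluster, and at query time scans only the `nprobe` lists with the closest centroids. This is plainly not HNSW, and `IvfFlatIndex`, `kmeans`, `dist2` and `nearestCentroid` are hypothetical names. With `nprobe` equal to the number of lists it degenerates to exact brute-force search, which makes it easy to check.

```typescript
type Vec = Float32Array;

// Squared Euclidean distance
function dist2(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Index of the centroid closest to v
function nearestCentroid(v: Vec, centroids: Vec[]): number {
  let best = 0;
  let bestD = Infinity;
  for (let c = 0; c < centroids.length; c++) {
    const d = dist2(v, centroids[c]);
    if (d < bestD) {
      bestD = d;
      best = c;
    }
  }
  return best;
}

// Coarse quantizer: plain Lloyd's k-means, seeded with the first `nlist` vectors
function kmeans(data: Vec[], nlist: number, iters = 10): Vec[] {
  const dim = data[0].length;
  let centroids: Vec[] = data.slice(0, nlist).map((v) => Float32Array.from(v));
  for (let it = 0; it < iters; it++) {
    const sums = centroids.map(() => new Float32Array(dim));
    const counts = new Array<number>(centroids.length).fill(0);
    for (const v of data) {
      const c = nearestCentroid(v, centroids);
      counts[c]++;
      for (let j = 0; j < dim; j++) sums[c][j] += v[j];
    }
    // Empty clusters keep their previous centroid
    centroids = sums.map((s, c) => (counts[c] === 0 ? centroids[c] : s.map((x) => x / counts[c])));
  }
  return centroids;
}

class IvfFlatIndex {
  private readonly centroids: Vec[];
  private readonly lists: { id: number; v: Vec }[][];

  constructor(data: Vec[], nlist: number) {
    this.centroids = kmeans(data, nlist);
    this.lists = this.centroids.map((): { id: number; v: Vec }[] => []);
    // Assign every vector to the inverted list of its nearest centroid
    data.forEach((v, id) => this.lists[nearestCentroid(v, this.centroids)].push({ id, v }));
  }

  // Scan only the `nprobe` lists whose centroids are closest to the query
  search(q: Vec, k: number, nprobe: number): number[] {
    const probed = this.centroids
      .map((c, i) => ({ i, d: dist2(q, c) }))
      .sort((a, b) => a.d - b.d)
      .slice(0, nprobe);
    const candidates: { id: number; d: number }[] = [];
    for (const { i } of probed) {
      for (const { id, v } of this.lists[i]) candidates.push({ id, d: dist2(q, v) });
    }
    return candidates.sort((a, b) => a.d - b.d).slice(0, k).map((c) => c.id);
  }
}
```

Smaller `nprobe` values scan fewer vectors and trade some recall for speed.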

do-me commented 1 month ago

Here's another hot candidate for major speed improvements on the indexing side: static models with model2vec (https://github.com/MinishLab/model2vec/issues/75). I'm curious how to run this in JS.
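The reason static models are so fast is that embedding a text needs no transformer forward pass: look up one precomputed vector per token, pool them, and normalize. A toy sketch of that general idea, not of model2vec's actual pipeline: the `StaticModel` lookup table and the lowercase whitespace tokenizer are hypothetical stand-ins, whereas a real static model ships its own subword tokenizer and learned vectors (and may weight or pool tokens differently).

```typescript
// Hypothetical toy "static model": a lookup table from token to vector
type StaticModel = Map<string, Float32Array>;

function embedStatic(text: string, model: StaticModel, dim: number): Float32Array {
  const out = new Float32Array(dim);
  let count = 0;
  // Naive lowercase whitespace tokenizer (a real model uses its own subword tokenizer)
  for (const token of text.toLowerCase().split(/\s+/)) {
    const vec = model.get(token);
    if (!vec) continue; // skip unknown tokens
    for (let j = 0; j < dim; j++) out[j] += vec[j];
    count++;
  }
  if (count === 0) return out; // no known tokens: zero vector
  // Mean-pool, then L2-normalize so that dot product equals cosine similarity
  let norm = 0;
  for (let j = 0; j < dim; j++) {
    out[j] /= count;
    norm += out[j] * out[j];
  }
  norm = Math.sqrt(norm);
  if (norm > 0) for (let j = 0; j < dim; j++) out[j] /= norm;
  return out;
}
```

The cost per text is one table lookup and one vector addition per token, which is why indexing with such a model could be orders of magnitude cheaper than running a transformer on every chunk.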

do-me commented 1 month ago

FYI: LanceDB (https://github.com/lancedb/lancedb) seems like the best file-based vector DB out there, similar to sqlite-vec but with more functionality (full-text search, etc.). It seems superior to voy and is also written in Rust. It would be nice to be able to export the whole LanceDB database and connect other frontends to it.

do-me / SemanticFinder

Performance Optimizations #49