asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars 59 forks

Support WASM compilation #12

Open klavinski opened 1 year ago

klavinski commented 1 year ago

This extension would be great to enable vector search in the browser. Is there a guide to add it to the WASM build? I tried studying sqlite-lines, unsuccessfully.

asg017 commented 1 year ago

Would like to see it too, but it will be hard. Any SQLite extension in WASM is complicated, as you saw in sqlite-lines. Additionally, since sqlite-vss relies on Faiss, I'd imagine there are even more hurdles to jump through, on top of this being written in C++ (which probably isn't too big of an issue, but is foreign to me, as I've only successfully compiled plain C SQLite extensions to WASM before).

Also, there are a bunch of different SQLite WASM targets now, each of which is slightly incompatible with the others. There's the official SQLite WASM build, sql.js, and probably a few more I don't know about.

I'm open to contributions, but if anyone reading this would like to give it a shot, please comment here with your approach before sending a PR. Additionally, if anyone wants to sponsor this work, I'd be more than happy to talk about it, if you have a clear goal in mind!

klavinski commented 1 year ago

The official WASM build makes it easier to implement an extension. In my case, I settled on adding this one. I copied the code of the extension into the file ext/wasm/extra_init.c, then followed the official steps:

./configure --enable-all
make sqlite3.c
cd ext/wasm
make

This produces the .wasm and .js files with the extension enabled.

kroggen commented 1 year ago

An option would be to have a version that does not depend on Faiss (separate branch?)

The HNSW algo is relatively simple, and there are some libraries like hnswlib

klavinski commented 1 year ago

Some have successfully compiled such algorithms to WASM.

asg017 commented 1 year ago

> I copied the code of the extension into the file ext/wasm/extra_init.c, then followed the official steps:

Thanks for pointing out ext/wasm/extra_init.c! Seems like building for SQLite's WASM build is much easier than sql.js, at least since the last time I tried.

It'll still be difficult for sqlite-vss, however, since Faiss is such a heavy and tricky-to-compile dependency. I haven't found any examples of Faiss being compiled to WASM. But @kroggen, that hnswlib library may be a solution: I originally looked at that lib when building sqlite-vss, but chose Faiss since it had way more indexing options and flexible storage.

I don't think adding hnswlib to sqlite-vss would be easy to do, and I'd rather sqlite-vss stay with Faiss for now. However, I can totally see a new sqlite-hnsw project that uses hnswlib instead and has a similar API to sqlite-vss, but without a few bells and whistles. Plus, since it's header-only, it'll probably be very easy to compile to WASM.

I don't have the capacity now to start a new sqlite-hnsw project, but if anyone reading this wants to give it a try, would be more than happy to help!

asg017 commented 1 year ago

> Some have successfully compiled such algorithms to WASM.

I also looked at hora when building sqlite-vss, which would've worked with sqlite-loadable-rs, but it seemed inactive and I couldn't find any nice APIs to serialize an index to a buffer. Also sqlite-loadable-rs is great for simple table functions and virtual tables, but isn't great at shadow tables yet, so it would've been a lot of work to implement. Also, building a SQLite extension in Rust and compiling it to WASM is incredibly difficult (maybe impossible?)

jlarmstrongiv commented 11 months ago

Just found https://github.com/jiggy-ai/hnsqlite/, but they don’t have a wasm build

klavinski commented 11 months ago

I did not update this issue, but for those still looking for a solution, I successfully used a combination of hnswlib, which stores the embeddings in IndexedDB, and SQLite for the rest.
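A minimal sketch of that kind of split, in Python for brevity. This is not the setup described above: the `BruteForceIndex` class, the `docs` table, and both helper functions are hypothetical names, and an exact cosine scan stands in for the HNSW index. The idea it illustrates is only that the vector index hands back integer labels, and SQLite resolves those labels to the rest of the data.

```python
import math
import sqlite3


class BruteForceIndex:
    """Hypothetical stand-in for an HNSW index: exact cosine search in memory."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = {}  # label (SQLite rowid) -> unit-length vector

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0
        return [x / norm for x in vector]

    def add(self, label, vector):
        if len(vector) != self.dim:
            raise ValueError("dimension mismatch")
        self.vectors[label] = self._normalize(vector)

    def search(self, query, k):
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), label)
            for label, v in self.vectors.items()
        ]
        scored.sort(reverse=True)  # highest cosine similarity first
        return [label for _, label in scored[:k]]


# SQLite holds everything except the embeddings.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
index = BruteForceIndex(dim=3)


def add_document(body, embedding):
    """Insert the row first, then use its rowid as the label in the index."""
    cur = db.execute("INSERT INTO docs (body) VALUES (?)", (body,))
    index.add(cur.lastrowid, embedding)
    return cur.lastrowid


def nearest_documents(query_embedding, k=2):
    """Ask the index for labels, then fetch the matching rows from SQLite."""
    ids = index.search(query_embedding, k)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = dict(
        db.execute(f"SELECT id, body FROM docs WHERE id IN ({placeholders})", ids)
    )
    return [rows[i] for i in ids]  # keep the index's ranking order


add_document("cats", [1.0, 0.0, 0.0])
add_document("dogs", [0.9, 0.1, 0.0])
add_document("cars", [0.0, 0.0, 1.0])
print(nearest_documents([1.0, 0.02, 0.0], k=2))
```

Swapping the stand-in for a real HNSW index would only change `add` and `search`; the SQLite side stays the same as long as the index labels are the row ids.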

limcheekin commented 10 months ago

> I did not update this issue, but for those still looking for a solution, I successfully used a combination of hnswlib, which stores the embeddings in IndexedDB, and SQLite for the rest.

I'd appreciate it if you could share the solution. Is there a public URL?

Thanks.

klavinski commented 9 months ago

Using hnswlib-wasm is straightforward, except for tuning the parameters. This is the best explanation I have found.

Today, I have discovered another web vector database with persistent storage: Victor. It uses OPFS instead of IndexedDB.

limcheekin commented 9 months ago

I found another one that looks promising and timely: SurrealDB, which had its 1.0.0 release just a few days ago.

It supports the following features according to the docs:

After losing some hair over the past 2 days :), I finally got surrealdb.wasm working with indxdb today, using a simple test of a vector function.