asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

Sharing faiss::Index across multiple connections #64

Open polterguy opened 1 year ago

polterguy commented 1 year ago

As I understand the lib, it reads the same faiss::Index from the database's shadow table every single time a connection is created. Because the index is large, this has dramatic repercussions for multi-user environments (the web?).

Would it be a good idea to have some sort of shared faiss::Index across connections, so that the index is read only once, stays in memory, and is shared by all of them? This would of course require some sort of thread synchronisation objects, particularly for updates/writes, but I suspect it would bring down the memory footprint and increase the general usability of the library by a lot.
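To make the proposal concrete, here is a minimal sketch of a process-wide cache that loads an index once and hands the same instance to every connection. Everything in it is an assumption for illustration: ToyIndex stands in for faiss::Index so the snippet compiles without Faiss, and IndexRegistry, SharedIndex, add_vector and count are hypothetical names, not part of sqlite-vss.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Stand-in for faiss::Index (hypothetical, keeps the sketch Faiss-free).
struct ToyIndex {
    std::size_t dim = 0;
    std::vector<float> data;  // row-major: ntotal * dim floats
    std::size_t ntotal() const { return dim ? data.size() / dim : 0; }
};

// One index plus the reader/writer lock that guards it.
struct SharedIndex {
    std::shared_mutex mutex;
    ToyIndex index;
};

// Process-wide cache keyed by something like "db path + table name".
class IndexRegistry {
public:
    using Loader = std::function<ToyIndex()>;

    // Returns the live index for `key`; calls `load` only when no
    // connection currently holds one.
    std::shared_ptr<SharedIndex> acquire(const std::string& key,
                                         const Loader& load) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            if (auto live = it->second.lock()) return live;
        }
        auto fresh = std::make_shared<SharedIndex>();
        fresh->index = load();  // e.g. deserialize from the shadow table
        cache_[key] = fresh;
        return fresh;
    }

private:
    std::mutex mutex_;
    std::map<std::string, std::weak_ptr<SharedIndex>> cache_;
};

// Writers take the lock exclusively.
void add_vector(SharedIndex& shared, const std::vector<float>& v) {
    std::unique_lock<std::shared_mutex> lock(shared.mutex);
    assert(v.size() == shared.index.dim);
    shared.index.data.insert(shared.index.data.end(), v.begin(), v.end());
}

// Readers share the lock, so they do not block each other.
std::size_t count(SharedIndex& shared) {
    std::shared_lock<std::shared_mutex> lock(shared.mutex);
    return shared.index.ntotal();
}
```

Because the cache holds weak_ptr entries, the index is freed when the last connection releases it. Storing shared_ptr entries instead would keep it in memory for the life of the process, which is closer to the "stays in memory" wording above. The sketch also holds the registry lock while loading, which is simple but serializes loads of unrelated indexes.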

Suggestions ...?

I can do it if you like the idea ...

asg017 commented 1 year ago

Would love to explore this! I'm not too familiar with sharing memory between multiple threads/connections, but I'm sure we can make something work.

SQLite thread-safety will definitely be important to keep in mind. I imagine sqlite3_result_pointer and those related functions might be helpful as well? Do you have any good docs/blogs/books I can read up on to learn more about shared memory between threads in C++?

To make things easier, I wonder if it's worthwhile to separate "read-only" vss0 connections from "read-write" connections. If it's easier to have some special read-only mode (e.g. a sqlite3_vss_read_only entrypoint) so multiple threads can read a shared faiss::Index, then I wouldn't mind including that. Multiple readers + 1 writer is essentially a SQLite limitation as well, so I wouldn't mind leaning into it a bit.
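One way to picture the read-only / read-write split is as two handle types over the same guarded state, where only the read-write handle exposes mutation. This is a rough sketch under stated assumptions: GuardedIndex, ReadOnlyHandle and ReadWriteHandle are hypothetical names, a flat float buffer with brute-force search stands in for faiss::Index, and sqlite3_vss_read_only is only the name floated above, not an existing entrypoint.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <utility>
#include <vector>

// Shared state: a flat vector store standing in for faiss::Index.
struct GuardedIndex {
    explicit GuardedIndex(std::size_t d) : dim(d) { assert(d > 0); }
    const std::size_t dim;
    std::shared_mutex mutex;
    std::vector<float> data;  // row-major: rows * dim floats
};

// What a "read-only" connection would hold: search only.
class ReadOnlyHandle {
public:
    explicit ReadOnlyHandle(std::shared_ptr<GuardedIndex> state)
        : state_(std::move(state)) {}

    // Returns the row id of the closest vector (squared L2), or -1 if empty.
    long nearest(const std::vector<float>& query) const {
        assert(query.size() == state_->dim);
        std::shared_lock<std::shared_mutex> lock(state_->mutex);
        const std::size_t rows = state_->data.size() / state_->dim;
        long best = -1;
        float best_dist = std::numeric_limits<float>::max();
        for (std::size_t row = 0; row < rows; ++row) {
            float dist = 0.0f;
            for (std::size_t j = 0; j < state_->dim; ++j) {
                const float diff = state_->data[row * state_->dim + j] - query[j];
                dist += diff * diff;
            }
            if (dist < best_dist) {
                best_dist = dist;
                best = static_cast<long>(row);
            }
        }
        return best;
    }

protected:
    std::shared_ptr<GuardedIndex> state_;
};

// What the single "read-write" connection would hold: search plus add.
class ReadWriteHandle : public ReadOnlyHandle {
public:
    using ReadOnlyHandle::ReadOnlyHandle;

    void add(const std::vector<float>& v) {
        assert(v.size() == state_->dim);
        std::unique_lock<std::shared_mutex> lock(state_->mutex);
        state_->data.insert(state_->data.end(), v.begin(), v.end());
    }
};
```

Nothing here stops a program from creating several ReadWriteHandle objects; they would simply take turns on the exclusive lock. In practice the one-writer rule would probably come from SQLite's own write locking rather than from these types.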