asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars 59 forks source link

Changing pointer semantics to std::unique_ptr #58

Closed polterguy closed 1 year ago

polterguy commented 1 year ago

Changed all occurrences of vector<float> * to std::unique_ptr<vector<float>> to avoid memory leaks.

polterguy commented 1 year ago

Just throw away my previous pull request. This is much cleaner and less hassle and does exactly what it claims to do ... :)

polterguy commented 1 year ago

Well, at least now it's dropping. This is from our Kubernetes cluster on a deployment with one POD having the lib and roughly 15,000 vectors from OpenAI's text-embedding-ada-002 (1,500+dimensions). The peak is after a delete operation.

Before my fix it didn't even drop at all after deleting 8,000 indexes. Now at least it goes down. I'll work more on this in the weekend, but already I suspect 75% of the leaks are mostly gone. Which for us (running in a Kubernetes cluster) is kind of a big deal ...

I'll weed out the rest of the leaks tomorrow or Sunday. If somebody with more knowledge about SQLite's API can run through that part of the code it would be nice. However, I suspect you did a good job on this part. It was only the C++ parts that was a little bit "challenging" ...

Screenshot 2023-06-09 at 8 55 44 PM
polterguy commented 1 year ago

This last drop is after a re-index operation (deleting everything in vss_ virtual table and re-creating records) - Still some work to be done, bu at least now it is dropping (which it didn't do at all before I started) ...

Screenshot 2023-06-09 at 9 15 47 PM
asg017 commented 1 year ago

Thank you @polterguy for this PR! I'll be taking a closer look soon.

One thing I want to call out: memory when writing to a vss0 table (either inserting or deleting) gets a little wonky. Write only happen on a COMMIT, in which the following occurs:

All to say: Any INSERT/DELETE on a vss0 table will copy the Faiss index in-memory , temporarily until it's safely saved into the sqlite DB.

This is mostly due to the limitations of Faiss's write_index method. It can be improved with a custom IOWriter, which I'm tracking in #1

But in general, yeah sqlite-vss and sqilte-vector has some significant memory problem. I'll make a "tracking" issue for this shortly.

asg017 commented 1 year ago

Tracking in #59

asg017 commented 1 year ago

@polterguy let me know when you'd like me to review this!

polterguy commented 1 year ago

Thx, give me a couple of more days, I'm still trying to wrap my head around SQLite C/++ API ... :/

asg017 commented 1 year ago

No problem, let me know if you need any help with any parts of that.

In general, small quick PRs > larger more encompassing changes, so even if you just wanna merge a few std::unique_ptr changes I'd be more than happy to do so

polterguy commented 1 year ago

You OK if I apply some consistent coding standard, and some slightly longer variable names? I'll be a bit "dramatic", but I'll try to keep the spirit of the library ...

asg017 commented 1 year ago

Longer variable names and improved coding standards would be fine! Once this PR is merged I'll most likely go back and update a few things, including adding clang-format for formatting

polterguy commented 1 year ago

Just letting you know I haven't flunked on you. Creating a PR tomorrow. I've 100% perfectly stabilised memory usage, plugged some 10 to 20 leaks, apprx. 98% of the leaks are gone in my code base now, and code is higher quality, better naming conventions, and coding std (used clan-format) ...

eelco commented 1 year ago

@polterguy Appreciate it!