asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

Vector distance functions (like vss_inner_product) parse the data incorrectly #122

Open zen0wu opened 4 months ago

zen0wu commented 4 months ago

I'm dumping raw bytes into a vector column, but when sqlite-vss parses the BLOB into a vector, it checks whether the BLOB starts with v\x01 and, if so, treats those two bytes as a header.

The problem is that some of my vectors genuinely start with these two bytes as part of the data, so they are no longer parsed correctly.

https://github.com/asg017/sqlite-vss/blob/8fc44301843029a13a474d1f292378485e1fdd62/src/sqlite-vector.cpp#L68-L76
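To make the ambiguity concrete, here is a small Python sketch. It does not call sqlite-vss: sniffing_parse and raw_parse are hypothetical stand-ins that mimic the behaviour described above (a v\x01 header check, then 4-byte floats), and little-endian float32 storage is assumed. It shows that an ordinary value of about 1.0000446 encodes to bytes that begin with v\x01, so a parser that sniffs for the header cannot tell such a vector apart from a headered blob. Here the hypothetical parser raises an error on the leftover bytes; the real extension may fail differently.

```python
import struct

# Assumption: the sniffed header is the byte 'v' (0x76) followed by 0x01,
# and vector payloads are little-endian float32 values.
HEADER = b"v\x01"


def decode_f32(body: bytes) -> list:
    """Decode a byte string as consecutive little-endian float32 values."""
    if len(body) % 4 != 0:
        raise ValueError(f"{len(body)} bytes is not a multiple of 4")
    return list(struct.unpack(f"<{len(body) // 4}f", body))


def sniffing_parse(blob: bytes) -> list:
    """Hypothetical parser mimicking the header sniffing described in the issue."""
    if blob[:2] == HEADER:
        return decode_f32(blob[2:])  # first two data bytes dropped as a "header"
    return decode_f32(blob)


def raw_parse(blob: bytes) -> list:
    """Header-free parser: every 4 bytes are one float32, nothing is sniffed."""
    return decode_f32(blob)


# An ordinary float (about 1.0000446) whose little-endian encoding
# happens to start with the bytes 'v' 0x01.
first = struct.unpack("<f", b"v\x01\x80\x3f")[0]
vec = [first, 0.5, -2.0]
blob = struct.pack(f"<{len(vec)}f", *vec)

print(raw_parse(blob) == vec)  # True
try:
    sniffing_parse(blob)
except ValueError as err:
    print("sniffing parser rejects the data:", err)
```

The raw parser recovers the vector exactly, while the sniffing parser strips two bytes of real data and is left with 10 bytes that no longer form whole floats.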

One workaround is to prepend the header to every row, but that feels like a really bad solution, so it would be great if this could be fixed.
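For reference, the workaround amounts to always writing the header so that the parser never has to guess. This is a minimal sketch under the assumption that the headered format is exactly the two bytes v\x01 followed by little-endian float32 data; to_headered_blob and parse_headered_blob are hypothetical helpers, not part of sqlite-vss.

```python
import struct

# Assumption: headered format = b"v\x01" + raw little-endian float32 payload.
HEADER = b"v\x01"


def to_headered_blob(vec) -> bytes:
    """Hypothetical helper: always prepend the header so parsing is unambiguous."""
    return HEADER + struct.pack(f"<{len(vec)}f", *vec)


def parse_headered_blob(blob: bytes) -> list:
    """Hypothetical inverse: require the header, then decode float32 values."""
    if blob[:2] != HEADER:
        raise ValueError("missing vector header")
    body = blob[2:]
    if len(body) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4")
    return list(struct.unpack(f"<{len(body) // 4}f", body))


# Even a vector whose own bytes begin with 'v' 0x01 now round-trips.
first = struct.unpack("<f", b"v\x01\x80\x3f")[0]
vec = [first, 0.5]
print(parse_headered_blob(to_headered_blob(vec)) == vec)  # True
```

It works, but every writer must remember to add the two extra bytes per row, which is why it feels like the wrong fix.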

asg017 commented 4 months ago

Try vector_from_raw() instead. The vector_from_blob() function was a poor attempt at a new vector format, and it made things more complex for no reason (as you saw). vector_from_raw(), on the other hand, should handle raw blobs correctly (i.e., vectors stored as 4 bytes per float).

I'll likely deprecate vector_from_blob() in the next release.
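A client-side sketch of the suggested fix, assuming Python's sqlite3 module and the sqlite_vss Python package (its load() helper) are available: vectors are packed as headerless float32 bytes and wrapped with vector_from_raw() before being passed to vss_inner_product(). The to_raw helper is hypothetical, little-endian host byte order is assumed, and the SQL step is skipped if the package or extension loading is unavailable.

```python
import sqlite3
import struct


def to_raw(vec) -> bytes:
    """Hypothetical helper: pack floats as raw float32 bytes, 4 per value, no header."""
    return struct.pack(f"<{len(vec)}f", *vec)


a = to_raw([1.0, 2.0, 3.0])
b = to_raw([4.0, 5.0, 6.0])

try:
    import sqlite_vss  # Python bindings for the extension, if installed
except ImportError:
    sqlite_vss = None

if sqlite_vss is not None and hasattr(sqlite3.Connection, "enable_load_extension"):
    db = sqlite3.connect(":memory:")
    db.enable_load_extension(True)
    sqlite_vss.load(db)
    db.enable_load_extension(False)
    # vector_from_raw() reads the bytes as-is, so no header sniffing happens.
    (score,) = db.execute(
        "select vss_inner_product(vector_from_raw(?), vector_from_raw(?))",
        (a, b),
    ).fetchone()
    print(score)
```

Because the raw encoding carries no header, a vector whose first bytes happen to be v\x01 is treated like any other.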