asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

Error reconstructing vector - Does the column factory string end with IDMap2 #92

Closed radekosmulski closed 10 months ago

radekosmulski commented 10 months ago

I am using Python.

I create a vss0 table as follows:

con.execute("""
create virtual table vss_snippets using vss0(
  dragon_embedding(768) factory="Flat,IDMap2" metric_type=INNER_PRODUCT
);
""")

I then populate it with embeddings (I unit normalize my vectors like so: faiss.normalize_L2(ctx_emb)):

con.execute("insert into vss_snippets(rowid, dragon_embedding) values (?, ?)", [row_id, ctx_emb[idx].astype(np.float32).tobytes()])

But when I try to pull anything out doing the following:

con.execute("""
select dragon_embedding from vss_snippets where rowid==1
""").fetchone()

I get an error:

---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Cell In[47], line 1
----> 1 a = con.execute("""
      2 select dragon_embedding from vss_snippets where rowid==1
      3 """).fetchone()

OperationalError: Error reconstructing vector - Does the column factory string end with IDMap2? Full error: Error in void faiss::IndexIDMap2Template<IndexT>::reconstruct(faiss::idx_t, typename IndexT::component_t*) const [with IndexT = faiss::Index; faiss::idx_t = long int; typename IndexT::component_t = float] at /home/runner/work/sqlite-vss/sqlite-vss/vendor/faiss/faiss/IndexIDMap.cpp:236: key 1 not found

Thank you so much for your help 🙏

Ultimately, I would like to be able to query for closest vectors like so:

con.execute("""
select rowid, distance
from vss_snippets
where vss_search(
  dragon_embedding,
  ?
)
limit 100;
""", [embedding.astype(np.float32).tobytes()])

asg017 commented 10 months ago

After your insert into statement, try running con.commit(). The vss0 tables do some non-standard things with transactions and require a commit in order to save inserted data correctly.

That being said, the "Does the column factory string end with IDMap2" suggestion in the error message is unnecessary and confusing; I'll work on changing that. You don't need the explicit factory="Flat,IDMap2" argument in your case, so you can remove it to make the statement a bit easier to understand.
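
For anyone landing here with the same error, here is a minimal sketch of the insert-then-commit pattern suggested above. It is not taken from the sqlite-vss docs: the helper names to_float32_blob and insert_and_commit are made up for illustration, and an ordinary SQLite table stands in for the vss0 virtual table so the snippet runs without the extension loaded. With a real vss0 table, the assumption is that only the create statement changes.

```python
import sqlite3
from array import array


def to_float32_blob(vector):
    """Pack floats as raw float32 bytes (hypothetical helper; same layout
    as arr.astype(np.float32).tobytes() on the same machine)."""
    return array("f", vector).tobytes()


def insert_and_commit(con, rows, table="vss_snippets", column="dragon_embedding"):
    """Insert (rowid, vector) pairs, then commit (hypothetical helper).

    The commit is the important part: per the comment above, vss0 tables
    need it before inserted vectors can be read back or searched.
    """
    sql = f"insert into {table}(rowid, {column}) values (?, ?)"
    con.executemany(sql, [(rowid, to_float32_blob(vec)) for rowid, vec in rows])
    con.commit()


# Stand-in for the vss0 table: a plain table with one blob column.
# ASSUMPTION: with sqlite-vss loaded, this would instead be the
# "create virtual table ... using vss0(...)" statement from the question.
con = sqlite3.connect(":memory:")
con.execute("create table vss_snippets(dragon_embedding blob)")

# Two already unit-normalized 2-d vectors, just for the demo.
insert_and_commit(con, [(1, [0.6, 0.8]), (2, [1.0, 0.0])])

# Read one vector back and decode the float32 bytes.
blob = con.execute(
    "select dragon_embedding from vss_snippets where rowid = 1"
).fetchone()[0]
print(array("f", blob).tolist())
```

Keeping the commit inside the helper means a batch of inserts cannot be left sitting in an open transaction by accident. If you insert in many small batches, you may prefer to commit once at the end instead.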

radekosmulski commented 10 months ago

Thank you so much for your help! 🙂🙏 Really appreciate it!