asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License

`OperationalError: Error saving index (1): string or blob too big` when index size exceeds 650 000 records #100

Closed: radekosmulski closed this issue 10 months ago

radekosmulski commented 10 months ago

I am getting an error when I attempt to add more than 650 000 entries to the index. I created a reproducer as follows.


Here is the code:

import sqlite3

import numpy as np
import sqlite_vss

con = sqlite3.connect("vss_faiss.db")

# load the sqlite-vss extension
con.enable_load_extension(True)
sqlite_vss.load(con)
con.enable_load_extension(False)

version, = con.execute('select vss_version()').fetchone()
print(version)

# 384-dim embeddings, flat index wrapped in an ID map
con.execute("""
create virtual table vss_MiniLm_L6_v2 using vss0(
  embedding(384) factory="Flat,IDMap2" metric_type=INNER_PRODUCT
);
""")

# insert random vectors in batches of 500 until the error surfaces
j = 0
while True:
    embs = np.random.randn(500, 384)
    embs_bytes = [e.astype(np.float32).tobytes() for e in embs]
    for i, eb in enumerate(embs_bytes):
        con.execute(
            "insert into vss_MiniLm_L6_v2(rowid, embedding) values (?, ?)",
            [i + (j * 500) + 1, eb],
        )
    j += 1
    if j % 20 == 0:  # commit every 10 000 rows
        con.commit()
        print(j)

Any help would be greatly appreciated 🙏 Even if the answer is that larger indexes are currently not supported, that's okay; it would still be great to know.

Thank you so much for your help!

asg017 commented 10 months ago

There's a 1GB limit on vss0 columns right now; more info in #1.
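
For context, some back-of-the-envelope math: a Flat index stores raw float32 vectors, so at 384 dimensions the serialized index crosses SQLite's default SQLITE_MAX_LENGTH of 1,000,000,000 bytes almost exactly at the reported threshold, before even counting the rowid bookkeeping that IDMap2 adds:

# rough size estimate for a Flat index of 384-dim float32 vectors
# (ignores IDMap2 overhead, so the real blob is slightly larger)
dims, bytes_per_float = 384, 4
rows = 650_000
print(f"{rows * dims * bytes_per_float / 1e9:.3f} GB")  # 0.998 GB, right at the cap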

There are some tricks you can use to skirt around this (specifically with different Faiss factory strings and pre-training), described a bit in that issue.
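
For example, something like this (a rough sketch with an illustrative factory string, untested at this scale) using the operation='training' insert pattern for indexes that need pre-training:

# sketch: IVF + product quantization stores ~64-byte codes per vector
# instead of 1,536 raw float32 bytes, keeping the index well under 1GB;
# "IVF256,PQ64" is an illustrative choice - tune it to your data
con.execute("""
create virtual table vss_compressed using vss0(
  embedding(384) factory="IVF256,PQ64"
);
""")

# IVF/PQ indexes must be trained before regular inserts; vss0 accepts
# training vectors via the special 'operation' column
training = np.random.randn(50_000, 384).astype(np.float32)
for vec in training:
    con.execute(
        "insert into vss_compressed(operation, embedding) values (?, ?)",
        ["training", vec.tobytes()],
    )
con.commit()

After training, inserts work the same as with the Flat index above.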

But ya, won't be a full fix until that 1GB limit is removed. Hopefully soon!

radekosmulski commented 10 months ago

Thanks for the answer @asg017! Appreciate it!