asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars 59 forks source link

Missing data with sqlite WAL enabled? #71

Open zackangelo opened 1 year ago

zackangelo commented 1 year ago

Is there anything about the extension that would make it incompatible with the sqlite WAL? When I have WAL enabled, there seems to be data missing from my index.

I'll try to write a self-contained reproducer, but was curious if there was an obvious reason why this would be the case.

asg017 commented 1 year ago

Nothing that comes to mind! Though if you're on v0.1.1 try with v0.1.1-alpha.20, as a few insert/delete bugs were fixed since then. I'll release v0.1.2 shortly.

Though also of note, vss0 virtual tables hook into transactions when inserting/deleting data. It's possible that if you insert data in a transaction, you may not be able to "read" it until a COMMIT happens.

create virtual table vss_foo using vss0(a(2));

begin;

insert into vss_foo(rowid,a) values (1, '[1, 1]');
insert into vss_foo(rowid,a) values (2, '[2, 2]');

-- this query will return nothing, since the vectors haven't been inserted
-- into the faiss index yet since the transaction hasn't commited
select rowid, distance from vss_foo where vss_search(a, '[0, 0]');

commit;

-- now it works!
select rowid, distance from vss_foo where vss_search(a, '[0, 0]');

If it's not that, then it's possibly some WAL-specific bug that I can look into

zackangelo commented 1 year ago

Definitely seeing some strange behavior with WAL turned on.

I'll try to get around to writing a reproducer. Additionally when I perform a VACUUM it won't open the index at all when running vss_search (fails with unable to read at index 0).

zackangelo commented 1 year ago

Sorry, just realized I didn't respond to your other suggestions.

Concretely, the only thing I'm changing between tests is the journal mode. When the journal mode is DELETE, it works. When I change it to WAL, it's missing.

asg017 commented 1 year ago

the VACUUM issue is probably a bug - during a vacuum sqlite re-assigns rowids which will break internal parts of the vss0 table. I'll fix that

zackangelo commented 1 year ago

Would this bug also be triggered by dump and load as well?

I tried dumping a sqlite db to a file and then loading it in another and got the same error: (code: 1) Could not read index at position 0

asg017 commented 1 year ago

I believe c0ff505 will fix the VACUUM issue. As well the .dump issue, i believe that internally calls VACUUM and causes the same error.

I still haven't been able to repro WAL-specific issues, so if you have any pointer, let me know!

This is what I was trying: Creating a new db in WAL mode with a vss0 table, inserting/deleting data and constantly reconnecting, but still work as expected

.mode box
.header on

.open tmp.db
.load dist/debug/vector0
.load dist/debug/vss0

PRAGMA journal_mode=wal;
create virtual table vss_bug using vss0(a(2));

.open tmp.db
.load dist/debug/vector0
.load dist/debug/vss0

insert into vss_bug(rowid, a) values (1, '[1, 1]');
insert into vss_bug(rowid, a) values (2, '[2, 2]');
delete from vss_bug where rowid = 2;

.open tmp.db
.load dist/debug/vector0
.load dist/debug/vss0

insert into vss_bug(rowid, a) values (2, '[2, 2]');
insert into vss_bug(rowid, a) values (3, '[3, 3]');

.open tmp.db
.load dist/debug/vector0
.load dist/debug/vss0

select rowid, distance
from vss_bug
where vss_search(a, '[0, 0]')
limit 3;
zackangelo commented 1 year ago

ok great, i'll pull it down and check it out!