asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars 58 forks source link

'Unable to open database file' when cloning working db to memory in better-sqlite3 using .serialize() #114

Closed lakmeer closed 5 months ago

lakmeer commented 5 months ago

better-sqlite3 supports cloning a loaded db into memory by passing a serialized buffer into a new Database constructor. I would like to use this to have a test db that starts with some data and it's vector embeddings, but doesn't save any changes during testing (mainly to save me paying OpenAI to re-embed all my data each time I relaunch).

When using my test.db (which is otherwise working great), I get this error, but using a fresh database with single basic table, it works. Perhaps there is some compatibility issue with .serialize()?

Repro Script

import Database from 'better-sqlite3'

const USE_BUFFER = true

let db = new Database('./test.db')

if (USE_BUFFER) {
  const buffer = db.serialize()
  db.close()
  db = new Database(buffer)
}

console.log(db.prepare('select * from items').get())

Outcome

Further context

Schema for creating test.db

create table if not exists items (
  id integer primary key autoincrement,
  last_update timestamp default (unixepoch('now')),
  hash text not null,
  type text not null check(type in ('text', 'image', 'audio', 'link')),
  desc text,
  content text,
  data blob
);

create table if not exists tags (
  id integer primary key autoincrement,
  name text unique not null
);

create table if not exists item_tags (
  item_id integer not null,
  tag_id integer not null,
  foreign key (item_id) references items(id),
  foreign key (tag_id) references tags(id),
  primary key (item_id, tag_id)
);

create virtual table if not exists vss_items using vss0(
  embedding(1536)
);

create table if not exists embedding_cache (
  id integer primary key,
  hash text not null unique,
  embedding text not null
);

Code that created test.db

import Database from 'better-sqlite3'
import * as VSS from 'sqlite-vss'

const db = new Database('./test.db')
db.pragma('journal_mode = WAL')
VSS.load(db)
db.exec( <the above schema> )

Thanks for any advice you might have. This extension is very cool and has been making my project much more fun to work on :)

asg017 commented 5 months ago

Hey @lakmeer , thanks for the detailed report!

I think the issue lies within using WAL mode in the deserialized database. From the Deserialize a database docs:

The deserialized database should not be in WAL mode. If the database is in WAL mode, then any attempt to use the database file will result in an SQLITE_CANTOPEN error. The application can set the file format version numbers (bytes 18 and 19) of the input database P to 0x01 prior to invoking sqlite3_deserialize(D,S,P,N,M,F) to force the database file into rollback mode and work around this limitation.

That SQLITE_CANTOPEN error code has the corresponding 'Unable to open database file' error message.

I think you can run db.pragma('journal_mode = DELETE') before serializing the database, and it should work fine! Or maybe even:

buffer[18] = 0x01;
buffer[19] = 0x01;

But that seems sketchier to me. Let me know if that works!

lakmeer commented 5 months ago

Hi @asg017,

Perfect, you nailed it 😁

Thanks so much for you help, and thanks for this extension. Stay awesome