asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars 59 forks source link

`SqliteError: SQL logic error` when using with `better-sqlite3` under node #76

Open jdb8 opened 11 months ago

jdb8 commented 11 months ago

Hi there! I'm trying to get something simple running using better-sqlite3 by following the instructions here: https://alexgarcia.xyz/sqlite-vss/nodejs.html

My code looks as follows:

import Database from 'better-sqlite3';
import * as sqlite_vss from 'sqlite-vss';

const db = new Database(':memory:');
sqlite_vss.load(db);

const version = db.prepare('select vss_version()').pluck().get();
console.log(version);

db.exec('create virtual table vss_demo using vss0(a(3));');

const stmt = db.prepare('INSERT INTO vss_demo VALUES (?)');
const selectStmt = db.prepare('SELECT * from vss_demo;');

const embedding1 = [0.1, 0.2, 0.3];
const embedding2 = [0.2, 0.3, 0.4];

stmt.run(JSON.stringify(embedding1));
stmt.run(JSON.stringify(embedding2));

console.log(selectStmt.all());

The select vss_version() call works, printing v0.1.1-alpha.20, which tells me that the extension is successfully installed and that better-sqlite3 is working.

However, when trying to insert two embeddings, on the second I get the following:

❯ node index.mjs
v0.1.1-alpha.20

file:///Users/jbateson/code/sqlite-vss-testing/index.mjs:19
stmt.run(JSON.stringify(embedding2));
     ^
SqliteError: SQL logic error
    at file:///Users/jbateson/code/sqlite-vss-testing/index.mjs:19:6
    at ModuleJob.run (node:internal/modules/esm/module_job:198:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:385:24)
    at async loadESM (node:internal/process/esm_loader:88:5)
    at async handleMainPromise (node:internal/modules/run_main:61:12) {
  code: 'SQLITE_ERROR'
}

If I remove the second insertion call, and insert only one entry, I get no error but the select query seems to return nothing (which implies that the first insertion didn't work):

❯ node index.mjs
v0.1.1-alpha.20
[ { a: null } ]

I'm not sure if this is me doing something wrong - unfortunately the node docs only contain examples for insertions and not table creation or querying - but I've tried to also read https://alexgarcia.xyz/sqlite-vss/getting-started.html#introduction too and those examples give me a similar inscrutible "SQL logic error". I also can't spot any node testcases in this repo from which to copy from.

I've created a repro of this at https://github.com/jdb8/sqlite-vss-testing if that's helpful. Please let me know if you're unable to repro or have more questions!

asg017 commented 11 months ago

Thank you for the detailed report @jdb8 !

Can you update the INSERTs to also include a unique rowid, like this:

const stmt = db.prepare('INSERT INTO vss_demo(rowid, a) VALUES (?, ?)');
const selectStmt = db.prepare('SELECT * from vss_demo;');

const embedding1 = [0.1, 0.2, 0.3];
const embedding2 = [0.2, 0.3, 0.4];

stmt.run(1, JSON.stringify(embedding1));
stmt.run(2, JSON.stringify(embedding2));

If that works, then I think that indicates a bug within sqlite-vss. When you insert into a vss0 table without providing a rowid, it should come up with a new valid rowid but that part may not be working correctly

jdb8 commented 11 months ago

Thanks for the quick response!

Just tried that out, and it does seem to fix the problem! I'm no longer getting the SQL logic error when I manually specify a rowid. Funnily enough, I had tried something like this before, but I think I incorrectly tried to update the create virtual table call to pass in the rowid there rather than treating the virtual table as having an implicit generated rowid column.

So that's great. In terms of the output though, now that I'm re-reading https://alexgarcia.xyz/sqlite-vss/getting-started.html#introduction, is it actually expected that select * would not return the JSON embeddings themselves? I think I was assuming that { a: null } implied the rows hadn't been inserted, but now that I make a real query with a where-clause, I'm at least able to extract distance:

const selectStmt = db.prepare('SELECT rowid, distance from vss_demo where vss_search(a, ?) limit 5;');
console.log(selectStmt.all(JSON.stringify([0.1, 0.2, 0.4])));

->

❯ node index.mjs
v0.1.1-alpha.20
[
  { rowid: 1, distance: 0.009999998845160007 },
  { rowid: 2, distance: 0.020000003278255463 }
]

Is there any way to return the original embeddings for each of those rows? Even though in reality I'd likely just use the rowid to select from a non-virtual table anyway, but curious if this is possible for debugging (I'm a bit unfamiliar with how virtual tables work under the hood, if you couldn't tell 😅)

asg017 commented 11 months ago

I'm still working through the getting started page (and all of the alexgarcia.xyz/sqlite-vss site), so apologies if you find issues there!

I'm pushing 0.1.1-alpha.21 right now, which will allow you to reference vss0 column directly and get the raw embedding value, in the "raw bytes" format. You can then decode that directly in Node with Float32Array use vector_to_json() to convert it to JSON.

Though keep in mind, Faiss might encode/decode your vectors and cause it to slightly change, especially if you use a custom factory string. So something like 0.09431782043791359 might return as 0.09431782958475, but you likely notice much of a difference

asg017 commented 11 months ago

ok 0.1.1-alpha.21 is out, including an npm package, please try it out and lmk if it works for you!

jdb8 commented 11 months ago

@asg017 awesome, works great! Thanks for releasing this so quickly.

Might be a silly question, but when I try to use Float32Array to convert the returned embedding, I get quite different results - e.g. for the input 3-dimensional vector of [0.1, 0.2, 0.3], I seem to be getting back something like:

Float32Array(12) [
    205, 204, 204,  61, 205,
    204,  76,  62, 154, 153,
    153,  62
  ]

This is via

const results = selectStmt.all();
for (const { embedding } of results) {
    const floatArray = new Float32Array(embedding);
    console.log({ floatArray });
}

I also tried setting the last argument of Float32Array to 3 (as the length?) but that gave me the same result. I'm definitely doing something wrong here, but not sure what :(

Also, I'm not sure what the syntax for using vector_to_json() would be - I tried a naive select vector_to_json(*) from vss_demo but that gave me SqliteError: wrong number of arguments to function vector_to_json().

EDIT: ah, right after posting this, I realised I was missing .buffer on the embedding value - works now!

asg017 commented 11 months ago

It would be select vector_to_json(a) from vss_demo, which would return the JSON array of the a embedding, as a string so you'll have to JSON.parse it