asg017 / sqlite-vec

A vector search SQLite extension that runs anywhere!
Apache License 2.0
4.24k stars 134 forks source link

Upsert into vec0 tables? #127

Open gospodinbodurov opened 1 month ago

gospodinbodurov commented 1 month ago

I tried the following query:

INSERT INTO vec_items(rowid, embedding) VALUES (?, ?) ON CONFLICT(rowid) DO UPDATE SET embedding = excluded.embedding;

and got:

UPSERT not implemented for virtual table "vec_items"

Is upsert supported in some form?

asg017 commented 1 month ago

@gospodinbodurov thanks for the report! Initially I though UPSERT wasn't support for virtual tables, as mentioned in the SQLite docs:

UPSERT does not currently work for virtual tables.

But now I think that's outdated. Other docs mention a few other APIs that may help:

So I'll try to include proper UPSERT support in an upcoming release!

gospodinbodurov commented 1 month ago

You are welcome.

P.S. Not sure whether this is the right place for asking, but where I can find the different supported distance metrics? I guess at the moment cosine and l2 are the only ones?

asg017 commented 1 month ago

cosine, l1, and l2. is there any other you'd be interested in?

gospodinbodurov commented 1 month ago

At the moment I think the existing ones are good enough, but I am wondering how to use them - could not find their usage in the documentation.

P.S. I plan integrating this project in our platform and that's the reason I am asking about the upsert functionality and what distance functions are supported. If upsert will not be supported I will rebuild the whole database from scratch during updates.

asg017 commented 1 month ago

docs are lacking but for vec0 tables you can specify a 'distance_metric=cosine' option after the vector column declaration, hoping to update the docs early next week with samples

i'll also try to get upsert support soon! dont think it'll take too long but i won't be at a computer for a few more days

z3ugma commented 3 days ago

@asg017 I'd also like to do query / searches based on Hamming distance of a binary quantized vector. Something like distance metric = hamming

asg017 commented 3 days ago

@z3ugma if you use bit[N] for your vector column type, then the MATCH operator will automatically use hamming distance for you:

CREATE VIRTUAL TABLE vec_sample using vec0(
  bit_embedding bit[8]
);

insert into vec_sample(rowid, bit_embedding) values
  (1, vec_bit(X'FF')),
  (2, vec_bit(X'00')),
  (3, vec_bit(X'0F'));

select
  rowid,
  vec_to_json(bit_embedding),
  distance
from vec_sample
where bit_embedding match vec_bit(X'FF') and k = 3;

/*
┌───────┬────────────────────────────┬──────────┐
│ rowid │ vec_to_json(bit_embedding) │ distance │
├───────┼────────────────────────────┼──────────┤
│ 1     │ '[1,1,1,1,1,1,1,1]'        │ 0.0      │
│ 3     │ '[1,1,1,1,0,0,0,0]'        │ 4.0      │
│ 2     │ '[0,0,0,0,0,0,0,0]'        │ 8.0      │
└───────┴────────────────────────────┴──────────┘
*/

This was done because cosine/l2/l1 distance didn't make sense for binary vectors, and hamming distance was the only distance metric that makes sense for binary vectors.

Happy to support a distance_metric= option for binary vectors if there are use-cases for other bit vector distance metrics!

z3ugma commented 3 days ago

This is great! Will it return simultaneous responses for an array of input query vectors?

Or does the query need to go one input embedding at a time