asg017 / sqlite-vss

A SQLite extension for efficient vector search, based on Faiss!
MIT License
1.59k stars, 59 forks

Support `rowid in (...)` constraints in `vss_search()` KNN queries #19

Open asg017 opened 1 year ago

asg017 commented 1 year ago

In KNN style searches, we should support rowid in (...) constraints in queries like so:

select rowid, distance
from vss_articles
where vss_search(description_embeddings, :query_vector)
  and rowid in (1, 2, 3, ..., 100)
limit 25

Currently we ignore the "equals" constraint on rowid, but if we were to capture that constraint (and enable sqlite3_vtab_in), we could read in all the rowids and use IDSelector to pre-filter results.

This would be especially great when paired with subqueries:

with subset as (
  select rowid
  from articles
  where published_at between '2022-01-01' and '2023-01-01'
    and newsroom = 'NY Times'
)
select rowid, distance
from vss_articles
where vss_search(description_embeddings, :query_vector)
  and rowid in (select rowid from subset)
limit 25

This would enable "pre-filtering" according to this post. This would be an easy-to-implement but probably-slow solution to push-down filters described in #2.
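
A minimal sketch of the pre-filtering idea, in Python with only the standard library. It does not use sqlite-vss or Faiss: `knn_prefiltered` is a hypothetical brute-force stand-in for a Faiss search restricted by an ID selector, and the `articles` rows and 2-d embeddings are made up for illustration.

```python
import math
import sqlite3


def knn_prefiltered(vectors, query, allowed_rowids, k):
    """Brute-force KNN over only the rowids in allowed_rowids.

    Hypothetical stand-in for a Faiss search restricted by an ID selector.
    """
    allowed = set(allowed_rowids)
    scored = [
        (math.dist(vec, query), rowid)
        for rowid, vec in vectors.items()
        if rowid in allowed
    ]
    scored.sort()  # nearest first
    return [(rowid, distance) for distance, rowid in scored[:k]]


# Toy metadata table (assumed schema, not the real sqlite-vss one).
db = sqlite3.connect(":memory:")
db.execute("create table articles(newsroom text, published_at text)")
db.executemany(
    "insert into articles(rowid, newsroom, published_at) values (?, ?, ?)",
    [
        (1, "NY Times", "2022-03-01"),
        (2, "LA Times", "2022-05-01"),
        (3, "NY Times", "2022-07-15"),
        (4, "NY Times", "2021-01-01"),
    ],
)

# Toy embeddings keyed by rowid.
vectors = {1: (0.0, 1.0), 2: (0.1, 0.1), 3: (2.0, 0.0), 4: (0.0, 0.0)}

# Step 1: plain SQL picks the rowids that pass the metadata filter.
subset = [
    row[0]
    for row in db.execute(
        "select rowid from articles "
        "where published_at between '2022-01-01' and '2023-01-01' "
        "and newsroom = 'NY Times'"
    )
]

# Step 2: the vector search only considers those rowids.
results = knn_prefiltered(vectors, query=(0.0, 0.0), allowed_rowids=subset, k=25)
```

Rows 2 and 4 are closer to the query but fail the filter, so they never enter the search; only rows 1 and 3 can be returned.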

asg017 commented 1 year ago

Use IDSelectorBatch.

need to figure out idxStr/idxNum rules

teowave commented 11 months ago

I suppose then we can do pre-filtering with standard SQL and then feed the resulting rowids into the vss query. Nice.

Definitely nicer than getting 1000 "wide net" results from the vss query and then filtering.

That being said, in a document searching app I am working on, I fetch the top 50 results and then filter. It seems to work, although we can never be certain that we are not missing something important.
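
A small sketch of why "search wide, then filter" can miss results, in Python with only the standard library. The data and the `top_k` helper are hypothetical; the helper is a brute-force search, not the sqlite-vss or Faiss implementation.

```python
import math


def top_k(vectors, query, k, allowed=None):
    """Return the k nearest rowids, optionally restricted to `allowed`."""
    candidates = (
        (math.dist(vec, query), rowid)
        for rowid, vec in vectors.items()
        if allowed is None or rowid in allowed
    )
    return [rowid for _, rowid in sorted(candidates)[:k]]


# Ten toy vectors on a line; rowid N sits at distance N from the query.
vectors = {rowid: (float(rowid), 0.0) for rowid in range(1, 11)}
query = (0.0, 0.0)
matching = {8, 9, 10}  # rows that satisfy the metadata filter

# Post-filtering: take a "wide net" of the 5 nearest rows, then filter.
post = [rowid for rowid in top_k(vectors, query, k=5) if rowid in matching]

# Pre-filtering: restrict the search to matching rows first.
pre = top_k(vectors, query, k=2, allowed=matching)
```

Here the wide net of 5 contains no matching rows, so post-filtering returns nothing, while pre-filtering still finds rows 8 and 9.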

teowave commented 11 months ago

> Use IDSelectorBatch.
>
> need to figure out idxStr/idxNum rules

I didn't get the meaning of this one. Can you please expand for us noobies?

asg017 commented 11 months ago

Those are mostly personal notes about how to implement this feature. IDSelectorBatch is a Faiss tool that'll make it easier to search a large subset of vectors, and idxStr/idxNum refer to some internal changes I need to make to the vss0 module in order to make this compatible with older code.

I'll probably work on this next after the new v0.1.1 releases this week!

sutyum commented 3 months ago

Any update on this? @asg017