Closed yushaw closed 4 months ago
Hey @yushaw! Yes it's possible to use FTS5 tables alongside sqlite-vec
, I just haven't documented it (besides this tweet).
The strategy, based on this supbase article for postgres at a high level: You'll have one fts5
virtual table and one vec0
virtual table, that share the same rowid so you can do JOINs. Insert text you want to search in the fts5
table, and also store embeddings of that text in the vec0
table.
Then when you want to perform a search on both full text and vector search, you'll use "Reciprocal Ranked Fusion. " Basically you do both queries in separate CTE steps, then 'combine' the results, ranking results that are in both higher.
Here's a full example, again mostly from that supabase article. It uses sqlite-lembed
to generate embeddings, but you can just bind your vectors as JSON/BLOBs instead of using lembed.
.bail on
.load ./dist/lembed0
.load ../sqlite-vec/dist/vec0
.echo on
select sqlite_version();
INSERT INTO temp.lembed_models(key, model)
select 'all-MiniLM-L6-v2', lembed_model_from_file('models/all-MiniLM-L6-v2-44eb4044.gguf');
create table items as select value as sentence from json_each('[
"The dog is barking",
"The cat is purring",
"The bear is growling",
"The computer is buzzing",
"halloween weekend was a success in long beach"
]');
create virtual table fts_items using fts5(sentence);
insert into fts_items(rowid, sentence)
select rowid, sentence from items;
create virtual table vec_items using vec0(sentence_embedding float[384]);
insert into vec_items(rowid, sentence_embedding)
select rowid, lembed('all-MiniLM-L6-v2', sentence) from items;
.mode box
.header on
.param set :query 'california mascot'
.param set :weight_fts 1.0
.param set :weight_vec 1.0
.param set :rrf_k 60
with vec_matches as (
select
rowid,
row_number() over (order by distance) as rank_number,
distance
from vec_items
where sentence_embedding match lembed('all-MiniLM-L6-v2', :query)
and k = 3
order by distance
),
fts_matches as (
select
rowid,
row_number() over (order by rank) as rank_number,
rank as score
from fts_items
where sentence match :query
),
final as (
select
items.rowid,
items.sentence,
vec_matches.distance as vector_distance,
fts_matches.score as fts_score,
coalesce(1.0 / (:rrf_k + fts_matches.rowid), 0.0) * :weight_fts +
coalesce(1.0 / (:rrf_k + vec_matches.rowid), 0.0) * :weight_vec
as combined_score
from fts_matches
full outer join vec_matches on vec_matches .rowid = fts_matches.rowid
join items on coalesce(fts_matches.rowid, vec_matches.rowid) = items.rowid
order by combined_score
)
select * from final;
Again I'll document this soon, lmk if you need more samples
Thank you for the detailed response and example! It's great to know that they can be easily used together. I'll start implementing this strategy. Looking forward to the upcoming documentation.
Thanks again for your clearance!
Hello,
I'm currently using FTS5 for full-text search in my application and am interested in adding vector-based search capabilities using sqlite-vec. I add an external table as this part describe. https://www.sqlite.org/fts5.html#external_content_and_contentless_tables
I have a few questions regarding compatibility and integration:
Compatibility: Can sqlite-vec and FTS5 be used together within the same database? Specifically, how can I manage separate indexing for text and vector data, and are there any considerations I should be aware of?
Integration: What is the recommended method for linking sqlite-vec vector tables with FTS5 indexed data? Should I use identifiers like rowid for cross-referencing?
Many thanks