duckdb / duckdb_vss

MIT License
94 stars 10 forks source link

Optimize lateral VSS joins using HNSW index #29

Closed Maxxen closed 1 month ago

Maxxen commented 2 months ago

This PR adds an optimizer rule to use the HNSW index to accelerate lateral vss joins, e.g.

CREATE TABLE a (a_vec FLOAT[3], a_id INT);
CREATE TABLE b (b_vec FLOAT[3], b_str VARCHAR);

SELECT * FROM a, LATERAL (
  SELECT * FROM b ORDER BY array_distance(a.a_vec, b.b_vec) LIMIT 3
);

This enables "batch" index lookups by matching against a column of vectors instead of passing a single constant array value.

The optimization will apply for any constant value of k (e.g. what gets passed to inner LIMIT clause) up to DuckDB's STANDARD_VECTOR_SIZE (= 2048 by default).