LanternDB is a relational and vector database, packaged as a Postgres extension.
It provides a new index type for vector columns called hnsw, which speeds up ORDER BY queries on the table.
LanternDB builds and uses usearch for its single-header state-of-the-art HNSW implementation.
To build and install LanternDB:
git clone --recursive https://github.com/lanterndata/lanterndb.git
cd lanterndb
mkdir build
cd build
cmake ..
make install
# optionally
# make test
To install on M1 Macs, replace cmake .. in the steps above with cmake -DUSEARCH_NO_MARCH_NATIVE=ON .. to avoid building usearch with the unsupported -march=native flag.
CREATE EXTENSION lanterndb;
CREATE TABLE small_world (
id varchar(3),
vector real[]
);
INSERT INTO small_world (id, vector) VALUES
('000', '{0,0,0}'),
('001', '{0,0,1}'),
('010', '{0,1,0}'),
('011', '{0,1,1}'),
('100', '{1,0,0}'),
('101', '{1,0,1}'),
('110', '{1,1,0}'),
('111', '{1,1,1}');
Create an hnsw index on the table:
-- create index with default parameters
CREATE INDEX ON small_world USING hnsw (vector);
-- create index with custom parameters
-- CREATE INDEX ON small_world USING hnsw (vector) WITH (M=2, ef_construction=10, ef=4, dims=3);
SELECT id, ROUND(l2sq_dist(vector, array[0,0,0])::numeric, 2) as dist
FROM small_world
ORDER BY vector <-> array[0,0,0] LIMIT 5;
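To check whether the planner actually serves this query from the hnsw index, the plan can be inspected with standard Postgres EXPLAIN. This is a minimal sketch: the index name small_world_vector_idx is an assumption based on the default name Postgres generates for an unnamed index on small_world (vector), and the exact plan text may differ between versions.

```sql
-- EXPLAIN only prints the plan; it does not execute the query.
-- Assumption: the index was created as above and received the default
-- Postgres name small_world_vector_idx.
EXPLAIN
SELECT id
FROM small_world
ORDER BY vector <-> array[0,0,0] LIMIT 5;
```

A line such as "Index Scan using small_world_vector_idx on small_world" in the output suggests the index is being used. A sequential scan followed by a sort suggests the planner chose not to use it.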
The M, ef, and efConstruction parameters (set as M, ef, and ef_construction in the WITH clause of CREATE INDEX) control the tradeoffs of the HNSW algorithm.
In general, lower M and efConstruction speed up index creation at the cost of recall.
Lower M and ef improve search speed and result in fewer shared buffer hits at the cost of recall.
Tuning these parameters will require experimentation for your specific use case. An upcoming LanternDB release will include an optional auto-tuning index.
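One way to run such an experiment is to compare the index results against an exact, brute-force ranking and compute recall. The sketch below makes two assumptions that are not stated in this document: that l2sq_dist can be used in ORDER BY as an ordinary function (it appears in a SELECT list above), and that the default operator class ranks rows by the same distance that l2sq_dist computes. LIMIT 4 is used because, for the query vector {0,0,0}, the four nearest rows of small_world have no ties at the cutoff.

```sql
-- exact:  top 4 by brute force, l2sq_dist is evaluated for every row.
-- approx: top 4 as returned through the hnsw index via <->.
WITH exact AS (
    SELECT id
    FROM small_world
    ORDER BY l2sq_dist(vector, array[0,0,0]) LIMIT 4
),
approx AS (
    SELECT id
    FROM small_world
    ORDER BY vector <-> array[0,0,0] LIMIT 4
)
-- Fraction of the exact neighbors that the index also returned.
SELECT count(*) / 4.0 AS recall_at_4
FROM exact JOIN approx USING (id);
```

Rebuilding the index with different M, ef_construction, and ef values and rerunning this query over a representative set of query vectors gives a rough picture of the recall side of the tradeoff. If the approx part fails with an error, the planner most likely did not pick the index for that subquery.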
LanternDB's hnsw enables search latency similar to pgvector's ivfflat and is faster than ivfflat under certain construction parameters. LanternDB enables higher search throughput on the same hardware since the HNSW algorithm requires fewer distance comparisons than the IVF algorithm, leading to less CPU usage per search.
Currently, there is only one operator, <->, available.
This operator is intended exclusively for use with index lookups, such as ORDER BY vector <-> array[0,0,0].
Consequently, attempting to execute the query SELECT array[0,0,0] <-> array[0,0,0] will result in an error.
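When a distance between two literal arrays is needed outside an index lookup, the l2sq_dist function used in the earlier query is a possible alternative. This is a sketch under the assumption that l2sq_dist accepts two real[] arguments. The explicit casts are there so the call does not rely on implicit array coercion.

```sql
-- Assumption: l2sq_dist(real[], real[]) is callable as a plain function,
-- as it is in the SELECT list of the earlier small_world query.
SELECT l2sq_dist(array[0,0,0]::real[], array[1,1,1]::real[]) AS dist;
```

If l2sq_dist computes the squared Euclidean distance, as its name suggests, the result for these two vectors should be 3.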
There are four defined operator classes that can be employed during index creation:
real[]
vector
real[]
integer[]
When creating an index, you have the option to specify the operator class to be used, like so:
CREATE INDEX ON small_world USING hnsw (vector dist_cos_ops);
This approach allows the <-> operator to automatically identify the appropriate distance function when used in index lookups.
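To see how a cosine ranking differs from the default one, the ordering can be reproduced by hand in plain Postgres SQL. The sketch below assumes that dist_cos_ops ranks rows by cosine distance defined as 1 minus cosine similarity. That definition is an assumption, not something this document states. The row '000' is excluded because the cosine of a zero vector is undefined and would cause a division by zero.

```sql
-- Brute-force cosine distance from each row to the query vector {1,1,0}.
-- unnest(x, y) in FROM walks both arrays in parallel, one element pair per row.
SELECT id,
       1 - (SELECT sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b)))
            FROM unnest(vector, array[1,1,0]::real[]) AS t(a, b)) AS cos_dist
FROM small_world
WHERE id <> '000'
ORDER BY cos_dist;
```

With this definition, '110' ranks first and '111' ranks ahead of '100' and '010', whereas by squared Euclidean distance those three rows are tied. Comparing this output with an index query on a dist_cos_ops index is one way to check which distance the index applies.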
INSERTs into the created index
DELETEs from the index and VACUUMing
Parameter (M, ef, efConstruction) tuning
INDEX-ONLY scans
INCLUDE clauses in index creation, to expand the use of INDEX-ONLY scans
ARRAYs as vectors