LintDB is a multi-vector database built for generative AI. It natively supports late-interaction retrieval models such as ColBERT and PLAID.
LintDB relies on OpenBLAS for accelerated matrix multiplication. To keep installation simple, we currently support installing only through conda.
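For readers new to late interaction, here is a minimal sketch of the ColBERT-style MaxSim score: each query token is matched to its most similar document token, and those maxima are summed. It is plain Python for illustration only and is not LintDB's implementation. The names `dot` and `maxsim_score` and the toy vectors are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best similarity against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy token embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[1.0, 0.0], [-1.0, 0.0]]

score_a = maxsim_score(query, doc_a)  # both query tokens find an exact match
score_b = maxsim_score(query, doc_b)  # only the first query token finds a match
```

Because every token keeps its own vector, a document can score well by matching different query tokens with different parts of its text. A single pooled embedding cannot express that.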
conda install lintdb -c deployql -c conda-forge
LintDB makes it easy to upload data, even if you have multiple tenants.
# assumption: the package is imported under the alias used by the calls below.
import lintdb as ldb

tenant_id = 1
index = ldb.IndexIVF(index_path)
collection_options = ldb.CollectionOptions()
collection_options.model_file = "model.onnx"
collection_options.tokenizer_file = "colbert_tokenizer.json"
collection = ldb.Collection(index, collection_options)
...
# we use an IVF index, so we need to train the centroids.
index.train(training_data)
...
# add documents to the collection.
collection.add(tenant_id, [{'id': 1, 'text': 'hello world', 'metadata': {'doc_id': 'abc123'}}])
opts = ldb.SearchOptions()
opts.k_top_centroids = 2 # number of centroids to search per query token.
results = collection.search(
    tenant_id,
    embeddings,
    100,  # k to return
    opts
)
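To give a feel for what a setting like k_top_centroids controls, here is a rough sketch of IVF-style candidate generation: each query token probes its nearest centroids, and the documents stored under those centroids become candidates for scoring. This is a simplified illustration under assumed data structures, not LintDB's internals. `top_centroids`, `candidate_docs`, and the toy index are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_centroids(token: Vector, centroids: Sequence[Vector], k: int) -> List[int]:
    """Return the ids of the k centroids most similar to one query token."""
    scored = [(dot(token, c), i) for i, c in enumerate(centroids)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by id
    return [i for _, i in scored[:k]]


def candidate_docs(
    query_tokens: Sequence[Vector],
    centroids: Sequence[Vector],
    inverted_lists: Dict[int, List[int]],
    k_top_centroids: int,
) -> Set[int]:
    """Union of the document ids stored under each token's nearest centroids."""
    candidates: Set[int] = set()
    for token in query_tokens:
        for cid in top_centroids(token, centroids, k_top_centroids):
            candidates.update(inverted_lists.get(cid, ()))
    return candidates


# Toy index (hypothetical): three centroids, each mapping to document ids.
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
inverted_lists = {0: [10, 11], 1: [11, 12], 2: [13]}
query_tokens = [[0.9, 0.1]]

narrow = candidate_docs(query_tokens, centroids, inverted_lists, 1)
wide = candidate_docs(query_tokens, centroids, inverted_lists, 2)
```

Probing more centroids per token widens the candidate set, which tends to improve recall at the cost of scoring more documents.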
LintDB aims to support late interaction and more advanced retrieval models.
LintDB aims to be a full retrieval platform.
We want to extend LintDB's features to include:
LintDB is one of two databases that support token-level embeddings; the other is Vespa.
Vespa is a robust, mature search engine with many features. However, the learning curve for getting started with and operating Vespa is steep.
With embedded LintDB, no setup is required: run conda install lintdb -c deployql and get started.
Chroma is an embedded vector database available in Python and Javascript. LintDB currently only supports Python.
However, unlike Chroma, LintDB offers multi-tenancy support.
For detailed documentation on using LintDB, refer to the official documentation.
LintDB is licensed under the Apache 2.0 License. See the LICENSE file for details.
We need your help! If you want a managed LintDB, reach out and let us know.
Book time on the founder's calendar: https://calendar.app.google/fsymSzTVT8sip9XX6