DeployQL / LintDB

Vector database with support for late interaction and token-level embeddings.
https://www.lintdb.com/
Apache License 2.0

Add Collections. Create v0.3.0 #23

Closed by mtbarta 6 months ago

mtbarta commented 6 months ago

Collections let users add and search text directly.

A collection is created on top of an existing index. It encodes the text it is given and stores arbitrary key-value data with it. Searches return that key-value data.

This simplifies setup for building on top of LintDB.

Here's an example of the API:

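import lintdb

# dir_path is assumed to be defined elsewhere as the index's directory path.
index_one = lintdb.IndexIVF(dir_path, 32, 128, 2, 4, 16, lintdb.IndexEncoding_BINARIZER)

# Point the collection at the model and tokenizer used to encode text.
collection_options = lintdb.CollectionOptions()
collection_options.model_file = "assets/model.onnx"
collection_options.tokenizer_file = "assets/colbert_tokenizer.json"
collection = lintdb.Collection(index_one, collection_options)

# Train on sample text before adding documents.
collection.train(['hello world!'] * 1500)

# Add a piece of text together with its key-value metadata.
collection.add(0, 1, "hello world!", {"key": "metadata"})

# Search by text; results carry the stored key-value data.
opts = lintdb.SearchOptions()
opts.n_probe = 250
results = collection.search(0, "hello world!", 10, opts)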
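
For reviewers who want a mental model of what a collection does, here is a rough, dependency-free sketch. It is not LintDB's implementation. `ToyCollection`, `toy_encode`, and `max_sim` are hypothetical names, the hash-based encoder is only a stand-in for the ONNX model, and treating the first argument of `add` and `search` as a tenant ID is an assumption about the calls above. The sketch encodes text into one vector per token, stores the vectors next to the metadata, and ranks documents with a late-interaction (MaxSim-style) score.

```python
import hashlib
import math


def toy_encode(text, dim=8):
    """Hypothetical stand-in for the model: one unit vector per whitespace token."""
    vectors = []
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        raw = [b - 127.5 for b in digest[:dim]]  # never all zero, so norm > 0
        norm = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / norm for x in raw])
    return vectors


def max_sim(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product against any doc token."""
    return sum(
        max((sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs), default=0.0)
        for qv in query_vecs
    )


class ToyCollection:
    """Illustrative only: keeps token vectors and metadata in a dict."""

    def __init__(self, encode=toy_encode):
        self.encode = encode
        self.docs = {}  # (tenant, doc_id) -> (token vectors, metadata)

    def add(self, tenant, doc_id, text, metadata):
        # Encode once at insert time and keep the metadata alongside.
        self.docs[(tenant, doc_id)] = (self.encode(text), metadata)

    def search(self, tenant, text, k):
        # Score every document of the tenant and return the top k
        # as (score, doc_id, metadata) tuples.
        query_vecs = self.encode(text)
        scored = [
            (max_sim(query_vecs, vecs), doc_id, metadata)
            for (t, doc_id), (vecs, metadata) in self.docs.items()
            if t == tenant
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]
```

Unlike this brute-force scan, the real index narrows the candidates first (that is what `n_probe` controls in the example above); the sketch only shows the text-in, metadata-out contract.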

closes #18 closes #21