DeployQL / LintDB

Vector Database with support for late interaction and token level embeddings.
https://www.lintdb.com/
Apache License 2.0

Add interpret method and batching in collections #26

Closed: mtbarta closed this 4 months ago

mtbarta commented 4 months ago

This PR adds a few things:

  1. We surface token scores in SearchResults.
  2. We add an interpret method on collections to use our internal tokenizer.
  3. We add an add_batch method to collections and handle padding internally.
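
To illustrate what "handle padding internally" and "token scores" can mean for late-interaction embeddings, here is a minimal, self-contained sketch. It is not LintDB's implementation or API: the function names `pad_batch` and `token_scores`, the zero-vector padding, the boolean mask, and the dot-product MaxSim scoring are all assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def pad_batch(
    docs: List[List[Vector]], dim: int
) -> Tuple[List[List[Vector]], List[List[bool]]]:
    """Pad every document to the longest token count in the batch.

    Hypothetical helper: returns the padded embeddings plus a mask where
    True marks a real token and False marks a padding slot.
    """
    max_len = max((len(d) for d in docs), default=0)
    padded: List[List[Vector]] = []
    mask: List[List[bool]] = []
    for d in docs:
        pad = max_len - len(d)
        # Copy the real token vectors, then append zero vectors as padding.
        padded.append([list(v) for v in d] + [[0.0] * dim for _ in range(pad)])
        mask.append([True] * len(d) + [False] * pad)
    return padded, mask


def token_scores(
    query: List[Vector], doc: List[Vector], doc_mask: List[bool]
) -> List[float]:
    """Per-query-token MaxSim score (assumed scoring rule).

    For each query token, take the best dot product against the real
    (non-padding) document tokens. Padding slots are skipped via the mask.
    """
    scores: List[float] = []
    for q in query:
        sims = [
            sum(a * b for a, b in zip(q, t))
            for t, keep in zip(doc, doc_mask)
            if keep
        ]
        scores.append(max(sims) if sims else 0.0)
    return scores


if __name__ == "__main__":
    docs = [
        [[1.0, 0.0]],              # one token
        [[0.0, 1.0], [1.0, 1.0]],  # two tokens
    ]
    padded, mask = pad_batch(docs, dim=2)
    query = [[1.0, 0.0], [0.0, 2.0]]
    for doc, m in zip(padded, mask):
        print(token_scores(query, doc, m))
```

The mask matters because a zero padding vector would otherwise contribute a similarity of 0, which can beat a real token whose similarity is negative. Summing the per-token scores gives a single document score under this assumed scoring rule.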

This PR also includes a tokenizer bugfix so that special tokens are now added.