As a user, I want to understand why certain documents are returned so that I can construct the LLM context more effectively.
One of the benefits of late interaction is that per-token scores are available, which can help explain why a given document was returned. These scores also let us calculate a "highlight" span: the part of the document most similar to the query.
See https://blog.vespa.ai/announcing-colbert-embedder-in-vespa/ for Vespa's explainer.
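The per-token scoring and highlight-span idea above can be sketched as follows. This is a minimal illustration with made-up toy vectors, not the actual index or model code: in practice the embeddings would come from a ColBERT-style model, and the similarity would typically be a cosine/dot product over much higher-dimensional vectors.

```python
# Sketch only: toy 2-d "embeddings" stand in for real model output.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def per_token_scores(query_vecs, doc_vecs):
    """For each document token, keep its best similarity to any query token."""
    return [max(dot(q, d) for q in query_vecs) for d in doc_vecs]

def best_highlight_span(scores, window=3):
    """Return (start, end) of the contiguous window with the highest summed score."""
    best_start, best_sum = 0, float("-inf")
    for start in range(max(1, len(scores) - window + 1)):
        s = sum(scores[start:start + window])
        if s > best_sum:
            best_start, best_sum = start, s
    return best_start, min(best_start + window, len(scores))

# Toy example: the middle document tokens align best with the query.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.1, 0.1], [0.9, 0.2], [0.2, 0.9], [1.0, 0.1], [0.0, 0.1]]

scores = per_token_scores(query, doc)       # [0.1, 0.9, 0.9, 1.0, 0.1]
start, end = best_highlight_span(scores)    # (1, 4)
print(scores, (start, end))
```

With the per-token scores hydrated back to their token strings (see the note below on storage), the `(start, end)` span maps directly to a highlighted snippet we can feed into the LLM context.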
Acceptance Criteria
Note: This requires knowing which token is stored at each position in the index and hydrating it for the results. We could couple this to the model's vocabulary and store the vocab ID, or we can store the raw token.
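The two storage options in the note can be sketched as below. The structures here are hypothetical stand-ins, not a real index format; the vocabulary mapping mimics what a model tokenizer would provide.

```python
# Sketch only: a toy vocabulary stands in for the model's real one.
VOCAB = {101: "late", 102: "interaction", 103: "scores"}

# Option A: the index stores vocab IDs (compact, but coupled to the
# model's vocabulary; changing models invalidates the stored IDs).
indexed_ids = [101, 102, 103]
hydrated = [VOCAB[i] for i in indexed_ids]

# Option B: the index stores raw token strings (larger, but
# model-independent and directly usable in highlights).
indexed_raw = ["late", "interaction", "scores"]

print(hydrated == indexed_raw)  # both options hydrate to the same tokens
```

Either way, hydration is what turns per-token scores into human-readable highlights for the results.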