Open fdurant opened 1 month ago
There is no scoring when adding documents. It only happens when retrieving.
Chunk scores are part of ColbertRetriever::text_search. Does it answer your need?
Also, I don't think we keep track of the tokens, only their embeddings. And the chunk score is the max of the embedding scores, which are not exposed either. @zzzming, can you confirm?
@cbornet you are correct. We don't store the tokens for the text... only the embeddings of the tokens.
I'm experimenting with RAGStack ColBERT and have a feature request.
To produce a query-passage scoring interpretability visualization like this, it would be handy if the result of ColbertVectorStore.add_texts also included the top-n list of most contributing tokens, each with a normalized score that would be trivial to color-code in a UI. This could be achieved via an extra parameter include_token_scores: int = 0
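To make the request concrete, here is a rough sketch of what such a top-n token list could look like. It is not RAGStack code: the helper name top_token_scores and its arguments are hypothetical, and only include_token_scores comes from the proposal above. It assumes a ColBERT-style MaxSim match (each query token is paired with its most similar passage token) and that the caller has both the passage tokens and their embeddings at hand. Because the scores depend on the query, the sketch computes them at retrieval time rather than in add_texts.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_token_scores(
    query_embs: Sequence[Vector],
    passage_tokens: Sequence[str],
    passage_embs: Sequence[Vector],
    include_token_scores: int = 0,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: up to n (token, normalized score) pairs, best first.

    Assumption: each query token credits the single passage token it is
    most similar to (MaxSim); a passage token's contribution is the sum
    of the similarities credited to it.
    """
    if include_token_scores <= 0 or not passage_tokens:
        return []

    contributions = [0.0] * len(passage_tokens)
    for q in query_embs:
        sims = [cosine(q, p) for p in passage_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        # Negative similarities are clamped so they never reduce a score.
        contributions[best] += max(sims[best], 0.0)

    peak = max(contributions)
    if peak == 0.0:
        return []

    ranked = sorted(
        zip(passage_tokens, contributions), key=lambda pair: pair[1], reverse=True
    )
    # Normalize by the strongest token so scores fall in (0, 1].
    return [
        (token, score / peak)
        for token, score in ranked[:include_token_scores]
        if score > 0.0
    ]


if __name__ == "__main__":
    query = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    tokens = ["cat", "sat", "mat"]
    passage = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    print(top_token_scores(query, tokens, passage, include_token_scores=2))
```

Normalizing by the strongest token keeps every score in (0, 1], which maps directly onto a color scale in a UI. The default of 0 returns an empty list, so existing callers would see no change.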