crate-workbench / langchain

⚡ Building applications with LLMs through composability ⚡
https://python.langchain.com
MIT License

UserWarning: Relevance scores must be between 0 and 1 #19

Open amotl opened 7 months ago

amotl commented 7 months ago

About

When running the test cases, a warning is now emitted. Most likely, it was introduced by GH-15, which changed the style of the similarity search query and, in turn, the value range of the returned CrateDB-native _score values.

/path/to/langchain/libs/langchain/langchain/schema/vectorstore.py:313:
UserWarning: Relevance scores must be between 0 and 1, got [(Document(page_content='foo', metadata={'page': '0'}), 1.414213562373095), (Document(page_content='bar', metadata={'page': '1'}), 1.0606601717798212), (Document(page_content='baz', metadata={'page': '2'}), 0.8485281374238569)]
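
To illustrate which results trigger the warning, the three scores can be checked against the expected range with a few lines of Python. This is only a minimal sketch and not LangChain's actual implementation; the `out_of_range` helper is a hypothetical name introduced for this example.

```python
# Scores copied from the warning output, keyed by page_content.
scores = {
    "foo": 1.414213562373095,
    "bar": 1.0606601717798212,
    "baz": 0.8485281374238569,
}


def out_of_range(scores: dict[str, float]) -> dict[str, float]:
    """Return the entries whose score lies outside the closed interval [0, 1].

    Hypothetical helper for illustration only.
    """
    return {name: score for name, score in scores.items() if not 0.0 <= score <= 1.0}


offending = out_of_range(scores)
```

Two of the three results, `foo` and `bar`, exceed 1, so the value range is violated even though `baz` happens to fall within bounds.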

Evaluation

CrateDB computes its _score values based on various criteria of the input SQL query expression, the execution plan, or the actual execution. As a result, they do not directly convey useful information about the actual vector similarity distance.

Suggestion

Use a corresponding function provided by CrateDB to compute the similarity distance independently of the CrateDB-native _score value.
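
As a rough sketch of the idea, the following Python code computes a distance directly from two vectors and maps it into the range LangChain expects. It assumes Euclidean distance and a `1 / (1 + d)` normalization; both are assumptions made for this example, not necessarily what CrateDB or LangChain use, and the function names are hypothetical. In practice, the computation would be delegated to the database rather than done client-side.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Compute the Euclidean (L2) distance between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevance_from_distance(distance: float) -> float:
    """Map a non-negative distance to a relevance score in (0, 1].

    Assumed normalization for illustration: identical vectors (distance 0)
    yield 1.0, and the score approaches 0 as the distance grows.
    """
    if distance < 0:
        raise ValueError("Distance must be non-negative")
    return 1.0 / (1.0 + distance)


# Usage: score a document embedding against a query embedding.
query = [0.0, 0.0]
document = [3.0, 4.0]
score = relevance_from_distance(euclidean_distance(query, document))
```

Because the score is derived from the distance alone, it stays within bounds regardless of how the query is planned or executed.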

/cc @ckurze, @seut, @matriv

amotl commented 2 months ago

This one might also be interesting, because it discusses potentially similar [sic!] woes with other vector stores.

At least it tells us that not every store gets it right from the very beginning with respect to what LangChain or other applications might want or expect.