Open vinibs opened 5 months ago
Hi @HenryHengZJ, thanks for your answer.
I'm not sure I get your point. Do you mean the distanceStrategy attribute? If so, two questions just came up about it:
distance, but in the flow we specify the minimum similarity instead?
The issue was actually about the fact that we pass a minimum similarity score to the block in the flow, but we compare it to the distance (which is "the smaller, the better" rather than "the bigger, the better"), so the "minimum score" input does not behave as expected.
So, as I'm not very familiar with these distance strategies yet, do you think changing it would solve this situation or would it be better to actually change how the query is built to consider the similarity instead of the distance?
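As a rough sketch of the second option (building the query around similarity rather than distance), the following assumes pgvector's cosine distance operator `<=>` and uses hypothetical table and column names (`documents`, `content`, `embedding`) plus a hypothetical `buildParams` helper. It is not Flowise's actual query, only an illustration of the idea.

```typescript
// Hypothetical table/column names. `<=>` is pgvector's cosine distance operator,
// so `1 - (embedding <=> query)` is the cosine similarity.
const similarityQuery = `
  SELECT content, 1 - (embedding <=> $1::vector) AS similarity
  FROM documents
  WHERE 1 - (embedding <=> $1::vector) >= $2
  ORDER BY embedding <=> $1::vector
  LIMIT $3
`;

// Hypothetical helper: pgvector accepts vectors as text literals like "[0.1,0.2]".
function buildParams(
  queryEmbedding: number[],
  minScore: number,
  k: number
): [string, number, number] {
  return [`[${queryEmbedding.join(",")}]`, minScore, k];
}

const params = buildParams([0.1, 0.2], 0.8, 4);
```

Ordering by the raw distance (ascending) while exposing `similarity` in the result keeps "most similar first" and lets the minimum score be compared against a value where bigger is better.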
To add to this, calculating cosine similarity on the exact same vector does not give a score of 1, but a score close to 0.5.
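For reference, the cosine similarity of a vector with itself is 1 by definition (and its cosine distance is 0), so a score near 0.5 for identical vectors would not be the raw cosine similarity. A minimal check, with a hypothetical `cosineSimilarity` helper and made-up sample values:

```typescript
// Plain cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const v = [0.2, -0.5, 0.8]; // arbitrary sample vector
const selfSimilarity = cosineSimilarity(v, v); // approximately 1 (up to rounding)
const selfDistance = 1 - selfSimilarity;       // approximately 0
```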
Is there a follow-up on this?
I would also be interested in being able to select the metric used to fetch vectors from the pgvector database.
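A selectable metric could be sketched roughly as below. The three operators are pgvector's own; the metric names, the `buildDistanceQuery` helper, and the table and column names are hypothetical and not part of Flowise.

```typescript
// Hypothetical metric names mapped to pgvector's distance operators.
type Metric = "cosine" | "l2" | "innerProduct";

const OPERATORS: Record<Metric, string> = {
  cosine: "<=>",       // cosine distance
  l2: "<->",           // Euclidean (L2) distance
  innerProduct: "<#>", // negative inner product
};

// Builds a parameterized query; table and column names are placeholders.
function buildDistanceQuery(metric: Metric): string {
  const op = OPERATORS[metric];
  return `SELECT content, embedding ${op} $1::vector AS distance FROM documents ORDER BY distance LIMIT $2`;
}

const cosineQuery = buildDistanceQuery("cosine");
```

All three operators return a value where smaller means closer, so the ascending `ORDER BY` works for each; only the conversion to a "bigger is better" score would differ per metric.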
Describe the bug When using Postgres as the Vector Store for a flow and querying it with a specified minimum score, it's not returning anything. Searching the code, I noticed that the Postgres node calculates the distance between the vectors and returns them sorted ascending, but the VectorStore to Document node expects the number to be a similarity value, not the distance, resulting in it discarding the most relevant results for my query. I'm making a local change to try changing this value returned by Postgres to be "1 - distance", which seems to be enough to fix this situation. If it doesn't bring other side effects, I can also make a PR for this little change.
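The mismatch and the proposed "1 - distance" change can be illustrated with a small sketch. It assumes the store returns cosine distances; the `ScoredRow` shape, the function names, and the sample values are hypothetical and only mirror the situation described above.

```typescript
// Hypothetical row shape for a distance-ordered result.
interface ScoredRow {
  content: string;
  distance: number; // cosine distance: 0 = same direction, larger = less similar
}

// Proposed conversion: cosine similarity = 1 - cosine distance.
function toSimilarity(distance: number): number {
  return 1 - distance;
}

// Keep rows whose similarity meets the threshold, most similar first.
function filterByMinScore(rows: ScoredRow[], minScore: number) {
  return rows
    .map((row) => ({ ...row, similarity: toSimilarity(row.distance) }))
    .filter((row) => row.similarity >= minScore)
    .sort((a, b) => b.similarity - a.similarity);
}

const rows: ScoredRow[] = [
  { content: "exact same question", distance: 0 },
  { content: "unrelated question", distance: 0.9 },
];

// Treating the distance as if it were a score drops the best match.
const naive = rows.filter((row) => row.distance >= 0.8);

// Converting first keeps the exact match and drops the unrelated one.
const fixed = filterByMinScore(rows, 0.8);
```

Here `naive` contains only the unrelated question, which matches the reported behavior, while `fixed` contains only the exact match.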
To Reproduce Steps to reproduce the behavior:
Expected behavior The output should return the stored vectors ordered from most to least similar to the question, but instead it returns them ordered from least to most similar. If a minimum score is passed to the VectorStore to Document node, the query returns no results, even for the exact same question.
Screenshots The results when querying without setting the minimum score (it brings all entries):
The results when querying with a minimum score of 80% (it doesn't bring any data):
The VectorStore documents' log with the calculated similarity for this case (bringing the exact same question with a score of 0 while bringing non-related questions with greater scores - actually, greater distances):
Flow sql-test Chatflow.json
Setup
Installation [e.g. docker, npx flowise start, pnpm start]: Docker
Additional context As mentioned before, it seems the Postgres node is calculating the distance instead of the similarity when querying the vector store. I'm currently testing changing this calculation to return "1 - distance" as the similarity score (or changing it directly in the query calculation), but I'm not aware of possible side effects this could cause, since I've been working with Flowise for just 4 days and am not very familiar with its resources. I'd like to confirm this issue before opening a pull request to fix it.