botchagalupeai closed this 2 months ago
I think I figured this out, although the documentation is confusing on this point. From my notes:
Vectors are Already Normalized
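For normalized (unit-length) vectors a and b, the squared Euclidean distance and the cosine similarity are related by ||a − b||² = 2(1 − cos θ), where θ is the angle between them. This formula shows that the Euclidean distance is fully determined by the cosine similarity when the vectors are normalized, so both metrics rank results in the same order. If the vectors are very similar (i.e., the angle between them is small), both distances are close to zero and the two metrics produce very similar results.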
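A minimal pure-Python sketch that checks ||a − b||² = 2(1 − cos θ) for unit vectors. It assumes the distance definitions Chroma documents for its `l2` space (squared Euclidean distance) and `cosine` space (1 minus cosine similarity); the helper names and sample vectors are hypothetical, chosen only for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance (assumed meaning of Chroma's "l2" space)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; for unit vectors the dot product is the cosine."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

# Hypothetical query and documents, normalized before comparison.
query = normalize([1.0, 2.0, 3.0])
docs = [normalize(v) for v in ([1.0, 2.0, 2.5], [3.0, 1.0, 0.0], [-1.0, 0.5, 2.0])]

l2 = [squared_l2(query, d) for d in docs]
cos = [cosine_distance(query, d) for d in docs]

# For unit vectors, ||a - b||^2 == 2 * (1 - cos(a, b)), up to float rounding.
for d_l2, d_cos in zip(l2, cos):
    assert math.isclose(d_l2, 2 * d_cos, abs_tol=1e-12)

# Sorting by either distance yields the same neighbor order.
rank_l2 = sorted(range(len(docs)), key=lambda i: l2[i])
rank_cos = sorted(range(len(docs)), key=lambda i: cos[i])
print(rank_l2, rank_cos)  # → [0, 2, 1] [0, 2, 1]
```

Because the squared L2 value is exactly twice the cosine distance here, sorting by either one gives the same neighbor order; only the reported numbers differ.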
Small Magnitudes of Difference
When the vectors are extremely close to each other, the difference between the two metrics may be very small, leading to nearly identical results. This is particularly true when the vectors differ only slightly in their components, so that L2 distance and cosine similarity produce similar outcomes.
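To illustrate the small-difference case, here is a sketch that measures both distances between 2-D unit vectors separated by shrinking angles. It assumes the same definitions Chroma documents for its `l2` (squared Euclidean) and `cosine` (1 minus cosine similarity) spaces; the helper names are hypothetical. For a small angle θ, cosine distance behaves like θ²/2 and squared L2 like θ², so both values approach zero together.

```python
import math

def unit(theta):
    """A 2-D unit vector at angle theta (radians) from the x-axis."""
    return [math.cos(theta), math.sin(theta)]

def squared_l2(a, b):
    """Squared Euclidean distance (assumed meaning of Chroma's "l2" space)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; both inputs are unit vectors, so dot == cosine."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

base = unit(0.0)
results = []
for angle in (0.5, 0.05, 0.005):
    other = unit(angle)
    d_l2 = squared_l2(base, other)
    d_cos = cosine_distance(base, other)
    results.append((angle, d_l2, d_cos))
    print(f"angle={angle}: squared L2={d_l2:.2e}, cosine distance={d_cos:.2e}")
```

As the angle shrinks by a factor of 10, both distances shrink by roughly a factor of 100, which is why near-duplicate vectors give almost indistinguishable scores under either metric.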
Unless someone has a better answer, I'm sticking with this one.
What happened?
I create a collection with the metadata for cosine distance, but when I query it the output is always in L2 distances. I have tried both L2 and cosine, and the results are identical. I also recreated the input data and the collection each time.
Here's the query.
I also discussed this issue on Discord before filing it.
Versions
Python 3.10.12
Chroma 0.5.0
Same results in both Conda 24.7.1 and Google Colab (Pro)
ProductName: macOS
ProductVersion: 14.5
BuildVersion: 23F79
Relevant log output