facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Is the distance returned using L2? #413

Closed · hminle closed this 6 years ago

hminle commented 6 years ago

Summary

Hello guys, when the index returns a distance, is that the L2 distance? I also built a KD-tree with scikit-learn, which returns L2 distances, but the two sets of distances are totally different.

>>> indexflatL2.search(emb2.astype('float32'), k=10)
(array([[1.0788677, 1.1134999, 1.1210456, 1.1379397, 1.1737218, 1.1818781,
        1.2293975, 1.2542101, 1.3202076, 1.4002445]], dtype=float32), array([[6, 5, 3, 2, 8, 7, 1, 0, 9, 4]]))
>>> tree.query(emb2.astype('float32'), k=10)
(array([[1.03868553, 1.05522505, 1.05879439, 1.06674252, 1.08338444,
        1.08714219, 1.10878202, 1.11991525, 1.14900282, 1.18331924]]), array([[6, 5, 3, 2, 8, 7, 1, 0, 9, 4]]))

Running on :

Reproduction instructions

Use a sample database of shape (10, 128) and a query vector of shape (1, 128) to compare the distances returned by a KD-tree (scikit-learn) and Faiss; a minimal sketch follows.
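For reference, here is a minimal sketch of such a comparison. The random data and the variable names (d, xb, xq, D, I, dist, ind) are assumptions for illustration, not the original data from the report.

import numpy as np
import faiss
from sklearn.neighbors import KDTree

d = 128                                            # vector dimensionality (assumed)
xb = np.random.random((10, d)).astype('float32')   # sample database, shape (10, 128)
xq = np.random.random((1, d)).astype('float32')    # query vector, shape (1, 128)

# Faiss exact L2 index
index = faiss.IndexFlatL2(d)
index.add(xb)
D, I = index.search(xq, 10)

# scikit-learn KD-tree, default Euclidean metric
tree = KDTree(xb)
dist, ind = tree.query(xq, k=10)

print(D)      # distances reported by Faiss
print(dist)   # distances reported by the KD-tree

With this setup the neighbor indices I and ind come back in the same order, but the two distance arrays differ, which is the discrepancy described above.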

beauby commented 6 years ago

Here it seems it is returning the square of the L2 distance. cc @mdouze

beauby commented 6 years ago

So in general, our L2 indexes return the squared L2 distance.
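A minimal check of this claim, reusing the assumed xb, xq, and D from the sketch above: the values returned by IndexFlatL2 should match squared Euclidean distances computed directly with numpy.

# Squared Euclidean distances from the query to every database vector
sq_dists = ((xb - xq) ** 2).sum(axis=1)

# search() returns results in increasing distance order, so sort before comparing
print(np.allclose(np.sort(sq_dists), D[0], atol=1e-4))   # expected: True (up to float32 rounding)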

hminle commented 6 years ago

@beauby but I calculated the distance directly, and the result is the same as the kd-tree (scikit-learn); I don't know why Faiss produces a different distance.

beauby commented 6 years ago

@hminle It is just squared. If you take the square root of the distances returned by Faiss, you will get the same values as with the kd-tree.
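Concretely, again using the assumed variables from the sketch above, taking the element-wise square root of the Faiss distances should reproduce the KD-tree output up to float32 rounding:

# sqrt of Faiss squared-L2 distances vs. scikit-learn KD-tree Euclidean distances
print(np.allclose(np.sqrt(D), dist, atol=1e-4))   # expected: True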