facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License
30.39k stars 3.55k forks source link

hsnw range search is not working with 1.8.0 #3684

Closed ganeshpanaskar closed 3 weeks ago

ganeshpanaskar commented 1 month ago

Summary

faiss 1.80 release supports range query for hsnw algo is not working as expected. Threshold is getting ignored during search operation.

Platform

OS: 5.10.217-205.860.amzn2.x86_64 x86_64 GNU/Linux Faiss version: 1.8.0

Installed from:

!conda install -c pytorch -y faiss-cpu=1.8.0

Faiss compilation options:

Running on:

Interface:

Reproduction instructions

import faiss import numpy as np # Define the dimension of the vectors d = 256 # Example dimension nb = 10 # Number of database vectors nq = 2 # Number of query vectors # Generate random database and query vectors np.random.seed(1234) # For reproducibility xb = np.random.random((nb, d)).astype('float32') xq = np.random.random((nq, d)).astype('float32') faiss.normalize_L2(xb) # Normalize both query and database vectors faiss.normalize_L2(xq) hnsw_index_ip = faiss.IndexHNSWFlat(256, 16, faiss.METRIC_INNER_PRODUCT) hnsw_index_ip.hnsw.efConstruction = 512 hnsw_index_ip.hnsw.efSearch = 512 hnsw_index_ip.add(xb) radius = 0.74 # Cosine similarity threshold lims, D, I = hnsw_index_ip.range_search(xq, radius) # Process results for Example 1 for i in range(len(xq)): start = lims[i] end = lims[i+1] print(f"Query {i} has {end - start} neighbors within cosine similarity {radius}.") print("Distances (cosine similarities):", D[start:end]) print("Indices:", I[start:end]) **Results** Query 0 has 10 neighbors within cosine similarity 0.74. Distances (cosine similarities): [0.71962094 0.75443786 0.75163096 0.7508825 0.7652079 0.7373253 0.7335136 0.7609381 0.73516536 0.7450698 ] Indices: [9 7 8 3 1 6 2 4 5 0] Query 1 has 10 neighbors within cosine similarity 0.74. Distances (cosine similarities): [0.74524117 0.72365034 0.76518726 0.7913578 0.74097526 0.7454239 0.75055194 0.7481433 0.711897 0.7772374 ] Indices: [9 7 8 3 1 6 2 4 5 0]
mdouze commented 1 month ago

I can repro, it seems to work with L2 so there's something wrong with IP search.

mdouze commented 1 month ago

It also fails for top-k search, ie. resutls are ordered in increasing order rather than decreasing. The culprit seems to be:

https://github.com/facebookresearch/faiss/blob/4cfa63865f7fbf4836ff9b97a6d9495015bc39ff/faiss/IndexHNSW.cpp#L323

Before, there was a wrapper around the distance computer that negated the distances so that the HSNW object would not need to implement simiarities (rather than distances).

Now, this is all included in the selector object, so we shoudl instanciate it for several templates (distances and similarities)

mdouze commented 1 month ago

Should be fixed now.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been inactive for 7 days since being marked as stale.