Closed Jae0Kang closed 4 days ago
It could be some quirk with the data distribution or the implementation. Which vector(s) did you use as queries? The vectors themselves?
I tried to repro the issue, see issue_3730.ipynb
I did not see anything suspicious. Could you precisely state which vector you expect to find given which query vector?
Yes. There are 10,000 test cases in qdrant's second benchmark which has data of h&m. I ran knn search with k = 300, and I expected vectors whose indexes are
@Jae0Kang Can you provide a way to completely reproduce the issue you are seeing? For example, @mdouze asked for specific the query vector as part of your search. We can't investigate the issue further until we have a way to get the same results you are seeing.
@asadoughi this is the code that I tested and the result text file. As you can see in the text file, it is showing there are 5156 data which is satisfying the filtering condition. However, the accuracy (num. of intersection of indexflatl2 and hnsw knn / num. of indexflatl2) of post filtering is 0.1. I would really appreciate if you can take a look of it.. faiss_postfilter_code.txt faiss_postfilter.txt
This issue is stale because it has been open for 7 days with no activity.
Your code has references to data files that were not shared. We cannot reproduce the issue without a complete example.
Catching up on this Thanks for posting the C++ reproducing the issue, but it is too complex (you provided the data before). I still don't understand which is/are the query vector(s) that you are using? This would make it possible to reproduce in python.
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Summary
While testing post-filtering that I implemented with Faiss in C++, I encountered an issue where the KNN search does not return the expected neighbors connected in the graph, even though k and efSearch values were increased.
Dataset: Used Qdrant's ANN filtering benchmark dataset (H&M clothing data) from this repository. The dataset size is 105,100. Dimension is 2048.
Steps of post-filtering:
Test Case: Parameter I used:
Test result:
Comparison with Answers: Compared KNN results with answers created by using indexflatl2. Test answers (format: (index, distance)):
Additional Checks: Verified if (92552, 178.385) is connected to any node in the KNN search result of using index.hnsw.neighbors[]. The KNN search returns neighbors of (92552, 178.385) with longer distances but does not return (92552, 178.385) itself.
Question: My understanding is that the KNN search should return the k neighbors with the shortest distances to the query. Is this an expected result, or is there a potential error in the search implementation?
Platform
OS: Linux
Faiss version:
Installed from:
Faiss compilation options:
Running on:
Interface:
Reproduction instructions