If you are interested in the research line of nearest neighbor search or application and looking for a cooperation please feel free to contect me via fc731097343@gmail.com
======
SSG is a graph-based approximate nearest neighbor search (ANNS) algorithm. It provides a flexible and efficient solution for the metric-free large-scale ANNS on dense real vectors. It implements the algorithm of our paper: Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search
Data set | Download | dimension | nb base vectors | nb query vectors | original website |
---|---|---|---|---|---|
SIFT1M | original website | 128 | 1,000,000 | 10,000 | original website |
GIST1M | original website | 128 | 1,000,000 | 1,000 | original website |
Crawl | crawl.tar.gz (1.7GB) | 300 | 1,989,995 | 10,000 | original website |
GloVe-100 | glove-100.tar.gz (424MB) | 100 | 1,183,514 | 10,000 | original website |
Deep100M | deep100m.tar.gz* (34GB) | 96 | 100,000,000 | 10,000 | original website |
The performance was tested without parallelism. Among all the graph-based algorithms, NSG and SSG has the smallest index size.
Compared Algorithms:
Please see our NSG paper for the performance of other graph-based algorithms - FANNG.
$ cd /path/to/project
$ mkdir -p build && cd build
$ cmake .. && make -j
The main interfaces and classes have its respective test codes under directory tests/
.
Please follow the instructions below to build the SSG index.
Firstly, we need to prepare a kNN graph.
We suggest you use our efanna_graph to build this kNN graph. But you can also use any alternatives you like, such as KGraph or faiss.
For example:
$ cd /path/to/project/build/tests/
$ ./test_ssg_index data_path knn_graph_path L R Angle ssg_path
For example:
$ cd /path/to/project/build/tests/
$ ./test_ssg_optimized_search data_path query_path ssg_path search_L search_K result_path [random_seed]
NOTE: For now, we only provide interface for search for only one query at a time, and test the performance with single thread.
NOTE: Data alignment is essential for the correctness of our procedure, because we use SIMD instructions for acceleration of numerical computing such as AVX and SSE2. You should use it to ensure your data elements (feature) is aligned with 8 or 16 int or float. For example, if your features are of dimension 70, then it should be extend to dimension 72. And the last 2 dimension should be filled with 0 to ensure the correctness of the distance computing. And this is what data_align() does.
NOTE: Only data-type int32 and float32 are supported for now.
$ cd /path/to/project/
$ python setup.py install
NOTE: currently Python API only supports
search
method.
import numpy as np
import pyssg
data = np.fromfile("/path/to/sift_base.fvecs", dtype=np.float32)
dim = data[0].view(np.int32)
data = data.reshape(-1, dim + 1)
data = np.ascontiguousarray(data[:, 1:])
ndata, dim = data.shape
pyssg.set_seed(1234)
index = pyssg.IndexSSG(dim, ndata)
index.load("/path/to/ssg", data)
k, l = 100, 300
query = np.random.randn(dim).astype(np.float32)
knn = index.search(query, k, l)
print(knn)
Please refer to tests/test_python_query.py
for a real-world example.
We use the following parameters to get the SSG index in Fig. 6 of our paper.
We use efanna_graph to build the kNN graph
Dataset | K | L | iter | S | R |
---|---|---|---|---|---|
SIFT1M | 200 | 200 | 12 | 10 | 100 |
GIST1M | 400 | 400 | 12 | 15 | 100 |
Crawl | 400 | 420 | 12 | 15 | 100 |
GloVe-100 | 400 | 420 | 12 | 20 | 200 |
$ efanna_graph/tests/test_nndescent sift.fvecs sift_200nn.knng 200 200 12 10 100
$ efanna_graph/tests/test_nndescent gist.fvecs gist_400nn.knng 400 400 12 15 100
$ efanna_graph/tests/test_nndescent crawl.fvecs crawl_400nn.knng 400 420 12 15 100
$ efanna_graph/tests/test_nndescent glove-100.fvecs glove-100_400nn.knng 400 420 12 15 200
Dataset | L | R | Angle |
---|---|---|---|
SIFT1M | 100 | 50 | 60 |
GIST1M | 500 | 70 | 60 |
Crawl | 500 | 40 | 60 |
GloVe-100 | 500 | 50 | 60 |
$ ./test_ssg_index sift.fvecs sift_200nn.knng 100 50 60 sift.ssg
$ ./test_ssg_index gist.fvecs gist_400nn.knng 500 70 60 gist.ssg
$ ./test_ssg_index crawl.fvecs crawl_400nn.knng 500 40 60 crawl.ssg
$ ./test_ssg_index glove-100.fvecs glove-100_400nn.knng 500 50 60 glove-100.ssg
search_L
: range from search_K
to 2000random_seed
: 161803398Here we provide our pre-built kNN graphs and SSG index files used in our papar's experiments.
Dataset | kNN Graph | SSG Index |
---|---|---|
SIFT1M | sift_200nn.knng | sift.ssg |
GIST1M | gist_400nn.knng | gist.ssg |
Crawl | crawl_400nn.knng | crawl.ssg |
GloVe-100 | glove_400nn.knng | glove.ssg |