//.........
int main() {
    int d = 64;      // dimension
    int nb = 100000; // database size
    int nq = 10000;  // nb of queries
    std::mt19937 rng;
    std::uniform_real_distribution<> distrib;
    float* xb = new float[d * nb];
    float* xq = new float[d * nq];
    // .......
    int nlist = 100;
    int k = 4;
    int m = 8; // bytes per vector
    faiss::IndexFlatIP quantizer(d); // the other index
    faiss::IndexIVFPQ index(&quantizer, d, nlist, m, 8, faiss::METRIC_INNER_PRODUCT);
    index.train(nb, xb);
    index.add(nb, xb);
    { // sanity check
        idx_t* I = new idx_t[k * 5];
        float* D = new float[k * 5];
        index.search(5, xb, k, D, I);
        //........
        delete[] I;
        delete[] D;
    }
    delete[] xb;
    delete[] xq;
    return 0;
}
Platform
Operating System: Ubuntu 20.04.3 LTS
Kernel: Linux 5.4.0-122-generic
Architecture: x86-64
Running on: CPU
Interface: C++
Reproduction instructions
After reading the Faiss code, I found that IndexIVFPQ works on residuals during both training and adding. The fine-grained (product quantizer) centroids are trained on residual data, that is, each vector minus its coarse centroid, and the data stored for the database vectors is residual data as well.
However, at query time the query vector x does not appear to be converted to a residual; it is compared directly with the fine-grained centroids to compute the distance. Is this correct?
In my understanding, the query vector x should first be converted to its residual x', and only the distance between x' and the fine-grained centroids would make sense.
Comparing the original vector x against fine-grained centroids that were built from residual data confuses me.
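To make the question concrete, here is a minimal sketch of the arithmetic involved. It is not taken from the Faiss source. It assumes a stored vector is reconstructed as y = c + r, where c is the coarse centroid and r is the decoded fine-grained residual; the helper names `inner_product`, `l2_sqr`, `add`, and `sub` are made up for illustration. Under that assumption the inner product splits as <x, y> = <x, c> + <x, r>, so the raw query x can be compared with residual centroids, which may be why I see no query residual with METRIC_INNER_PRODUCT. The squared L2 distance instead satisfies ||x - y||^2 = ||(x - c) - r||^2, which does involve the query residual x' = x - c.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Inner product <a, b> of two vectors of equal length.
float inner_product(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        s += a[i] * b[i];
    }
    return s;
}

// Squared Euclidean distance ||a - b||^2.
float l2_sqr(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        s += diff * diff;
    }
    return s;
}

// Element-wise sum a + b (hypothetical reconstruction: centroid + residual).
Vec add(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Element-wise difference a - b (hypothetical query residual: x - centroid).
Vec sub(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] - b[i];
    }
    return out;
}
```

With a query x, a coarse centroid c, and a decoded residual r, `inner_product(x, add(c, r))` equals `inner_product(x, c) + inner_product(x, r)`, while `l2_sqr(x, add(c, r))` equals `l2_sqr(sub(x, c), r)` and in general differs from `l2_sqr(x, r)`.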
I hope you can help me with this. Thank you!