criteo / autofaiss

Automatically create Faiss knn indices with the most optimal similarity search parameters.
https://criteo.github.io/autofaiss/
Apache License 2.0
802 stars 74 forks

Suspicious constant 1-recall score #121

Open OmniscienceAcademy opened 2 years ago

OmniscienceAcademy commented 2 years ago

I have trained 3 different indices, and every time my 1-recall@20 and 1-recall@40 scores are exactly the same:

INFO:autofaiss: 1-recall@20: 0.802
INFO:autofaiss: 1-recall@40: 0.824

But there is some variation in the 20-recall and 40-recall scores.

Having all 3 digits match exactly seems like too much of a coincidence.

What do you think about it?
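For readers comparing the two metric families in the log, a minimal numpy sketch of how they are commonly defined may help. It assumes that 1-recall@k means "the exact nearest neighbor appears among the k returned ids" and that k-recall@k means "average overlap between the exact top-k and the returned top-k". The function names are hypothetical; this is not autofaiss's own evaluation code.

```python
import numpy as np


def one_recall_at_k(ground_truth: np.ndarray, retrieved: np.ndarray, k: int) -> float:
    """Fraction of queries whose exact nearest neighbor is in the top-k results.

    ground_truth: (n_queries, >=1) exact neighbor ids; column 0 is the nearest.
    retrieved:    (n_queries, >=k) ids returned by the approximate index.
    """
    nearest = ground_truth[:, :1]                      # shape (n_queries, 1)
    hits = (retrieved[:, :k] == nearest).any(axis=1)   # one bool per query
    return float(hits.mean())


def k_recall_at_k(ground_truth: np.ndarray, retrieved: np.ndarray, k: int) -> float:
    """Average share of the exact top-k ids that also appear in the returned top-k."""
    overlaps = [
        len(set(gt[:k]).intersection(ret[:k])) / k
        for gt, ret in zip(ground_truth, retrieved)
    ]
    return float(np.mean(overlaps))


# Two toy queries: exact neighbor ids vs ids returned by an approximate index.
gt = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
ret = np.array([[1, 0, 9, 8], [9, 8, 5, 4]])
print(one_recall_at_k(gt, ret, 2), one_recall_at_k(gt, ret, 4))
print(k_recall_at_k(gt, ret, 2), k_recall_at_k(gt, ret, 4))
```

Here the 1-recall values are 0.5 and 1.0 and both k-recall values are 0.5. Note the granularity: 1-recall@k changes in steps of 1/n_queries, while k-recall@k changes in steps of 1/(k·n_queries), so the two behave differently when an index changes slightly.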

rom1504 commented 2 years ago

The kmeans training is not deterministic unless you fix the seed.
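To illustrate the point about seeds, here is a toy Lloyd's k-means in plain numpy (a stand-in, not faiss's implementation) where the only randomness is the choice of initial centroids. In faiss itself, `ClusteringParameters` exposes a `seed` field for this purpose; how autofaiss sets it is not shown here. `toy_kmeans` is a hypothetical name.

```python
from typing import Optional

import numpy as np


def toy_kmeans(x: np.ndarray, k: int, niter: int = 20, seed: Optional[int] = None) -> np.ndarray:
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Random initialisation from the data: the only source of run-to-run variation.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(niter):
        # Assign every point to its closest centroid (squared L2 distance).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; leave empty clusters as they are.
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids


data = np.random.default_rng(0).normal(size=(500, 8))
run_a = toy_kmeans(data, 16, seed=123)
run_b = toy_kmeans(data, 16, seed=123)  # same seed: identical centroids
run_c = toy_kmeans(data, 16, seed=7)    # different seed: typically different centroids
print(np.array_equal(run_a, run_b))
```

With `seed=None`, every call draws fresh entropy, so two trainings on the same data generally end in different local optima and therefore give slightly different recall scores.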

rom1504 commented 2 years ago

Ah, I thought you were reporting that the scores were not exactly the same. Yes, it's expected that they are almost the same.

OmniscienceAcademy commented 2 years ago

OK, I've investigated by looping over this call:

set_search_hyperparameters(
    index, f"nprobe={nprobe},efSearch={2*nprobe},ht={ht}", use_gpu=False
)

In fact, my 1-recall@40 converges asymptotically toward 0.8 as nprobe increases. The weird thing is that this limit does not depend on the factory string.
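To see why an nprobe sweep like this flattens out, here is a self-contained toy in numpy (not faiss or autofaiss): an inverted-file index whose coarse centroids are simply a random sample of the base vectors, searched with exact distances inside the probed lists. All names are hypothetical. Because distances are exact here, 1-recall@k can only miss when the true nearest neighbor's list is not probed, so it rises monotonically with nprobe and reaches 1.0 at nprobe = nlist. In a compressed index such as IVF-PQ, candidate distances are approximate, so the plateau can sit below 1.0 even when every list is probed.

```python
import numpy as np


def build_toy_ivf(xb: np.ndarray, nlist: int, seed: int = 0):
    """Coarse quantizer + inverted lists. Simplification: centroids are sampled, not trained."""
    rng = np.random.default_rng(seed)
    centroids = xb[rng.choice(len(xb), size=nlist, replace=False)]
    assign = ((xb[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, lists


def ivf_search(xb, centroids, lists, q, k, nprobe):
    """Visit the nprobe closest lists, then rank their members with exact L2 distances."""
    order = ((centroids - q) ** 2).sum(-1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    d = ((xb[cand] - q) ** 2).sum(-1)
    return cand[d.argsort()[:k]]


def one_recall_sweep(xb, xq, nlist, k, nprobes):
    """1-recall@k for each nprobe value, against brute-force ground truth."""
    centroids, lists = build_toy_ivf(xb, nlist)
    true_nn = np.array([((xb - q) ** 2).sum(-1).argmin() for q in xq])
    recalls = []
    for nprobe in nprobes:
        hits = [true_nn[i] in ivf_search(xb, centroids, lists, q, k, nprobe)
                for i, q in enumerate(xq)]
        recalls.append(float(np.mean(hits)))
    return recalls


rng = np.random.default_rng(1)
xb = rng.normal(size=(2000, 16))
xq = rng.normal(size=(50, 16))
nprobes = [1, 2, 4, 8, 16, 32]
recalls = one_recall_sweep(xb, xq, nlist=32, k=40, nprobes=nprobes)
for nprobe, r in zip(nprobes, recalls):
    print(f"nprobe={nprobe}: 1-recall@40={r:.3f}")
```

In this exact-distance toy the ceiling is set only by how many lists are visited; any ceiling below 1.0 in a real index comes from the compressed part of the factory string.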

I've tried three factory strings:

OPQ128_896,IVF65536_HNSW32,PQ128x8 (13 GB)
OPQ256_1024,IVF65536_HNSW32,PQ256x8 (30 GB)
OPQ768_768,IVF262144_HNSW32,PQ768x4fsr (48 GB)

In every case I get 1-recall@40 = 0.82 for reasonable values of nprobe.
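For context on how differently these three indices compress the vectors, the nominal PQ code size per vector follows from the `PQ<M>x<nbits>` component as M·nbits/8 bytes. The helper below is a hypothetical sketch that parses only that component; it ignores stored ids, inverted-list overhead, fast-scan block packing, the OPQ matrix, and the HNSW coarse quantizer.

```python
import re


def pq_code_bytes(factory: str) -> int:
    """Nominal bytes per vector for the PQ<M>x<nbits> part of a factory string."""
    # Match "PQ" only at the start of a comma-separated component, so "OPQ..." is skipped.
    match = re.search(r"(?:^|,)PQ(\d+)x(\d+)", factory)
    if match is None:
        raise ValueError(f"no PQ<M>x<nbits> component in {factory!r}")
    m, nbits = int(match.group(1)), int(match.group(2))
    return (m * nbits + 7) // 8  # round up to whole bytes


for factory in [
    "OPQ128_896,IVF65536_HNSW32,PQ128x8",
    "OPQ256_1024,IVF65536_HNSW32,PQ256x8",
    "OPQ768_768,IVF262144_HNSW32,PQ768x4fsr",
]:
    print(factory, pq_code_bytes(factory))
```

This gives 128, 256 and 384 bytes per vector respectively, so the codes differ by a factor of up to 3, which is why an identical recall plateau across the three is surprising.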