Open OmniscienceAcademy opened 2 years ago
The kmeans training is not deterministic unless you fix the seed.
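To illustrate the point about seeding, here is a minimal sketch of a toy 1-D k-means in plain Python. It is not the faiss implementation; `toy_kmeans`, the sample points, and the seed value are all hypothetical. It only shows that when the initial centroids are drawn from a seeded random generator, two training runs give identical centroids.

```python
import random


def toy_kmeans(points, k, seed, iterations=10):
    """Toy 1-D k-means (illustration only, not faiss)."""
    rng = random.Random(seed)          # the seed fixes the initial centroids
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
run_a = toy_kmeans(points, 3, seed=1234)
run_b = toy_kmeans(points, 3, seed=1234)
print(run_a == run_b)  # same seed, same centroids
```

Without a fixed seed, the initial centroids (and so possibly the final clustering) can change from one run to the next.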
Ah, I thought you were reporting that the scores were not exactly the same. It's expected that they are almost the same, yes.
OK, I've investigated by looping over nprobe with:
set_search_hyperparameters(
index, f"nprobe={nprobe},efSearch={2*nprobe},ht={ht}", use_gpu=False
)
In fact, my 1-recall@40 converges asymptotically toward 0.8 as nprobe increases. The weird thing is that the limit does not depend on the factory string.
I've tried OPQ128_896,IVF65536_HNSW32,PQ128x8 (13 GB), OPQ256_1024,IVF65536_HNSW32,PQ256x8 (30 GB), and OPQ768_768,IVF262144_HNSW32,PQ768x4fsr (48 GB).
And I always get 1-recall@40 = 0.82 for reasonable values of nprobe.
I have trained 3 different indices, and every time my 1-recall@20 and 1-recall@40 are exactly the same:
INFO:autofaiss: 1-recall@20: 0.802
INFO:autofaiss: 1-recall@40: 0.824
But there is some variation in the 20-recall and 40-recall scores.
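For readers comparing the two kinds of score, here is a small sketch of how they can be computed, assuming the usual definitions: 1-recall@k is the fraction of queries whose true nearest neighbor appears in the top-k results, and k-recall@k is the average fraction of the true top-k neighbors that appear in the top-k results. The function names and toy data are hypothetical, and this is not necessarily how autofaiss computes its metrics.

```python
def one_recall_at_k(ground_truth, results, k):
    """Fraction of queries whose true nearest neighbor is in the top-k results."""
    hits = sum(1 for gt, res in zip(ground_truth, results) if gt[0] in res[:k])
    return hits / len(ground_truth)


def k_recall_at_k(ground_truth, results, k):
    """Average fraction of the true top-k neighbors found in the top-k results."""
    total = sum(
        len(set(gt[:k]) & set(res[:k])) / k
        for gt, res in zip(ground_truth, results)
    )
    return total / len(ground_truth)


# Toy data: 2 queries, exact neighbors sorted by distance (made-up ids).
ground_truth = [[1, 2, 3], [4, 5, 6]]
# Results from two hypothetical indices for the same queries.
index_a = [[1, 2, 3], [9, 4, 8]]
index_b = [[3, 1, 7], [4, 8, 9]]

for name, res in (("index_a", index_a), ("index_b", index_b)):
    print(name, one_recall_at_k(ground_truth, res, 3), k_recall_at_k(ground_truth, res, 3))
```

In this toy case both indices have the same 1-recall@3 while their 3-recall@3 differs, so identical 1-recall values alongside varying k-recall values are at least possible under these definitions. It does not explain why the limit sits near 0.8 here.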
Three digits of agreement seems like too much to be a coincidence.
What do you think about it?