facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

polysemous_ht resets to 0 when loading from disk #2120

Open mhendrey opened 2 years ago

mhendrey commented 2 years ago

Summary

I run a hyperparameter sweep for an IVF,PQ index, which yields optimal values for nprobe and ht. I then set those parameters on the index and save it to disk. When I read the index back from disk, nprobe is unchanged, but ht has been reset to 0.
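For context, a minimal sketch of what such a sweep might look like (not part of the original report; xq and gt stand in for held-out queries and their ground-truth nearest neighbors, and recall_at_1 is an illustrative helper, not a faiss function):

import faiss

def recall_at_1(index, xq, gt):
    # fraction of queries whose true nearest neighbor is returned first
    _, I = index.search(xq, 1)
    return float((I[:, 0] == gt[:, 0]).mean())

params = faiss.ParameterSpace()
for nprobe in (1, 4, 16, 64):
    for ht in (0, 16, 24, 32):
        # set both search-time parameters, then measure recall
        params.set_index_parameters(index, f"nprobe={nprobe},ht={ht}")
        print(nprobe, ht, recall_at_1(index, xq, gt))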

Platform

OS: Ubuntu 20.04.3 LTS

Faiss version: 1.7.1

Installed from: anaconda, pytorch channel

Running on:

Interface: Python

Reproduction instructions

import faiss

n = 10000
d = 64
nlist = 100
X = faiss.randn((n, d))

index = faiss.index_factory(d, f"IVF{nlist},PQ8")
print(f"{index.do_polysemous_training=:}")
print(f"{index.by_residual=:}")
index.train(X)
index.add(X)

params = faiss.ParameterSpace()
params.set_index_parameters(index, "nprobe=10,ht=32")
faiss.write_index(index, "testing.index")

index2 = faiss.read_index("testing.index")
assert index.nprobe == index2.nprobe, \
    f"{index.nprobe=:}, {index2.nprobe=:}"
assert index.polysemous_ht == index2.polysemous_ht, \
    f"{index.polysemous_ht=:}, {index2.polysemous_ht=:}"

"""
index.do_polysemous_training=True
index.by_residual=True
Traceback (most recent call last):
  File "faiss_bug.py", line 21, in <module>
    assert index.polysemous_ht == index2.polysemous_ht, \
AssertionError: index.polysemous_ht=32, index2.polysemous_ht=0
"""

mdouze commented 2 years ago

Yes, right: there is an inconsistency because some search-time parameters are stored (nprobe) and others are not (max_codes, polysemous_ht). I will mark this as an enhancement, so that we can implement it the next time we change the storage format.
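Until the serialization format carries these fields, one possible workaround (a sketch, assuming the caller keeps the tuned values alongside the index file) is to re-apply the non-persisted search-time parameters right after loading:

import faiss

index2 = faiss.read_index("testing.index")
# nprobe survives the round trip, but ht (and max_codes) must be re-applied by hand
faiss.ParameterSpace().set_index_parameters(index2, "nprobe=10,ht=32")
# or set the attribute directly on the IVFPQ index:
# index2.polysemous_ht = 32
assert index2.polysemous_ht == 32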

mhendrey commented 2 years ago

Thanks for responding. Would you like me to close the issue or leave it open?

mdouze commented 2 years ago

please leave open, that's the flow for enhancements...