facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

How to obtain the codebook from a saved OPQ index? #3536

Closed: Xp-speit2018 closed this issue 4 months ago

Xp-speit2018 commented 4 months ago

Platform

OS: Ubuntu 20.04.6 LTS

Faiss version: 1.8.0

Installed from: conda

Reproduction instructions

Here's how I create an OPQ index and inspect its centroids:

```python
import faiss
import numpy as np
from faiss.contrib.inspect_tools import get_pq_centroids

d = 128
M = 32
nbits = 8
seed = 42
np.random.seed(seed)

opq_index = faiss.IndexPreTransform(
    faiss.OPQMatrix(d, M),
    faiss.IndexPQ(d, M, nbits, faiss.METRIC_INNER_PRODUCT)
)
train = np.random.rand(10000, d).astype('float32')
opq_index.train(train)

cent = get_pq_centroids(opq_index.referenced_objects[-1].pq)
print(cent.shape)  # (M, 2**nbits, d // M), i.e. (32, 256, 4)
print(cent)
```

This works because `referenced_objects` of `IndexPreTransform` lists the transformations and the index it holds:

```python
print(opq_index.referenced_objects)
```

```text
[ >, >]
```

However, when I save the index and load it back, the inspection no longer works, because the loaded index has no `referenced_objects` attribute:

```python
faiss.write_index(opq_index, 'codebook_hack.opq')
opq_index = faiss.read_index('codebook_hack.opq')
print(opq_index.__dict__)
```

```text
{'this': }
```

But `add` still works, so I think the deserialized `opq_index` still holds the centroids. Is there another way to obtain them after deserialization?
Xp-speit2018 commented 4 months ago

I found a solution:

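```python
import faiss
from faiss.contrib.inspect_tools import get_pq_centroids

def get_opq_cent(opq):
    # opq.index comes back as a generic faiss.Index; downcast it to IndexPQ
    pq_index = faiss.downcast_index(opq.index)
    return get_pq_centroids(pq_index.pq)

get_opq_cent(opq_index)
```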