ankane / faiss-ruby

Efficient similarity search and clustering for Ruby
MIT License

Faiss Ruby

Faiss - efficient similarity search and clustering - for Ruby

Learn more about Faiss

Installation

First, ensure BLAS, LAPACK, and OpenMP are installed. For Mac, use:

brew install libomp

For Ubuntu, use:

sudo apt-get install libblas-dev liblapack-dev

Then add this line to your application’s Gemfile:

gem "faiss"

It can take a few minutes to compile the gem. Windows is not currently supported.

Getting Started

Prep your data

objects = [
  [1, 1, 2, 1],
  [5, 4, 6, 5],
  [1, 2, 1, 2]
]

Build an index

index = Faiss::IndexFlatL2.new(4)
index.add(objects)

Search

distances, ids = index.search(objects, 3)
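
To make the two return values concrete, here is a minimal pure-Ruby sketch of what an exact L2 search computes. It does not use the gem: squared_l2 and brute_force_search are hypothetical helpers written for this illustration, and they return plain arrays, whereas the gem's return type may differ. Faiss reports squared Euclidean distances for L2 indexes, so the sketch does the same.

```ruby
# Pure-Ruby illustration only; these helpers are hypothetical and not part of the faiss gem.

# Squared Euclidean distance between two equal-length vectors
def squared_l2(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# For each query, return the k nearest rows of data as [distances, ids]
def brute_force_search(data, queries, k)
  nearest = queries.map do |query|
    data.each_with_index
        .map { |vector, id| [squared_l2(query, vector), id] }
        .sort
        .first(k)
  end
  distances = nearest.map { |row| row.map(&:first) }
  ids = nearest.map { |row| row.map(&:last) }
  [distances, ids]
end

objects = [
  [1, 1, 2, 1],
  [5, 4, 6, 5],
  [1, 2, 1, 2]
]

distances, ids = brute_force_search(objects, objects, 3)
p distances # one row per query, nearest first
p ids       # positions of the matching vectors in objects
```

For the sample data, the first query is its own nearest neighbor at distance 0, followed by the third vector at squared distance 3 and the second at 57.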

Save an index

index.save("index.bin")

Load an index

index = Faiss::Index.load("index.bin")

Note: to load a binary index, use Faiss::IndexBinary in place of Faiss::Index.

Basic Indexes

Exact search for L2

Faiss::IndexFlatL2.new(d)

Exact search for inner product

Faiss::IndexFlatIP.new(d)
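
With inner product, a larger score means a closer match, and a vector is not necessarily its own best match unless the data is normalized. The pure-Ruby sketch below illustrates this without using the gem; dot, normalize, and rank_by_inner_product are hypothetical helpers.

```ruby
# Pure-Ruby illustration only; these helpers are hypothetical and not part of the faiss gem.

def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Scale a vector to unit length
def normalize(v)
  norm = Math.sqrt(dot(v, v))
  v.map { |x| x / norm }
end

# Ids of data ordered from highest to lowest inner product with query
def rank_by_inner_product(data, query)
  data.each_index.sort_by { |id| [-dot(query, data[id]), id] }
end

objects = [
  [1, 1, 2, 1],
  [5, 4, 6, 5],
  [1, 2, 1, 2]
]

# Raw inner product favors long vectors, so objects[1] outranks the query itself
p rank_by_inner_product(objects, objects[0])

# After normalization the inner product equals cosine similarity
unit = objects.map { |v| normalize(v) }
p rank_by_inner_product(unit, unit[0])
```

Normalizing vectors before adding and searching is the usual way to get cosine similarity out of an inner-product index.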

Hierarchical navigable small world graph exploration

Faiss::IndexHNSWFlat.new(d, m)

Inverted file with exact post-verification

Faiss::IndexIVFFlat.new(quantizer, d, nlists)
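
An inverted file groups vectors into nlists cells around centroids held by the quantizer, and a search scans only the cells closest to the query. The pure-Ruby sketch below shows the idea under simplifying assumptions: the centroids are written by hand rather than learned, the helper names are hypothetical, and nprobe is borrowed from the Faiss parameter that controls how many cells are scanned.

```ruby
# Pure-Ruby illustration only; helper names are hypothetical and not part of the faiss gem.

def squared_l2(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the centroid closest to the vector
def nearest_centroid(centroids, vector)
  centroids.each_index.min_by { |c| squared_l2(centroids[c], vector) }
end

# Group vector ids by their nearest centroid (one inverted list per centroid)
def build_inverted_lists(centroids, data)
  lists = Array.new(centroids.size) { [] }
  data.each_with_index { |vector, id| lists[nearest_centroid(centroids, vector)] << id }
  lists
end

# Scan only the nprobe lists whose centroids are closest to the query
def ivf_search(centroids, lists, data, query, k, nprobe)
  probed = centroids.each_index.sort_by { |c| squared_l2(centroids[c], query) }.first(nprobe)
  candidates = probed.flat_map { |c| lists[c] }
  candidates.map { |id| [squared_l2(query, data[id]), id] }.sort.first(k)
end

objects = [
  [1, 1, 2, 1],
  [5, 4, 6, 5],
  [1, 2, 1, 2]
]

# Assumed centroids; in Faiss they are learned when the index is trained
centroids = [[1, 1, 1, 1], [5, 5, 5, 5]]

lists = build_inverted_lists(centroids, objects)
p lists                                                  # vector ids per cell
p ivf_search(centroids, lists, objects, [1, 1, 1, 1], 2, 1) # [distance, id] pairs
```

Scanning fewer cells is faster but can miss neighbors that fall in an unvisited cell, which is the accuracy trade-off of this index family.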

Locality-sensitive hashing

Faiss::IndexLSH.new(d, nbits)

Scalar quantizer (SQ) in flat mode

Faiss::IndexScalarQuantizer.new(d, qtype)

Product quantizer (PQ) in flat mode

Faiss::IndexPQ.new(d, m, nbits)

IVF and scalar quantizer

Faiss::IndexIVFScalarQuantizer.new(quantizer, d, nlists, qtype)

IVFADC (coarse quantizer+PQ on residuals)

Faiss::IndexIVFPQ.new(quantizer, d, nlists, m, nbits)

IVFADC+R (same as IVFADC with re-ranking based on codes)

Faiss::IndexIVFPQR.new(quantizer, d, nlists, m, nbits, m_refine, nbits_refine)

Binary Indexes

Index binary vectors

Faiss::IndexBinaryFlat.new(d)
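
Binary indexes compare vectors by Hamming distance, the number of bit positions in which two vectors differ. Faiss packs binary vectors 8 bits per byte, so d is counted in bits and each vector occupies d / 8 bytes. A minimal pure-Ruby sketch of the distance, where hamming is a hypothetical helper and not part of the gem:

```ruby
# Pure-Ruby illustration only; hamming is a hypothetical helper, not part of the faiss gem.

# Number of differing bits between two byte arrays of equal length
def hamming(a, b)
  a.zip(b).sum { |x, y| (x ^ y).to_s(2).count("1") }
end

# Two 16-bit vectors, each packed into 2 bytes
a = [0b10110000, 0b00001111]
b = [0b10010000, 0b00001100]

p hamming(a, b) # bits that differ across both bytes
```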

Speed up search with an inverted file

Faiss::IndexBinaryIVF.new(quantizer, d, nlists)

K-means Clustering

Train

kmeans = Faiss::Kmeans.new(4, 2)
kmeans.train(objects)

Get the centroids

kmeans.centroids
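
Assuming the constructor arguments are the dimension and the number of clusters, as in Faiss itself, the example above asks for 2 centroids in 4 dimensions. The pure-Ruby sketch below shows the underlying assign-then-update loop (Lloyd's algorithm). It is an illustration, not the gem's implementation: the helper names are hypothetical, and it seeds the centroids with the first k vectors to stay deterministic, so results from the gem may differ.

```ruby
# Pure-Ruby illustration of Lloyd's algorithm; helper names are hypothetical
# and the initialization is simplified (Faiss picks initial centroids at random).

def squared_l2(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Component-wise mean of a non-empty list of vectors
def mean(vectors)
  vectors.transpose.map { |column| column.sum.to_f / vectors.size }
end

def kmeans(data, k, iterations = 10)
  centroids = data.first(k).map { |v| v.map(&:to_f) }
  iterations.times do
    # Assignment step: group each vector with its nearest centroid
    clusters = data.group_by do |vector|
      centroids.each_index.min_by { |c| squared_l2(centroids[c], vector) }
    end
    # Update step: move each centroid to the mean of its cluster
    # (a centroid with an empty cluster is left where it is)
    centroids = centroids.each_index.map do |c|
      clusters[c] ? mean(clusters[c]) : centroids[c]
    end
  end
  centroids
end

objects = [
  [1, 1, 2, 1],
  [5, 4, 6, 5],
  [1, 2, 1, 2]
]

p kmeans(objects, 2) # one centroid per cluster
```

On the sample data the first and third vectors share a cluster, so one centroid is their average and the other is the second vector.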

PCA

Train

mat = Faiss::PCAMatrix.new(40, 10)
mat.train(objects)

Apply

mat.apply(objects)

Product Quantizer

Train

pq = Faiss::ProductQuantizer.new(32, 4, 8)
pq.train(objects)

Encode

pq.compute_codes(objects)

Decode

pq.decode(codes)
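
Assuming the constructor arguments follow Faiss (dimension, number of sub-vectors, bits per code), ProductQuantizer.new(32, 4, 8) splits each 32-dimensional vector into 4 sub-vectors of 8 dimensions and stores each one as an 8-bit code, or 4 bytes per vector. The pure-Ruby sketch below shows encode and decode on a much smaller case. The helper names and the hand-written codebooks are hypothetical; real codebooks come from training.

```ruby
# Pure-Ruby illustration only; helper names and codebooks are hypothetical,
# not the faiss gem's API. Real codebooks come from training.

def squared_l2(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Replace each sub-vector with the index of its nearest codebook entry
def pq_encode(codebooks, vector)
  sub_dim = vector.size / codebooks.size
  vector.each_slice(sub_dim).each_with_index.map do |sub, m|
    codebooks[m].each_index.min_by { |c| squared_l2(codebooks[m][c], sub) }
  end
end

# Rebuild an approximate vector by concatenating the selected codebook entries
def pq_decode(codebooks, codes)
  codes.each_with_index.flat_map { |code, m| codebooks[m][code] }
end

# d = 4, 2 sub-vectors, 2 entries per codebook (1 bit per code)
codebooks = [
  [[1, 1], [5, 4]], # codebook for dimensions 0..1
  [[2, 1], [6, 5]]  # codebook for dimensions 2..3
]

codes = pq_encode(codebooks, [1, 2, 1, 2])
p codes                       # one small integer per sub-vector
p pq_decode(codebooks, codes) # approximation of the original vector
```

Decoding returns the nearest codebook entries rather than the original values, so the round trip is lossy: here [1, 2, 1, 2] comes back as [1, 1, 2, 1].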

Save a quantizer

pq.save("pq.bin")

Load a quantizer

pq = Faiss::ProductQuantizer.load("pq.bin")

Data

Data can be an array of arrays

[[1, 2, 3], [4, 5, 6]]

Or a Numo array

Numo::NArray.cast([[1, 2, 3], [4, 5, 6]])

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone --recursive https://github.com/ankane/faiss-ruby.git
cd faiss-ruby
bundle install
bundle exec rake compile
bundle exec rake test