erhant / halo2-vectordb

Verifiable vector similarity queries PoC with Halo2.
MIT License
halo2 plonk rust vector vector-database vector-similarity

Halo2 VectorDB

Verifiable vector similarity queries over a committed vector database.

This project aims to build a proof-of-concept verifiable vector database using zero-knowledge proofs. We make heavy use of the awesome ZKFixedPointChip, which enables fixed-point arithmetic with halo2-lib.

Installation

You need Rust installed:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Then, you can clone this repository and use the chips inside it:

git clone https://github.com/erhant/halo2-vectordb.git
cd halo2-vectordb

Usage

We implement two chips: one for distance metrics in halo2, and the other for basic vector database operations.

DistanceChip

DistanceChip provides distance metrics that operate on two vectors of equal length. The vector elements are expected to be quantized with the FixedPointChip. The following distance metrics are implemented:
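For intuition about what such metrics compute, here is a plain-Rust reference sketch that runs outside any circuit, on ordinary f64 values rather than quantized ones. It shows two common metrics, (squared) Euclidean distance and cosine distance. Whether DistanceChip offers exactly these, and how it handles rounding, is an assumption; the function names are hypothetical and are not the chip's API.

```rust
/// Squared Euclidean distance: sum of squared coordinate differences.
/// Comparing squared distances avoids a square root, which is costly in a circuit.
fn euclidean_sq(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Euclidean (L2) distance.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    euclidean_sq(a, b).sqrt()
}

/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine distance: 1 - (a . b) / (|a| |b|).
/// The result is NaN if either vector is all zeros.
fn cosine_distance(a: &[f64], b: &[f64]) -> f64 {
    1.0 - dot(a, b) / (dot(a, a) * dot(b, b)).sqrt()
}
```

For example, `euclidean(&[0.0, 0.0], &[3.0, 4.0])` is `5.0`, and two parallel vectors such as `[1.0, 2.0]` and `[2.0, 4.0]` have cosine distance `0.0`.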

VectorDBChip

VectorDBChip implements basic vector database functionality over a set of vectors. Similar to DistanceChip, it requires a FixedPointChip to operate over quantized values. It exposes the following functions:
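The query example further below exhaustively searches for the most similar vector. As a rough, out-of-circuit picture of that operation, here is a minimal linear-scan nearest-neighbor sketch. It assumes squared Euclidean distance as the metric; `nearest` is a hypothetical name, not a VectorDBChip function.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn euclidean_sq(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exhaustive search: compare the query against every database vector and
/// return (index, squared distance) of the closest one.
/// Ties keep the earliest index; an empty database yields None.
fn nearest(db: &[Vec<f64>], query: &[f64]) -> Option<(usize, f64)> {
    db.iter()
        .enumerate()
        .map(|(i, v)| (i, euclidean_sq(v, query)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
}
```

A linear scan touches every vector, so its cost grows with the database size. That is the price of an exhaustive (exact) search, as opposed to approximate indexes.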

We also have a trait FixedPointVectorInstructions, implemented for the FixedPointChip, which provides simple utility functions to quantize and dequantize vectors.
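To illustrate what quantizing and dequantizing a vector means, here is a minimal plain-Rust sketch of the general fixed-point idea. It scales each element by 2^frac_bits and rounds to an integer. The number of fractional bits and the use of i64 are assumptions for illustration. The FixedPointChip itself encodes values as field elements with its own precision and sign handling, and `quantize_vector` / `dequantize_vector` are hypothetical names, not the trait's methods.

```rust
/// Quantize each element to a signed fixed-point integer:
/// q = round(x * 2^frac_bits). Assumes frac_bits < 63 and that values fit in i64.
fn quantize_vector(v: &[f64], frac_bits: u32) -> Vec<i64> {
    let scale = (1u64 << frac_bits) as f64;
    v.iter().map(|x| (x * scale).round() as i64).collect()
}

/// Map fixed-point integers back to floats: x = q / 2^frac_bits.
fn dequantize_vector(q: &[i64], frac_bits: u32) -> Vec<f64> {
    let scale = (1u64 << frac_bits) as f64;
    q.iter().map(|&x| x as f64 / scale).collect()
}
```

Rounding bounds the round-trip error by 2^-(frac_bits + 1) per element. For example, with 16 fractional bits, `-2.75` quantizes exactly to `-180224`.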

Demonstration

A demonstrative test suite can be found at demo_test:

Examples

Run the examples via one of the following:

# demonstrate distance computations
LOOKUP_BITS=12 cargo run --example distances -- \
  --name distances -k 13 mock

# example merkle commitment to vectors
LOOKUP_BITS=12 cargo run --example merkle -- \
  --name merkle -k 13 mock

# exhaustively find the similar vector & commit to the database
LOOKUP_BITS=12 cargo run --example query -- \
  --name query -k 13 mock

# compute centroids
LOOKUP_BITS=15 cargo run --example kmeans -- \
  --name kmeans -k 16 mock
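The merkle example commits to a set of vectors. As an out-of-circuit picture of a Merkle commitment, here is a minimal sketch that builds a root over quantized vectors. It uses the standard library's DefaultHasher purely as a stand-in: DefaultHasher is NOT cryptographically secure. A real commitment, and any circuit, would use a proper (typically circuit-friendly) hash, and which hash this repository uses is not assumed here. Duplicating the last node on odd-sized levels is one common convention and is also an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash one leaf (a quantized vector). Stand-in hash, NOT cryptographic.
fn hash_leaf(v: &[i64]) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

/// Hash two child nodes into their parent. Stand-in hash, NOT cryptographic.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Merkle root over a list of vectors; None for an empty list.
/// Odd-sized levels duplicate their last node before pairing.
fn merkle_root(vectors: &[Vec<i64>]) -> Option<u64> {
    let mut level: Vec<u64> = vectors.iter().map(|v| hash_leaf(v)).collect();
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        if level.len() % 2 == 1 {
            let last = level[level.len() - 1];
            level.push(last);
        }
        level = level.chunks(2).map(|p| hash_pair(p[0], p[1])).collect();
    }
    Some(level[0])
}
```

Changing any element of any vector changes the root, so publishing the root commits to the whole database.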

You can provide a specific input via the --input <input-name> option.
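The kmeans example computes centroids. To show what that involves, here is a minimal plain-Rust sketch of Lloyd's k-means algorithm on f64 values. It assumes that this is the variant the example follows, and it also assumes caller-supplied initial centroids and a fixed iteration count (`iters >= 1`). `kmeans` is a hypothetical name.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn dist_sq(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Lloyd's k-means: each round assigns every point to its nearest centroid,
/// then moves each centroid to the mean of its assigned points.
/// A centroid with no assigned points is left unchanged.
/// Returns the final centroids and each point's cluster index.
fn kmeans(points: &[Vec<f64>], mut centroids: Vec<Vec<f64>>, iters: usize) -> (Vec<Vec<f64>>, Vec<usize>) {
    assert!(!centroids.is_empty(), "need at least one centroid");
    let dim = centroids[0].len();
    let mut assignment = vec![0usize; points.len()];
    for _ in 0..iters {
        // Assignment step: index of the closest centroid for each point.
        for (i, p) in points.iter().enumerate() {
            assignment[i] = (0..centroids.len())
                .min_by(|&a, &b| dist_sq(p, &centroids[a]).total_cmp(&dist_sq(p, &centroids[b])))
                .unwrap();
        }
        // Update step: per-cluster coordinate sums and counts, then means.
        let mut sums = vec![vec![0.0; dim]; centroids.len()];
        let mut counts = vec![0usize; centroids.len()];
        for (p, &c) in points.iter().zip(&assignment) {
            counts[c] += 1;
            for (s, x) in sums[c].iter_mut().zip(p) {
                *s += x;
            }
        }
        for (c, centroid) in centroids.iter_mut().enumerate() {
            if counts[c] > 0 {
                *centroid = sums[c].iter().map(|s| s / counts[c] as f64).collect();
            }
        }
    }
    (centroids, assignment)
}
```

In a circuit, the mean requires fixed-point division, and the iteration count must be fixed in advance. That may explain why this example uses larger `LOOKUP_BITS` and `-k` values, though this is speculation.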

Testing

To run tests:

cargo test

Some of the tests make use of the ANN_SIFT_10K dataset by Jégou et al., which can be downloaded at Corpus-Texmex. This dataset contains 128-dimensional vectors. Within our tests folder, the common module exposes two functions to read these vectors:
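For reference, the Corpus-Texmex `.fvecs` files store each vector as a little-endian 32-bit integer dimension `d` followed by `d` little-endian 32-bit floats. The following is a minimal standalone reader sketch for that layout. `parse_fvecs` and `read_fvecs` are hypothetical helpers, not the functions in the tests' common module.

```rust
use std::convert::TryInto;
use std::fs;
use std::io;
use std::path::Path;

/// Build an InvalidData error for malformed input.
fn bad(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Parse `.fvecs` bytes: each record is an i32 (LE) dimension `d`
/// followed by `d` f32 (LE) values. Fails on truncated or negative-dimension records.
fn parse_fvecs(bytes: &[u8]) -> io::Result<Vec<Vec<f32>>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let header = bytes.get(pos..pos + 4).ok_or_else(|| bad("truncated dimension header"))?;
        let d = i32::from_le_bytes(header.try_into().unwrap());
        if d < 0 {
            return Err(bad("negative dimension"));
        }
        let d = d as usize;
        pos += 4;
        let body = bytes.get(pos..pos + 4 * d).ok_or_else(|| bad("truncated vector body"))?;
        let v: Vec<f32> = body
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes(c.try_into().unwrap()))
            .collect();
        out.push(v);
        pos += 4 * d;
    }
    Ok(out)
}

/// Read and parse a whole `.fvecs` file from disk.
fn read_fvecs(path: &Path) -> io::Result<Vec<Vec<f32>>> {
    parse_fvecs(&fs::read(path)?)
}
```

Usage might look like `read_fvecs(Path::new("path/to/base.fvecs"))`, where the path is a placeholder. Each returned vector should have 128 elements for this dataset.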

Acknowledgements