CosimoRulli / emvb

Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024

Efficient Multi-Vector Retrieval with Bit Vectors (EMVB)

This repo contains the code and instructions on how to reproduce the results of the ECIR 2024 paper: Franco Maria Nardini, Cosimo Rulli, Rossano Venturini. "Efficient Multi-vector Dense Retrieval with Bit Vectors." European Conference on Information Retrieval. 2024.

Requirements

Our code relies heavily on AVX512 instructions, so running it requires a CPU that supports AVX512.

Installation

Parameters

Run ./build/perf_embv --help to list the arguments the program accepts.

Reproducing Paper Results

Make sure to execute export OMP_NUM_THREADS=1 before running a script; otherwise faiss and MKL may run in multi-threaded mode. Here, intra-query parallelism offers no advantage over single-threaded execution. If you want to parallelize, it is more worthwhile to parallelize over the queries.
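If the binary is launched from Python rather than from a shell, the same setting can be applied to the child process environment. The snippet below is a minimal sketch, not part of this repository: the helper names single_thread_env and run_single_threaded are hypothetical, and only the OMP_NUM_THREADS variable and the ./build/perf_embv --help command come from this README.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OpenMP limited to one thread.

    Hypothetical helper: it mirrors `export OMP_NUM_THREADS=1` for a child
    process without modifying the environment of the current process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_single_threaded(cmd):
    """Run `cmd` (a list of strings) with OMP_NUM_THREADS=1 set."""
    return subprocess.run(cmd, env=single_thread_env(), check=True)


# Example call, assuming the project has already been built:
# run_single_threaded(["./build/perf_embv", "--help"])
```

Passing the environment explicitly keeps the setting local to the launched process, so other libraries used by the calling Python program are unaffected.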

We provide the parameter configurations to reproduce the results of Table 1 and Table 2 in the scripts results_msmarco.sh and results_lotte.sh. Modify the scripts to provide the path to the downloaded indexes. The scripts to compute the metrics are taken from the original ColBERT repo.

The indexes can be downloaded here. Their file names follow the pattern {n_centroids}k_{M}_m_{dataset}_{compression_mod}.tar.gz
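To make the naming pattern concrete, the following sketch builds and parses archive names of that form. It is illustrative only: the functions archive_name and parse_archive_name are hypothetical, the values in the usage lines are placeholders rather than names of actual archives, and the parser assumes that the compression_mod part contains no underscore.

```python
import re

# Pattern from the README: {n_centroids}k_{M}_m_{dataset}_{compression_mod}.tar.gz
# Assumption: compression_mod has no underscore; dataset may contain one.
ARCHIVE_RE = re.compile(
    r"^(?P<n_centroids>[^_]+)k_(?P<M>[^_]+)_m_"
    r"(?P<dataset>.+)_(?P<compression_mod>[^_]+)\.tar\.gz$"
)


def archive_name(n_centroids, M, dataset, compression_mod):
    """Build an archive file name following the README pattern."""
    return f"{n_centroids}k_{M}_m_{dataset}_{compression_mod}.tar.gz"


def parse_archive_name(name):
    """Split an archive file name into its four components (as strings)."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the expected pattern: {name!r}")
    return match.groupdict()


# Placeholder values, chosen only to show the shape of the name.
example = archive_name(1, 2, "mydataset", "mod")
fields = parse_archive_name(example)
```

Parsing the name this way can help a wrapper script pick the matching parameters for each downloaded index without hard-coding file names.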

Extend Results on Different Collections

To run our index on your collection, you need to provide the doclens file, the query_ids file, and the index directory.

The index directory contains the following fields.

The notebook ConvertFaissIndex.ipynb contains instructions for preparing the data needed to run EMVB on a different collection.

Citation License

The source code in this repository is subject to the following citation license:

By downloading and using this software, you agree to cite the undernoted paper in any kind of material you produce where it was used to conduct search or experimentation, whether be it a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation licence.

Efficient Multi-vector Dense Retrieval with Bit Vectors

@inproceedings{emvb_ecir2024,
  title={Efficient Multi-vector Dense Retrieval with Bit Vectors},
  author={Nardini, Franco Maria and Rulli, Cosimo and Venturini, Rossano},
  booktitle={European Conference on Information Retrieval},
  pages={3--17},
  year={2024},
  organization={Springer}
}