CosimoRulli / emvb

Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024

Please provide the scripts to build and convert indexes on a custom collection #2

Open bianzheng123 opened 4 weeks ago

CosimoRulli commented 2 weeks ago

Hey @bianzheng123,

thank you for your interest. I have added this script. Let me know if you need any further help!

bianzheng123 commented 1 week ago

Hi, the EMVB method has many tunable parameters, but the paper says little about how they were configured. Could you please share the parameter values used in the experiments?

CosimoRulli commented 3 days ago

Hey @bianzheng123,

At building time, the number of centroids is $2^{18}$. We tested different values of $M$, as reported in Tables 1 and 2. The number of bits per subspace in PQ (nbits in the notebook) is always 8.
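
As a rough illustration of what these build-time values imply, the arithmetic can be sketched as below. The values of M used here are placeholders chosen for the example, not the ones from Tables 1 and 2, and the helper name `pq_code_size_bytes` is hypothetical (it is not part of the repository).

```python
# Build-time values stated above.
n_centroids = 2 ** 18   # number of centroids
nbits = 8               # bits per PQ subspace

# Each subspace codebook holds 2^nbits codewords.
codewords_per_subspace = 2 ** nbits

# Bits needed to store one centroid id (log2 of a power of two).
centroid_id_bits = n_centroids.bit_length() - 1


def pq_code_size_bytes(m, nbits=8):
    """Bytes needed to store one PQ code with m subspaces (hypothetical helper)."""
    return (m * nbits + 7) // 8  # round up to whole bytes


# Illustrative values of M only; see Tables 1 and 2 for the tested ones.
for m in (16, 32):
    print(f"M={m}: {pq_code_size_bytes(m, nbits)} bytes per encoded vector")
```

With nbits fixed at 8, each subspace index fits in exactly one byte, so the PQ code of a vector occupies M bytes.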

The bash scripts results_lotte.sh and results_msmarco.sh contain the best configuration we discovered for the querying parameters. As they depend on the dataset you are using, I recommend running a grid search once you have built the index.
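
A minimal sketch of such a grid search is shown below, assuming you wrap your own query run in an `evaluate` function that returns a score where higher is better. The parameter names `nprobe` and `threshold`, their values, and the toy scoring function are placeholders for illustration; take the actual querying parameters and sensible ranges from results_lotte.sh and results_msmarco.sh.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return the best (config, score)."""
    names = list(param_grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    # Stand-in for a real run: replace with code that queries the built index
    # using `config` and returns a retrieval metric on your dataset.
    return -abs(config["nprobe"] - 4) - abs(config["threshold"] - 0.4)


# Hypothetical parameter names and values, for illustration only.
param_grid = {
    "nprobe": [1, 2, 4, 8],
    "threshold": [0.2, 0.4, 0.6],
}

best_config, best_score = grid_search(param_grid, toy_evaluate)
print(best_config, best_score)
```

In practice you will likely want to record latency alongside the quality metric for each configuration, since the best trade-off depends on your dataset and time budget.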