Storing document embeddings index

beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

http://beir.ai

Apache License 2.0

1.55k stars, 187 forks

Storing document embeddings index #15

Closed. svakulenk0 closed this issue 3 years ago.

svakulenk0 commented 3 years ago

Is there a way to cache/load embedded documents and queries? That would help save time when embedding big datasets such as MS MARCO and NQ.

thakur-nandan commented 3 years ago

Hi @svakulenk0,

Sadly, at the moment the only solution is to save the document embeddings with pickle.

Recently, I have been working on integrating faiss indexes. That would allow caching or saving corpus embeddings as a faiss index. I can't give a fixed timeline for when this will be fully integrated into the repo, but I will let you know once it is done.

Kind Regards, Nandan
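
For readers who want the pickle workaround mentioned above, here is a minimal sketch using only the Python standard library. It is not BEIR code: `load_or_encode` and `toy_encode` are hypothetical names, and `toy_encode` is a stand-in for whatever model actually produces the embeddings. The corpus is assumed to be a plain dict of document ID to text.

```python
import os
import pickle
import tempfile


def load_or_encode(cache_path, corpus, encode_fn):
    """Return {doc_id: embedding}, reading from cache_path when it exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache miss: encode every document once, then write the result to disk.
    doc_ids = list(corpus)
    vectors = encode_fn([corpus[doc_id] for doc_id in doc_ids])
    embeddings = dict(zip(doc_ids, vectors))
    with open(cache_path, "wb") as f:
        pickle.dump(embeddings, f, protocol=pickle.HIGHEST_PROTOCOL)
    return embeddings


# Hypothetical stand-in for a real encoder; it records how often it is called.
calls = []


def toy_encode(texts):
    calls.append(len(texts))
    return [[float(len(t)), float(t.count(" "))] for t in texts]


corpus = {"d1": "faiss index", "d2": "pickle cache for embeddings"}
cache_path = os.path.join(tempfile.mkdtemp(), "corpus_embeddings.pkl")

first = load_or_encode(cache_path, corpus, toy_encode)   # encodes and saves
second = load_or_encode(cache_path, corpus, toy_encode)  # loads from disk
```

The second call never reaches the encoder, which is the whole point for large corpora. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.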

svakulenk0 commented 3 years ago

Hi Nandan, thank you for the reply! I love the library :)

thakur-nandan commented 3 years ago

Hi @svakulenk0,

Update: In the latest version of the BEIR package, now you can save/load the corpus embeddings as a faiss index. Check out: https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_faiss_dense.py

Kind Regards, Nandan

svakulenk0 commented 3 years ago

nice!!! thank you

rahmanidashti commented 1 year ago

> Is there a way to cache/load embedded documents and queries? That would help save time when embedding big datasets such as MS MARCO and NQ.

Thanks Svitlana for this question and Nandan for providing this feature, it is beneficial!

rahmanidashti commented 1 year ago

> nice!!! thank you

Hi @svakulenk0, have you tried this?

rahmanidashti commented 1 year ago

> Hi @svakulenk0,
>
> Update: In the latest version of the BEIR package, now you can save/load the corpus embeddings as a faiss index. Check out: https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_faiss_dense.py
>
> Kind Regards, Nandan

Hi @thakur-nandan, thank you for this. Can you please give me an example of how to store and load the embeddings?