beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0
1.55k stars, 186 forks

Unsupported Elastic Search distribution on BEIR.ipynb #58

Open thigm85 opened 2 years ago

thigm85 commented 2 years ago

Hi,

Thanks for the great work. BEIR is extremely valuable!

I just tried to run BEIR.ipynb on Google Colab and could not complete the "Lexical Retrieval using BM25 (Elasticsearch)" section because Elasticsearch raised an unsupported-distribution error, as shown below:

[Screenshot 2022-01-19 at 16:28:05]

I tried different versions but I couldn't get it to work. Any advice?

nreimers commented 2 years ago

This is an annoying thing about the elasticsearch client: They added a feature to the newest ES client which makes it work only with the newest ES server, to enforce the new licensing of ES.

Either use the newest ES server or downgrade your Python ES client.
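
A minimal sketch of how one might detect the mismatch before it fails at query time. It assumes the product check was introduced in version 7.14 of the Python client; that threshold and the helper names (`parse_version`, `has_product_check`) are illustrative and not part of BEIR or the elasticsearch package.

```python
from importlib import metadata

# Assumption: the client-side product check first shipped in elasticsearch-py 7.14.
PRODUCT_CHECK_SINCE = (7, 14)


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor) pair of a version string such as '7.9.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def has_product_check(client_version: str) -> bool:
    """True if this client version is new enough to reject unsupported servers."""
    return parse_version(client_version) >= PRODUCT_CHECK_SINCE


if __name__ == "__main__":
    try:
        installed = metadata.version("elasticsearch")
    except metadata.PackageNotFoundError:
        print("The elasticsearch client is not installed.")
    else:
        if has_product_check(installed):
            print(f"Client {installed} may refuse older or OSS servers; consider downgrading.")
        else:
            print(f"Client {installed} predates the product check.")
```

If the check reports a newer client, the options stay the same as above: upgrade the server or install an older client.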

thigm85 commented 2 years ago

I see, it works when I downgrade the client to 7.9.1.

Should the NDCG@10 computed in this notebook match the value of the leaderboard?

The notebook gives 0.6843 (figure below) while the leaderboard gives 0.62 for SciFact.

[Screenshot 2022-01-19 at 20:32:46]

nreimers commented 2 years ago

I ran the script 3 times:

Run1
2021-04-20 16:10:20 - NDCG@1: 0.5400
2021-04-20 16:10:20 - NDCG@3: 0.6104
2021-04-20 16:10:20 - NDCG@5: 0.6297
2021-04-20 16:10:20 - NDCG@10: 0.6495
2021-04-20 16:10:20 - NDCG@100: 0.6749
2021-04-20 16:10:20 - NDCG@1000: 0.6850

Run2
2022-01-20 13:00:13 - NDCG@1: 0.5800
2022-01-20 13:00:13 - NDCG@3: 0.6393
2022-01-20 13:00:13 - NDCG@5: 0.6671
2022-01-20 13:00:13 - NDCG@10: 0.6914
2022-01-20 13:00:13 - NDCG@100: 0.7147
2022-01-20 13:00:13 - NDCG@1000: 0.7221

Run3
2022-01-20 13:01:23 - NDCG@1: 0.5733
2022-01-20 13:01:23 - NDCG@3: 0.6301
2022-01-20 13:01:23 - NDCG@5: 0.6594
2022-01-20 13:01:23 - NDCG@10: 0.6825
2022-01-20 13:01:23 - NDCG@100: 0.7058
2022-01-20 13:01:23 - NDCG@1000: 0.7131
2022-01-20 13:01:23 - 

Run4
2022-01-20 13:01:58 - NDCG@1: 0.5733
2022-01-20 13:01:58 - NDCG@3: 0.6301
2022-01-20 13:01:58 - NDCG@5: 0.6592
2022-01-20 13:01:58 - NDCG@10: 0.6823
2022-01-20 13:01:58 - NDCG@100: 0.7056
2022-01-20 13:01:58 - NDCG@1000: 0.7129

And got quite different results @NThakur20

I think the issue is that the index is not yet finished when retrieval starts. Is there some sleep between indexing the documents and starting to query?

Elasticsearch is indexing docs in the background, i.e. we must wait until all docs are fully indexed before we can start to query.
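
As a rough illustration of the waiting step, the sketch below polls a document counter until the expected number of documents is visible or a timeout expires. The helper `wait_until_indexed` and its parameters are hypothetical; the counter is passed in as a callable so the logic does not depend on a running server.

```python
import time
from typing import Callable


def wait_until_indexed(
    count_docs: Callable[[], int],
    expected: int,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll count_docs() until it reports at least `expected` documents.

    Returns True once the count is reached, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if count_docs() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the real client, the counter could be something like `lambda: es.count(index=index_name)["count"]`, assuming `es` is a connected `Elasticsearch` instance and `index_name` is the index being filled. Querying would then start only after the function returns True.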

Another issue could be the shards. Is the ES index created with a single shard? Edit: Only one shard is created.

If I run it on an existing ES index, I get:

2022-01-20 13:07:34 - NDCG@1: 0.5767
2022-01-20 13:07:34 - NDCG@3: 0.6366
2022-01-20 13:07:34 - NDCG@5: 0.6652
2022-01-20 13:07:34 - NDCG@10: 0.6906
2022-01-20 13:07:34 - NDCG@100: 0.7134
2022-01-20 13:07:34 - NDCG@1000: 0.7212
2022-01-20 13:07:34 - 

thakur-nandan commented 2 years ago

Hi @thigm85 @nreimers,

Thanks for bringing this issue up. I debugged this and tackled the reproducibility issue that occurred. I changed the code in two places:

  1. I have added a sleep_for parameter to the ES code, with a default value of 2 seconds. It forces a pause after the index is deleted and again after the documents are indexed.
  2. During bulk indexing (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html), there is a refresh parameter, which I have set to wait_for instead of leaving it at its default of false. For more details, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html.
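
The two changes can be sketched roughly as follows. This is not the actual BEIR code: `build_actions`, `index_corpus`, the `bulk_fn` hook, and the `title`/`text` field names are assumptions made for illustration. The sketch relies on `elasticsearch.helpers.bulk` forwarding extra keyword arguments such as `refresh` to the bulk request.

```python
import time
from typing import Any, Callable, Dict, Iterator, Optional


def build_actions(index_name: str, corpus: Dict[str, Dict[str, str]]) -> Iterator[Dict[str, Any]]:
    """Yield one bulk action per document in a BEIR-style corpus dict."""
    for doc_id, doc in corpus.items():
        yield {
            "_index": index_name,
            "_id": doc_id,
            # Field names are illustrative; the real mapping may differ.
            "_source": {"title": doc.get("title", ""), "text": doc.get("text", "")},
        }


def index_corpus(
    es_client: Any,
    index_name: str,
    corpus: Dict[str, Dict[str, str]],
    sleep_for: float = 2.0,
    bulk_fn: Optional[Callable[..., Any]] = None,
) -> None:
    """Bulk-index the corpus, wait for a refresh, then pause for `sleep_for` seconds."""
    if bulk_fn is None:
        # Imported lazily so the sketch can be exercised without the client installed.
        from elasticsearch.helpers import bulk as bulk_fn
    # refresh="wait_for" asks Elasticsearch to respond only after a refresh
    # has made the indexed documents visible to search.
    bulk_fn(es_client, build_actions(index_name, corpus), refresh="wait_for")
    # Extra safety margin before the first query, mirroring the sleep_for idea.
    time.sleep(sleep_for)
```

Passing `bulk_fn` explicitly is only a testing convenience; in normal use it would be left at its default so that the client's own bulk helper is called.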

With both parameters in place, the code now produces reproducible scores. I will update the development branch for now. With the next BEIR release, these changes will also be reflected in the master branch and the PyPI version (pip install beir).

Kind Regards, Nandan Thakur

nreimers commented 2 years ago

@NThakur20 Great, thanks for the quick fix.

We should also either update the examples to the newest ES version or pin the Python ES client to a fixed version in setup.py, e.g.:

elasticsearch==7.9.1

so that the above issues do not appear.