beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

ValueError when running `evaluate_bm25.py` #64

Open jordane95 opened 2 years ago

jordane95 commented 2 years ago

Hi, I was trying to run your `evaluate_bm25.py` baseline, but I got the following error. There may be some problem with Elasticsearch. Could you please help me fix it?

```
2022-02-17 02:38:34 - Loading Queries...
2022-02-17 02:38:34 - Loaded 300 TEST Queries.
2022-02-17 02:38:34 - Query Example: 0-dimensional biomaterials show inductive properties.
2022-02-17 02:38:34 - Activating Elasticsearch....
2022-02-17 02:38:34 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'scifact', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 1, 'language': 'english'}
Traceback (most recent call last):
  File "evaluate_bm25.py", line 64, in <module>
    model = BM25(index_name=index_name, hostname=hostname, initialize=initialize, number_of_shards=number_of_shards)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/bm25_search.py", line 22, in __init__
    self.es = ElasticSearch(self.config)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/beir/retrieval/search/lexical/elastic_search.py", line 34, in __init__
    self.es = Elasticsearch(
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/__init__.py", line 312, in __init__
    node_configs = client_node_configs(
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 101, in client_node_configs
    node_configs = hosts_to_node_configs(hosts)
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elasticsearch/_sync/client/utils.py", line 141, in hosts_to_node_configs
    node_configs.append(url_to_node_config(host))
  File "/anaconda/envs/beir/lib/python3.8/site-packages/elastic_transport/client_utils.py", line 198, in url_to_node_config
    raise ValueError(
ValueError: URL must include a 'scheme', 'host', and 'port' component (ie 'https://localhost:9200')
```

nreimers commented 2 years ago

I think the hostname must be http://localhost (or http://localhost:9200), not just localhost.
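
A minimal sketch of that fix, assuming you want to accept a bare hostname and expand it into the full URL the error message asks for. The helper name `normalize_es_host` and the defaults (`http`, port `9200`) are illustrative assumptions, not part of BEIR or the Elasticsearch client, and IPv6 literals are not handled:

```python
from urllib.parse import urlsplit


def normalize_es_host(hostname: str,
                      default_scheme: str = "http",
                      default_port: int = 9200) -> str:
    """Return a URL with an explicit scheme, host, and port.

    Hypothetical helper for illustration only; the defaults are assumptions.
    """
    # A bare "localhost" has no scheme, so add one before parsing.
    if "://" not in hostname:
        hostname = f"{default_scheme}://{hostname}"
    parts = urlsplit(hostname)
    # Keep an explicit port if one was given, otherwise use the default.
    port = parts.port if parts.port is not None else default_port
    return f"{parts.scheme}://{parts.hostname}:{port}"


hostname = normalize_es_host("localhost")
print(hostname)
```

The resulting string can then be passed as the `hostname` argument in `evaluate_bm25.py` in place of the bare `localhost`.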

jordane95 commented 2 years ago

Thank you so much! I changed the hostname to http://localhost:9200 and it works. But when I run it to evaluate BM25, I get different scores on different runs. For example, the NDCG@10 score ranges from 0.64 to 0.67 on the SciFact dataset. Do you know why? Is there any randomness in the BM25 algorithm?

nreimers commented 2 years ago

This was addressed in https://github.com/UKPLab/beir/issues/58

I'm not sure whether the latest release already includes the fix. You can either install the latest BEIR version from Git, or add a sleep after indexing the documents in your code.
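
The simplest form of the workaround is a fixed `time.sleep(...)` between indexing and retrieval. As far as I know, Elasticsearch only makes newly indexed documents searchable after an index refresh, which would explain why searching immediately after indexing gives varying scores. A slightly more robust variant, sketched below, polls until the expected number of documents is visible instead of sleeping for a fixed time. This is not what BEIR does internally; `wait_until_indexed` and the `count_docs` callable are hypothetical names introduced for illustration:

```python
import time
from typing import Callable


def wait_until_indexed(count_docs: Callable[[], int],
                       expected: int,
                       timeout: float = 30.0,
                       poll_interval: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until at least `expected` documents are visible, or time out.

    `count_docs` is a hypothetical callable supplied by the caller that
    returns how many documents the index currently reports.
    Returns True if the count was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if count_docs() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)


# Usage with a stand-in counter that reports more documents on each call.
fake_counts = iter([0, 120, 300])
ready = wait_until_indexed(lambda: next(fake_counts), expected=300,
                           poll_interval=0.0)
print(ready)
```

In a real run, `count_docs` would wrap whatever document-count request your Elasticsearch client offers, and `expected` would be the size of the corpus you just indexed.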

jordane95 commented 2 years ago

I see. It's fixed in the BEIR code but not yet included in the examples. I added a sleep and now get consistent scores.

thakur-nandan commented 2 years ago

Hi @jordane95,

Yes, with our next pip update this should hopefully no longer be an issue, and Elasticsearch BM25 should give consistent scores. Thanks for notifying me!

Kind regards, Nandan Thakur