beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

A question about how to use BM25 in evaluate_custom_dataset #33

Closed Graduo closed 3 years ago

Graduo commented 3 years ago

Hi, I am trying to use the BM25 model to evaluate a custom dataset with the following code:

#### Sentence-Transformer ####
#### Provide any pretrained sentence-transformers model path
#### Complete list - https://www.sbert.net/docs/pretrained_models.html
# model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"))
model = BM25(index_name="your-index-name", hostname="127.0.0.1:9200", initialize=True )
# retriever = EvaluateRetrieval(model, score_function="cos_sim")
retriever = EvaluateRetrieval(model)
#### Retrieve dense results (format of results is identical to qrels)
results = retriever.retrieve(corpus, queries)

#### Evaluate your retrieval using NDCG@k, MAP@K ...
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

but it fails with the following error:

2021-08-04 17:57:32 - Activating Elasticsearch....
2021-08-04 17:57:32 - Elastic Search Credentials: {'hostname': '127.0.0.1:9200', 'index_name': 'your-index-name', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
english
2021-08-04 17:57:32 - Deleting previous Elasticsearch-Index named - your-index-name
2021-08-04 17:57:32 - Creating fresh Elasticsearch-Index named - your-index-name
  0%|                                                                                                                   | 0/2 [00:00<?, ?docs/s]
que: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 26.55it/s]
Traceback (most recent call last):
  File "/.../evaluate_custom_dataset.py", line 67, in <module>
    ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
  File "/.../beir/beir/retrieval/evaluation.py", line 74, in evaluate
    ndcg[f"NDCG@{k}"] = round(ndcg[f"NDCG@{k}"]/len(scores), 5)
ZeroDivisionError: float division by zero

Is there any other code I need to modify? Thank you!

thakur-nandan commented 3 years ago

Hi @Graduo,

Could you check whether you are getting any scores in the results dictionary? I suspect the problem lies in the results dictionary. If not, could you check whether there is a mismatch between qrels and results?
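
The two checks above can be sketched roughly as follows. This is a minimal sketch, assuming qrels and results are nested dictionaries of the form query ID -> document ID -> score; `diagnose_results` and the toy data are hypothetical and not part of BEIR. If no query ID is shared between the two dictionaries, there may be nothing to average over, which would be consistent with the ZeroDivisionError in the traceback.

```python
def diagnose_results(qrels, results):
    """Summarise how well `results` lines up with `qrels` (hypothetical helper)."""
    qrel_ids = set(qrels)
    result_ids = set(results)
    shared = qrel_ids & result_ids
    return {
        "queries_in_qrels": len(qrel_ids),
        "queries_in_results": len(result_ids),
        # Queries that can actually be scored: present on both sides.
        "shared_queries": len(shared),
        # Shared queries for which the retriever returned no documents.
        "shared_queries_without_hits": sorted(q for q in shared if not results[q]),
        # Judged queries that never made it into the results.
        "missing_from_results": sorted(qrel_ids - result_ids),
    }


# Toy data (made up for illustration only).
qrels = {"q1": {"d1": 1}, "q2": {"d2": 1}}
results = {"q1": {"d1": 12.3, "d9": 4.5}, "q3": {}}

report = diagnose_results(qrels, results)
for key, value in report.items():
    print(f"{key}: {value}")
```

A report with `shared_queries` equal to 0, or with a long `missing_from_results` list, would point to an ID mismatch rather than a scoring problem.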

Hope it helps!

Graduo commented 3 years ago

Hi, thanks for your advice. I'll try to debug it.

Graduo commented 3 years ago

I have another question about custom datasets that confuses me a lot. When I use my own data, I hit the error below, and I have no idea what my mistake is:

2021-08-05 11:44:23 - Loading Corpus...
100%|██████████████████████████████████████████████████████████████████████████████████| 44972/44972 [00:00<00:00, 81953.51it/s]
2021-08-05 11:44:23 - Loaded 44972 TEST Documents.
2021-08-05 11:44:23 - Doc Example: {'text': '一定要告诉他你很喜欢他 很爱他!!  虽然不知道你和他现在的关系是什么!但如果真的觉得很喜欢就向他表白啊!!起码你努力过了!  女生主动多少占一点优势的!!呵呵  只愿曾经拥有!  到以后就算感情没现在这么强烈了也不会觉得遗憾啊~!  与其每天那么痛苦的想他 恋他 还不如直接告诉他 !  不要怕回破坏你们现有的感情!因为如果不告诉他  你可能回后悔一辈子!!  ', 'title': '请问深入骨髓地喜欢一个人怎么办我不能确定对方是不是喜欢我,我却想 '}
2021-08-05 11:44:23 - Loading Queries...
2021-08-05 11:44:24 - Loaded 36361 TEST Queries.
2021-08-05 11:44:24 - Query Example: 我不能确定对方是不是喜欢我,我却想分分秒秒跟他在一起,有谁能告诉我如何能想他少一点
2021-08-05 11:44:24 - Activating Elasticsearch....
2021-08-05 11:44:24 - Elastic Search Credentials: {'hostname': '127.0.0.1:9200', 'index_name': 'your-index-name', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'cjk'}
cjk
2021-08-05 11:44:24 - Deleting previous Elasticsearch-Index named - your-index-name
2021-08-05 11:44:24 - Creating fresh Elasticsearch-Index named - your-index-name
  0%|                                                                                               | 0/44972 [00:00<?, ?docs/s]
que:   0%|                                                                                              | 0/285 [00:02<?, ?it/s]
Traceback (most recent call last):
  File ".../beir/examples/retrieval/evaluation/lexical/evaluate_custom_bm25.py", line 81, in <module>
    results = retriever.retrieve(corpus, queries)
  File ".../beir/beir/retrieval/evaluation.py", line 23, in retrieve
    return self.retriever.search(corpus, queries, self.top_k, self.score_function, **kwargs)
  File ".../beir/beir/retrieval/search/lexical/bm25_search.py", line 47, in search
    top_hits=top_k + 1) # Add 1 extra if query is present with documents
  File ".../beir/beir/retrieval/search/lexical/elastic_search.py", line 194, in lexical_multisearch
    responses = resp["hits"]["hits"][skip:]
KeyError: 'hits'

and my dataset:

# corpus.jsonl
{"_id": "qid_1815059893214501395", "title": "请问深入骨髓地喜欢一个人怎么办我不能确定对方是不是喜欢我,我却想 ", "text": "一定要告诉他你很喜欢他 很爱他!!  虽然不知道你和他现在的关系是什么!但如果真的觉得很喜欢就向他表白啊!!起码你努力过了!  女生主动多少占一点优势的!!呵呵  只愿曾经拥有!  到以后就算感情没现在这么强烈了也不会觉得遗憾啊~!  与其每天那么痛苦的想他 恋他 还不如直接告诉他 !  不要怕回破坏你们现有的感情!因为如果不告诉他  你可能回后悔一辈子!!  "}
{"_id": "qid_2063849676113062517", "title": "我登陆诛仙2时总说我账号密码错误,但是我打的是正确的,就算不对我? ", "text": "被盗号了~我的号在22号那天被盗了,跟你一样情况,link密码与账号错误,我密保都有了呐,邮箱换密码也不行,还被删了号,伤心兼郁闷,呵呵,盗号了。建议跟完美申请把号要回来,或者玩新的号!"}
{"_id": "qid_6625582808814915192", "title": "斩魔仙者称号怎么得来的 ", "text": "楼主您好,以下为转载:\r\r圣诞前热身 来《生肖传说》做斩魔仙者\r\r  一年一度的圣诞节快要来临了,大街小巷商户们都在忙着准备12月25日圣诞的来临。而这时候,一些妖魔也正蠢蠢欲动准备作乱。作为生肖世界肩负维护世界和平、拯救全人类的生肖使者,怎么能不有所行动,为了生肖世界的安定而做防范准备?!\r\r  要让妖魔鬼怪能对你有所心悸,除了自己本身武艺要高强,最好能在妖魔界打出知名度,这样,当你的亲朋好友被妖魔袭击时,只要爆出你的名号,这些妖魔上就会落荒而逃,岂不好哉?那么,“斩魔仙者”这个响亮的称号应该足够能震慑住妖魔,让他们铭记在心了吧!\r\r斩魔仙者的称号\r\r  而且,这个“斩魔仙者”的称号并不是人人都能得到的。只有成功挑战70级副本中的隐藏BOSS“羽翼仙”的人才能获得此称号!并且前提条件是在12月18日~12月25日之间第一队成功挑战羽翼仙的人才能获此称号!因此,此称号在全服范围内,是绝对不可能超过5个的!\r\r  要挑战羽翼仙可不是一件容易的事。首先,要在70级副本中打败4个强大的BOSS!在打完副本的第4个BOSS有一定几率获得道具“羽翼真元”,有了羽翼真元后就可以与羽翼仙进行一场战斗。羽翼仙就站在第4个BOSS的旁边,只是没有道具是不能进入战斗的。\r\r羽翼仙\r\r  在12月18日~12月25日活动期间成功挑战羽翼仙后的第一支队伍就可以获得兑换“斩魔仙者”的道具——烈火珍珠旗。当然,如果你在这场激烈的战斗中不幸捐躯,那么当然是不会得到这个道具的。得到了这把“烈火珍珠旗”的玩家就可以到NPC燃烧使处兑换称号了!\r\r  这样兼具高强能力和超强人品才能获得的称号,怎么能不人望而生畏,怎么能不让那些妖魔胆怯?想要获得的玩家就快快行动,莫要让人先抢了这全服唯一的“斩魔仙者”称号!\r\r如果满意,请采纳。\r谢谢~"}
...
# queries.jsonl
{"_id": "qqid_1815059893214501395", "text": "我不能确定对方是不是喜欢我,我却想分分秒秒跟他在一起,有谁能告诉我如何能想他少一点"}
{"_id": "qqid_6625582808814915192", "text": "斩魔仙者称号怎么得来的"}
{"_id": "qqid_9204493405205415849", "text": "多谢了"}
...
#qrels/test.tsv
query-id    corpus-id   score
qqid_1815059893214501395    qid_1815059893214501395 1
qqid_6625582808814915192    qid_6625582808814915192 1
...
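
A quick consistency check for files laid out like the ones above might look like this. It is only a sketch, assuming the corpus and queries are JSON Lines with an `_id` field and that the qrels file is tab-separated with the header shown; `load_jsonl_ids`, `check_qrels` and the toy strings are hypothetical, not part of BEIR.

```python
import csv
import io
import json


def load_jsonl_ids(text):
    """Collect the `_id` of every non-empty JSON line."""
    return {json.loads(line)["_id"] for line in text.splitlines() if line.strip()}


def check_qrels(corpus_jsonl, queries_jsonl, qrels_tsv):
    """Return (problem, id) pairs for qrels rows that point at unknown IDs."""
    corpus_ids = load_jsonl_ids(corpus_jsonl)
    query_ids = load_jsonl_ids(queries_jsonl)
    reader = csv.DictReader(io.StringIO(qrels_tsv), delimiter="\t")
    problems = []
    for row in reader:
        if row["query-id"] not in query_ids:
            problems.append(("unknown query", row["query-id"]))
        if row["corpus-id"] not in corpus_ids:
            problems.append(("unknown document", row["corpus-id"]))
    return problems


# Toy file contents (made up); real files would be read from disk instead.
corpus_jsonl = (
    '{"_id": "d1", "title": "t1", "text": "x"}\n'
    '{"_id": "d2", "title": "t2", "text": "y"}\n'
)
queries_jsonl = '{"_id": "q1", "text": "a"}\n'
qrels_tsv = "query-id\tcorpus-id\tscore\nq1\td1\t1\nq2\td3\t1\n"

problems = check_qrels(corpus_jsonl, queries_jsonl, qrels_tsv)
print(problems)
```

An empty list means every judged pair refers to a query and a document that exist in the other two files.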

Could you give me some advice on this? Thank you!

thakur-nandan commented 3 years ago

Hi @Graduo,

So your data format looks accurate to me.

Could you try searching for results for a single query using Elasticsearch with the snippet below:

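from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.lexical import BM25Search as BM25

# data_path: folder containing corpus.jsonl, queries.jsonl and qrels/test.tsv
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")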

#### Lexical Retrieval using Bm25 (Elasticsearch) ####
#### Provide a hostname (localhost) to connect to ES instance
#### Define a new index name or use an already existing one.
#### We use default ES settings for retrieval
#### https://www.elastic.co/

hostname = "your-hostname" #localhost
index_name = "your-index-name" # germanquad

#### Initialize ####
# (1) True - Delete existing index and re-index all documents from scratch 
# (2) False - Load existing index
initialize = True # False

#### Language ####
# For languages supported by Elasticsearch by default, check here ->
# https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
language = "cjk" # Please provide full names in lowercase for eg. english, hindi ...

#### Sharding ####
# (1) For datasets with small corpus (datasets ~ < 5k docs) => limit shards = 1 
number_of_shards = 1
model = BM25(index_name=index_name, hostname=hostname, language=language, initialize=initialize, number_of_shards=number_of_shards)

# (2) For datasets with big corpus ==> keep default configuration
# model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)
retriever = EvaluateRetrieval(model)

#### Indexing complete corpus ####
retriever.retriever.index(corpus)

#### Try single query ####
query_text = "我不能确定对方是不是喜欢我,我却想分分秒秒跟他在一起,有谁能告诉我如何能想他少一点"
hits = retriever.retriever.es.es.search(index = index_name, body = {"query" : {"multi_match": {
          "query": query_text, "fields": ["title", "txt"]}}}, size = 10)

Could you check what output you receive in hits with the code above?
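
To make such a response easier to read and share, the document IDs and scores can be pulled out of it. This is a minimal sketch, assuming the response is a plain dictionary with the usual `hits -> hits` nesting; `summarize_hits` and the sample response are hypothetical and only illustrate the shape.

```python
def summarize_hits(response, limit=10):
    """Return (doc_id, score) pairs from a single search response."""
    # Fall back to an empty list if the response carries no hits at all.
    hits = response.get("hits", {}).get("hits", [])
    return [(hit["_id"], hit["_score"]) for hit in hits[:limit]]


# Made-up response with the same nesting as an Elasticsearch search reply.
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "doc-a", "_score": 7.5, "_source": {"title": "t", "txt": "x"}},
            {"_id": "doc-b", "_score": 3.25, "_source": {"txt": "y"}},
        ],
    }
}

top = summarize_hits(sample_response)
for doc_id, score in top:
    print(f"{doc_id}\t{score}")
```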

Graduo commented 3 years ago

OK, it works well! Here are the printed hits (I replaced some text with ... for display):

{'took': 2832, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 10000, 'relation': 'gte'}, 'max_score': 79.93321, 'hits': [{'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_1815059893214501395', '_score': 79.93321, '_source': {'txt': '一定要告诉他...一辈子!!  ', 'title': '请问...想 '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_1736424441895934891', '_score': 38.00187, '_source': {'txt': '我...其 '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_9185497216739603038', '_score': 35.873653, '_source': {'txt': '你...呢???', 'title': '他...都 '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_4521387489437650029', '_score': 35.030113, '_source': {'txt': '哈哈..!       ', 'title': '他...学, '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_5171036957323609976', '_score': 34.074993, '_source': {'txt': '其实...聊!', 'title': '一...她, '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_1689816759806151278', '_score': 32.647408, '_source': {'txt': '我...通? '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_2779686267444272462', '_score': 32.29292, '_source': {'txt': '你...了.', 'title': '我...很 '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_8802910174884054297', '_score': 31.973412, '_source': {'txt': '你都...福!', 'title': '怎么...纯, '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_8060703573863218240', '_score': 31.59405, '_source': {'txt': '我...爱。', 'title': '网...点 '}}, {'_index': 'your-index-name', '_type': '_doc', '_id': 'dqid_6590546566239633226', '_score': 30.612078, '_source': {'txt': '发...。。', 'title': '什...里 '}}]}}

The problem may be in the query (“que”) step. The corpus is indexed fine:

2021-08-05 15:19:53 - Elastic Search Credentials: {'hostname': '127.0.0.1:9200', 'index_name': 'your-index-name', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'cjk'}
cjk
2021-08-05 15:19:53 - Deleting previous Elasticsearch-Index named - your-index-name
2021-08-05 15:19:53 - Creating fresh Elasticsearch-Index named - your-index-name
 29%|██████████████████████▎                                                      | 13001/44972 [00:14<00:36, 884.16docs/s]

but the queries (“que”) never progress:

2021-08-05 11:30:49 - Deleting previous Elasticsearch-Index named - your-index-name
2021-08-05 11:30:49 - Creating fresh Elasticsearch-Index named - your-index-name
  0%|                                                                                             | 0/44972 [00:00<?, ?docs/s]
que:   0%|                                                                                            | 0/285 [00:03<?, ?it/s]

thakur-nandan commented 3 years ago

Thanks for trying out the snippet and sharing the results. Elasticsearch is working as expected.

I suspect that many of your queries have no similar documents in the corpus.

So, you can do either one of the two things below:

  1. Share a working example of your code, which will help me debug this.
  2. If not, add print(resp) before line 194 in .../beir/beir/retrieval/search/lexical/elastic_search.py and share what you get. This file is inside the beir source in your environment.
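
Since the traceback fails at resp["hits"], at least one response lacked that key. Beyond printing, the responses could be split into usable and failed ones. This is a hedged sketch, assuming `responses` is the list of per-query dictionaries from an Elasticsearch multi-search reply, where a failed sub-request typically carries an "error" entry instead of "hits". `split_msearch_responses` and the toy payloads are hypothetical and not BEIR code.

```python
def split_msearch_responses(query_ids, responses):
    """Separate per-query responses into scored hits and failures."""
    ok, failed = {}, {}
    for qid, resp in zip(query_ids, responses):
        if "hits" in resp:
            ok[qid] = {hit["_id"]: hit["_score"] for hit in resp["hits"]["hits"]}
        else:
            # Keep whatever the server sent back so it can be inspected later.
            failed[qid] = resp.get("error", resp)
    return ok, failed


# Made-up responses: one normal reply and one error reply.
query_ids = ["q1", "q2"]
responses = [
    {"hits": {"hits": [{"_id": "d1", "_score": 2.0}]}},
    {"error": {"type": "example_error"}, "status": 400},
]

ok, failed = split_msearch_responses(query_ids, responses)
print("scored:", ok)
print("failed:", failed)
```

Inspecting `failed` shows which queries broke and what the server reported for them, instead of stopping at the first KeyError.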

Kind Regards, Nandan

Graduo commented 3 years ago

Yes, your guess is correct! I replaced the queries with the titles from the corpus, and it works as expected. The bug above appears when many queries have no similar documents (this may be related to the quality of the dataset; I still need to look into the exact details). If you need my code and data to help you debug, I would be glad to organize them and email them to you later. Thanks again for your patient reply and your awesome work!