AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
https://amenra.github.io/ranx
MIT License

ValueError: max() arg is an empty sequence #35

Closed celsofranssa closed 1 year ago

celsofranssa commented 1 year ago

Hello, I'd like to determine what query is causing the following error and how to get around it:

Traceback (most recent call last):
  File "main.py", line 43, in perform_tasks
    eval(params)
  File "main.py", line 25, in eval
    eval_helper.perform_eval()
  File "/home/celso/projects/XMTC-Baselines/source/helper/EvalHelper.py", line 62, in perform_eval
    qrels = Qrels(filtered_relevance_map)
  File "/home/celso/projects/venvs/XMTC-Baselines/lib/python3.8/site-packages/ranx/data_structures/qrels.py", line 62, in __init__
    max_len = max(len(y) for x in doc_ids for y in x)
ValueError: max() arg is an empty sequence

My evaluation code is shown in the code snippet below.

ranking = self._retrieve(...)
filtered_relevance_map = {key: value for key, value in self.relevance_map.items() if key in ranking.keys()}
qrels = Qrels(filtered_relevance_map)
run = Run(ranking, name=cls)
result = evaluate(qrels, run, self.metrics, threads=12)

AmenRa commented 1 year ago

I suspect your filtered_relevance_map is not valid. Can you please post a print of it?

celsofranssa commented 1 year ago

filtered_relevance_map: image

ranking: image

AmenRa commented 1 year ago

What the code does is simply this:

doc_ids = [list(doc.keys()) for doc in qrels.values()]
max_len = max(len(y) for x in doc_ids for y in x)

The code finds the max string length of the doc ids in the dictionary to reduce memory consumption.

You should be able to find the issue by checking what happens with your filtered_relevance_map in place of qrels.

Please let me know what you find out.
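
The check suggested above can be run outside ranx. The following is a minimal sketch, assuming filtered_relevance_map has the nested shape {query_id: {doc_id: relevance}}. The helper names find_empty_queries and max_doc_id_length and the sample data are hypothetical and not part of ranx.

```python
def find_empty_queries(qrels_dict):
    """Return the query ids whose doc-id dictionary is empty."""
    return [q_id for q_id, docs in qrels_dict.items() if not docs]


def max_doc_id_length(qrels_dict):
    """Mirror the two lines quoted from qrels.py, but return None instead of raising."""
    doc_ids = [list(docs.keys()) for docs in qrels_dict.values()]
    # default=None avoids the ValueError when there are no doc ids at all
    return max((len(doc_id) for ids in doc_ids for doc_id in ids), default=None)


# Hypothetical sample data, for illustration only
sample = {"text_1": {"label_7": 1}, "text_2": {}}

empty_queries = find_empty_queries(sample)  # queries without any judged doc
longest = max_doc_id_length(sample)         # longest doc id, or None if there are none
```

If max_doc_id_length returns None, the dictionary contains no doc ids at all, which is the condition under which the original max() call raises.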

AmenRa commented 1 year ago

PS: the variable name doc in the list comprehension is misleading. It should be q or query, since it iterates over the per-query dictionaries.

celsofranssa commented 1 year ago

I can't see any problem, since I only replaced doc with label and query with text, and concatenated each with its id.

AmenRa commented 1 year ago

I can't help you without reproducible code. The best I can do is guess. I have never encountered such a problem.

The problem should lie in this line:

doc_ids = [list(doc.keys()) for doc in qrels.values()]

doc.keys() is probably empty for every query, or the dictionary itself is empty, so no doc ids are collected at all. That should be why the next line throws the exception:

ValueError: max() arg is an empty sequence

Have you checked if those two lines of code (isolated from ranx) raise the exception?
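
A minimal sketch of that isolated check, using only the two lines quoted in this thread. The function name raises_value_error and the three input dictionaries are hypothetical examples and are not taken from the reporter's data.

```python
def raises_value_error(qrels):
    """Run the two lines in isolation and report whether max() raises."""
    doc_ids = [list(doc.keys()) for doc in qrels.values()]
    try:
        max(len(y) for x in doc_ids for y in x)
    except ValueError:
        return True
    return False


# Hypothetical inputs, for illustration only
no_queries = {}                                   # the filter kept nothing
only_empty_queries = {"q_1": {}, "q_2": {}}       # queries kept, but no doc ids
one_empty_query = {"q_1": {"d_1": 1}, "q_2": {}}  # at least one doc id present

results = {
    "no_queries": raises_value_error(no_queries),
    "only_empty_queries": raises_value_error(only_empty_queries),
    "one_empty_query": raises_value_error(one_empty_query),
}
```

In this sketch the exception appears only when no doc id exists anywhere in the dictionary. A single query with an empty doc dictionary is not enough to trigger it, so a filtered_relevance_map that ends up completely empty after filtering is worth ruling out first.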

AmenRa commented 1 year ago

Closing for inactivity.