AmenRa / retriv

A Python Search Engine for Humans 🥸
MIT License

Qrels and Run query ids do not match #12

Closed celsofranssa closed 1 year ago

celsofranssa commented 1 year ago
Trial 0 failed with parameters: {'b': 0.37, 'k1': 9.600000000000001} because of the following error: 
AssertionError('Qrels and Run query ids do not match').

I guess this happens because the qrels do not contain a relevance score for every result in the run. Wouldn't it be interesting to filter the run dictionary down to only the cases that are actually judged in qrels?

AmenRa commented 1 year ago

I think it's better if users do that on their own, so that they are aware of what's happening. Also, if you do not do that upfront, you will end up slowing down the optimisation process, as you will also search for queries for which you do not have qrels.

celsofranssa commented 1 year ago

However, how can this filter be integrated during autotune?

Even during autotune, since the retrieved results depend on the queries and the training collection, it is unlikely that qrels contains every key in the run (although ideally it will intersect with the run).

AmenRa commented 1 year ago

You don't need the true relevance value for each query-doc tuple. As the error says, the query ids do not match, meaning that the provided qrels have more, fewer, or different query ids than the queries for which the run was computed.
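
To see which side the mismatch comes from, a small check along these lines can help. It is only a sketch: it assumes qrels and run are plain nested dictionaries keyed by query id, and `diff_query_ids` is a hypothetical helper, not part of retriv.

```python
def diff_query_ids(qrels, run):
    """Return the query ids that appear in only one of the two dictionaries."""
    qrels_ids, run_ids = set(qrels), set(run)
    return {
        "only_in_qrels": sorted(qrels_ids - run_ids),
        "only_in_run": sorted(run_ids - qrels_ids),
    }


# Toy data (made up for illustration): q_2 is shared, q_1 and q_3 are not.
qrels = {"q_1": {"d_1": 1}, "q_2": {"d_3": 1}}
run = {"q_2": {"d_3": 0.9, "d_4": 0.2}, "q_3": {"d_1": 0.5}}

mismatch = diff_query_ids(qrels, run)
print(mismatch)
```

Note that q_2 retrieves d_4, which has no judgment in qrels. That is fine, since only the top-level query ids are compared here.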

celsofranssa commented 1 year ago

I see. Since it is costly to obtain relevance feedback, I only have it for a general case. So when I try to autotune the retriever for different slices of data, there is no way to guarantee that the keys in qrels and run are identical, only that they intersect.

Anyway, thank you.

AmenRa commented 1 year ago

Well, can't you extract the intersection before tuning?
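
A rough sketch of what extracting the intersection could look like. It assumes the queries are a list of dictionaries with an "id" key and the qrels are a dictionary keyed by query id. `restrict_to_shared_ids` is a hypothetical helper, not a retriv function.

```python
def restrict_to_shared_ids(queries, qrels):
    """Keep only the queries and qrels entries whose query id appears in both."""
    shared = {q["id"] for q in queries} & set(qrels)
    kept_queries = [q for q in queries if q["id"] in shared]
    kept_qrels = {q_id: judged for q_id, judged in qrels.items() if q_id in shared}
    return kept_queries, kept_qrels


# Toy data (made up for illustration): only q_2 has both a query and judgments.
queries = [
    {"id": "q_2", "text": "second query"},
    {"id": "q_3", "text": "third query"},
]
qrels = {"q_1": {"d_1": 1}, "q_2": {"d_3": 1}}

tuning_queries, tuning_qrels = restrict_to_shared_ids(queries, qrels)
print(tuning_queries, tuning_qrels)
```

The filtered pair would then be what gets passed to the tuning step for that slice of data, so no time is spent searching for queries without judgments.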

AmenRa commented 1 year ago

Closing for inactivity.