NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0

mAP for IR? #788

Closed datistiquo closed 3 years ago

datistiquo commented 4 years ago

Hi,

Is the MatchZoo mean average precision really the "real" mAP over multiple queries? To me it looks like it only computes the average precision for a single query. My problem is that I want to evaluate on multiple queries. For each query I have just one or two "right" documents out of, let's say, 500, so for each query I have to compare against each of the 500 docs: one doc is a match and the rest should score 0.

This blows up the test data when you have multiple test queries and need to pair every one of them with all the remaining documents. Is this the only way to evaluate my IR text matching, since I need to compare each query with all documents in the pool?

I would know how to calculate mAP in NumPy, but not how to do that in TensorFlow. It seems there is only an average precision metric but no mAP.

I also don't know how to use it, because training in TensorFlow is quite different from evaluating in the way described above.

Any ideas or help?

matthew-z commented 4 years ago

Right, you cannot simply use model.evaluate. Instead, you can use the trained model to predict a score for each query-doc pair in your test set and build a re-ranked list for each query.
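
A minimal sketch of that idea (not MatchZoo's own API; the column names and toy data below are purely illustrative): score every query-doc pair, then group by query and sort by the predicted score.

```python
import pandas as pd

# Toy example: one row per (query, doc) pair in the test pool, with the gold
# label and a predicted score. In practice "score" would come from the trained
# model's predictions on these rows.
pairs = pd.DataFrame({
    "query_id": ["q1", "q1", "q1", "q2", "q2", "q2"],
    "doc_id":   ["d1", "d2", "d3", "d1", "d2", "d3"],
    "label":    [1, 0, 0, 0, 1, 1],
    "score":    [0.9, 0.3, 0.5, 0.2, 0.8, 0.4],
})

# Build one ranked list per query by sorting its candidates by predicted score.
ranked = {
    qid: grp.sort_values("score", ascending=False)["doc_id"].tolist()
    for qid, grp in pairs.groupby("query_id")
}
print(ranked)  # {'q1': ['d1', 'd3', 'd2'], 'q2': ['d2', 'd3', 'd1']}
```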

Then you can either write a MAP metric function yourself (there are plenty of implementations on GitHub to copy), or output the ranked lists and use the TREC evaluation tools.
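
For example, a plain NumPy implementation of the standard MAP definition might look like this (just a sketch, not tied to MatchZoo):

```python
import numpy as np

def average_precision(y_true, y_score):
    """AP for one query. y_true: 0/1 relevance labels, y_score: model scores."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    if y_true.sum() == 0:
        return 0.0
    order = np.argsort(-y_score)           # rank docs by descending score
    rel = y_true[order]
    hits = np.cumsum(rel)                  # relevant docs seen up to rank k
    ranks = np.arange(1, len(rel) + 1)
    precision_at_k = hits / ranks
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(queries):
    """MAP: mean AP over queries; `queries` is an iterable of (y_true, y_score)."""
    return float(np.mean([average_precision(t, s) for t, s in queries]))

# Example: two queries with one relevant doc each.
print(mean_average_precision([
    ([1, 0, 0], [0.9, 0.3, 0.5]),   # relevant doc ranked first  -> AP = 1.0
    ([0, 1, 0], [0.8, 0.6, 0.1]),   # relevant doc ranked second -> AP = 0.5
]))                                  # MAP = 0.75
```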