Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

MAP results missing #1

Open MartinLichtblau opened 5 years ago

MartinLichtblau commented 5 years ago

I can't find anything about the mean average precision of your new system (CEDR). Am I missing something, or did you really not measure it? Since it's the most common evaluation metric in IR, I wonder why you didn't even mention it in the paper.

Furthermore, these resources could be relevant to you:

seanmacavaney commented 5 years ago

Thanks for the feedback and interest in this work. I am familiar with both of the papers you cited.

For WebTrack, we measure using ERR@20 and nDCG@20. For Robust04, we use nDCG@20 and P@20. Although commonly used for evaluation, MAP makes some unrealistic assumptions about user behavior [1], so we decided to focus on these measures which we feel adequately describe the performance of our approach (especially given the limited space in a short paper). That being said, others have asked to see MAP results [2], so we are considering including MAP results in an extended version.
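For anyone wanting to reproduce these measures themselves, here is a minimal sketch using the pytrec_eval library (not part of this repo; the toy qrels/run dictionaries below are hypothetical placeholders for standard TREC-format judgments and system output):

```python
# Sketch: computing MAP, nDCG@20, and P@20 with pytrec_eval.
# qrels maps query id -> doc id -> relevance grade;
# run maps query id -> doc id -> retrieval score.
import pytrec_eval

qrels = {
    'q1': {'d1': 1, 'd2': 0, 'd3': 2},  # hypothetical judgments
}
run = {
    'q1': {'d1': 1.2, 'd2': 0.9, 'd3': 0.4},  # hypothetical scores
}

evaluator = pytrec_eval.RelevanceEvaluator(
    qrels, {'map', 'ndcg_cut_20', 'P_20'})
results = evaluator.evaluate(run)
for qid, measures in results.items():
    print(qid, measures)
```

Note that ERR@20 is not among trec_eval's standard measures; it is typically computed with the gdeval script used for the TREC Web Track.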

Let me know if you have any other questions!

[1] Norbert Fuhr. Some Common Mistakes In IR Evaluation, And How They Can Be Avoided. ACM SIGIR Forum, 2017.
[2] https://twitter.com/craig_macdonald/status/1118169091955621888