castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0

MAP for ES regression on MS MARCO doc slightly different from Lucene values #1173

Closed lintool closed 4 years ago

lintool commented 4 years ago

Follow up to #1167

I ran this:

$ python src/main/python/run_es_regression.py --regression msmarco-doc --input /tuna1/collections/msmarco/doc
...
2020-05-08 14:34:32,376 INFO - [SUCESS] 0.2308 MAP verified as expected!

In the end I got .2308 MAP, but that doesn't actually match the .2310 MAP reported here: https://github.com/castorini/anserini/blob/master/docs/regressions-msmarco-doc.md

@HangCui0510 @eiston

HangCui0510 commented 4 years ago

I ran this: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc.md

Instead of 0.2310 or 0.2308, I got:

[Screenshot: Screen Shot 2020-05-09 at 12 30 50 AM]

lintool commented 4 years ago

@HangCui0510 hrm... can you debug and see what's going on?

HangCui0510 commented 4 years ago

okay

HangCui0510 commented 4 years ago

I think the problem is in the msmarco-doc config. I am adjusting the mapping parameters to see what's going on. Each run takes a very long time.

HangCui0510 commented 4 years ago

@eiston Can you please help me with this? I have tried different mappings, and they all give the same result: 0.2308. I compared the output run files from Solrini, Elastirini, and Anserini. The Solrini and Anserini runs are exactly the same; the Elastirini run has the same doc ranking, but the scores are doubled.

lintool commented 4 years ago

@HangCui0510 Are the rankings (ordering of the documents) the same? Are the differences only isolated to a few different queries?

eiston commented 4 years ago

@HangCui0510 I am looking into this

HangCui0510 commented 4 years ago

@lintool About 1000 out of 5000000 are different when retrieving with topics.msmarco-doc.dev.txt.

HangCui0510 commented 4 years ago

@lintool The differences are spread across many topics, but only 2-3 rankings differ within each of these topics.
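
For future reference, a comparison like the one described above can be sketched in a few lines of Python. This is a minimal sketch, not the script used in this thread: it assumes both runs are in the standard six-column TREC run format (`qid Q0 docid rank score tag`), and the function names `load_run` and `diff_runs` and the toy data are hypothetical. It ignores scores and only counts, per topic, the rank positions where the two runs disagree on the document.

```python
from collections import defaultdict


def load_run(lines):
    """Parse TREC-format run lines (qid Q0 docid rank score tag).

    Returns {qid: [docid, ...]} with docids ordered by rank.
    """
    run = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, rank, _score, _tag = parts
        run[qid].append((int(rank), docid))
    return {qid: [d for _, d in sorted(hits)] for qid, hits in run.items()}


def diff_runs(run_a, run_b):
    """Return {qid: number of rank positions where the docids differ}."""
    diffs = {}
    for qid in sorted(set(run_a) | set(run_b)):
        a, b = run_a.get(qid, []), run_b.get(qid, [])
        n = sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))
        if n:
            diffs[qid] = n
    return diffs


# Toy data (made up): same documents, doubled scores, one swapped pair.
lucene = ["1 Q0 D1 1 9.5 a", "1 Q0 D2 2 9.1 a", "1 Q0 D3 3 9.0 a", "2 Q0 D7 1 5.0 a"]
es = ["1 Q0 D1 1 19.0 b", "1 Q0 D3 2 18.1 b", "1 Q0 D2 3 18.1 b", "2 Q0 D7 1 10.0 b"]

differences = diff_runs(load_run(lucene), load_run(es))
print(differences)
```

With real run files, `load_run(open(path))` works the same way, since a file object iterates over lines.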

HangCui0510 commented 4 years ago

@lintool Oh, I think I know what the problem is. Please refer to the picture: it's the rounding that makes the difference.

[Screenshot: Screen Shot 2020-05-15 at 8 05 23 PM]

lintool commented 4 years ago

Ah, I see. The rounding causes scoring ties, and that changes the MAP ever so slightly. I wonder why this same issue doesn't affect Solr? There, we can replicate results exactly, right?
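
To illustrate the effect for future readers, here is a minimal sketch of how rounding can create a tie and shift average precision for a single topic. Everything in it is an assumption made for illustration: the scores are made up, the rounding precision (4 decimal places) is arbitrary, and ties are broken by ascending docid, which may not match what Elasticsearch or the evaluation tool actually does.

```python
def rank(scores):
    """Order docids by descending score; break ties by ascending docid (assumed rule)."""
    return [docid for docid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def average_precision(ranking, relevant):
    """Average precision of a ranked list against a set of relevant docids."""
    hits, total = 0, 0.0
    for position, docid in enumerate(ranking, start=1):
        if docid in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


# Made-up scores: D2 narrowly outranks D1 at full precision.
full_scores = {"D1": 4.12341, "D2": 4.12344, "D3": 3.90000}
relevant = {"D2"}

# Rounding to 4 decimals (assumed precision) makes D1 and D2 tie at 4.1234.
rounded_scores = {docid: round(score, 4) for docid, score in full_scores.items()}

ap_full = average_precision(rank(full_scores), relevant)
ap_rounded = average_precision(rank(rounded_scores), relevant)
print(rank(full_scores), ap_full)
print(rank(rounded_scores), ap_rounded)
```

In this toy case the relevant document drops from rank 1 to rank 2 once the tie is broken the other way, so the topic's average precision falls from 1.0 to 0.5. Averaged over thousands of topics where only a few adjacent ranks swap, an effect like this would move MAP only in the fourth decimal place.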

HangCui0510 commented 4 years ago

Right, it seems like Solr and Anserini use the same score scale: they produce the same ranking score values. Elastirini's scores are higher, nearly double.

HangCui0510 commented 4 years ago

@lintool Is this problem fixable? I noticed that Elasticsearch with msmarco-passage also gives a value slightly different from the Lucene regression.

[Screenshots: Screen Shot 2020-05-17 at 2 47 38 AM, Screen Shot 2020-05-17 at 2 51 02 AM]

lintool commented 4 years ago

I don't think there's much we can do about this issue. Good to document, though, for future reference. Closing.