Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

Run files of Vanilla BERT checkpoints do not match test folds in data/robust #21

Open krasserm opened 4 years ago

krasserm commented 4 years ago

First of all, thanks a lot for your interesting work on CEDR and for the code in this repository.

I downloaded the Vanilla BERT and CEDR-KNRM checkpoints from #18 and checked the query ids in the .run files contained in the downloaded archive. While the sets of query ids in cedrknrm-robust-f[1-5].run match those in data/robust/f[1-5].test.run, the sets of query ids in vbert-robust-f[1-5].run do not match those in data/robust/f[1-5].test.run (e.g. the set of query ids in vbert-robust-f1.run is different from the set of query ids in data/robust/f1.test.run, and also cedrknrm-robust-f1.run).
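
For anyone wanting to repeat this check, a minimal sketch of the comparison follows. It assumes the .run files are in the usual TREC run layout, with the query id as the first whitespace-separated field on each line. The helper names `read_query_ids` and `fold_overlaps` are hypothetical and not part of this repository.

```python
import os


def read_query_ids(run_path):
    """Return the set of query ids in a run file.

    Assumption: the query id is the first whitespace-separated field
    on every non-blank line (TREC run layout).
    """
    query_ids = set()
    with open(run_path, encoding="utf-8") as run_file:
        for line in run_file:
            fields = line.split()
            if fields:  # skip blank lines
                query_ids.add(fields[0])
    return query_ids


def fold_overlaps(run_path, fold_paths):
    """Count the query ids that run_path shares with each fold file.

    fold_paths maps a label (e.g. "f1") to the path of a fold's run file.
    """
    run_ids = read_query_ids(run_path)
    return {
        label: len(run_ids & read_query_ids(path))
        for label, path in fold_paths.items()
    }


if __name__ == "__main__":
    # Paths follow the file names mentioned in this issue; adjust as needed.
    run = "vbert-robust-f1.run"
    folds = {f"f{x}": f"data/robust/f{x}.test.run" for x in range(1, 6)}
    if os.path.exists(run) and all(os.path.exists(p) for p in folds.values()):
        for label, count in fold_overlaps(run, folds).items():
            print(label, count)
```

If a checkpoint's run file was produced on the same test fold, its query-id set should equal that fold's set exactly, and its overlap with every other fold should be zero.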

Why are the folds for Vanilla BERT and CEDR-KNRM different? On which folds have the Vanilla BERT checkpoints been trained/validated? Given that the test folds of the Vanilla BERT and CEDR-KNRM checkpoints are different, I assume that the provided Vanilla BERT checkpoints were not used as initial weights for obtaining the provided CEDR-KNRM checkpoints. Is this assumption correct? If so, which Vanilla BERT checkpoints were used to initialize CEDR-KNRM training? Would you mind sharing these checkpoints too?

I'm currently investigating issues with reproducing the results published in the paper. More on that in a separate ticket ...

krasserm commented 4 years ago

To be more precise regarding

e.g. the set of query ids in vbert-robust-f1.run is different from the set of query ids in data/robust/f1.test.run, and also cedrknrm-robust-f1.run

the number of common query ids in vbert-robust-f1.run and data/robust/f{x}.test.run for x = 1..5 is:

seanmacavaney commented 4 years ago

Hi Marin,

Thanks for pointing out this inconsistency! I suspect that it can be explained by a mismatch between the original code used for running the experiments (which reflect the vbert-robust-f1.run files), and the simplified example we released here. Specifically, I'm thinking it may have been a problem with the code that exported the data/robust/f{x}.test.run files from the original source. But I'll need to spend some time digging into exactly what happened.