Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

Robust04 title or desc for the query #19

Closed by Albert-Ma 4 years ago

Albert-Ma commented 4 years ago

It's unclear whether the title or the description of the query is used by the model for the Table 1 performance on Robust04. Which one did you use?

seanmacavaney commented 4 years ago

We use the title.

Albert-Ma commented 4 years ago

Thanks for the clarification.

I can only get ~0.37 P@20 and ~0.42 nDCG@20. I didn't change anything in the code except the data: no stemming, no stopword removal, and I used the 5-fold splits provided in this repo. By the way, I really appreciate your work; it let me get a quick start with your code.
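
For reference, this is roughly how I compute those numbers; a minimal sketch with pytrec_eval (my own choice of tooling, and the file names are just placeholders for one fold's qrels and run):

```python
import pytrec_eval

def load_qrels(path):
    # TREC qrels format: qid <iteration> docid relevance
    qrels = {}
    with open(path) as f:
        for line in f:
            qid, _, docid, rel = line.split()
            qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels

def load_run(path):
    # TREC run format: qid Q0 docid rank score run_name
    run = {}
    with open(path) as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            run.setdefault(qid, {})[docid] = float(score)
    return run

# Placeholder file names for one test fold.
qrels = load_qrels("qrels.robust04.txt")
run = load_run("cedr_knrm.fold1.test.run")

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"P", "ndcg_cut"})
per_query = evaluator.evaluate(run)
for metric in ("P_20", "ndcg_cut_20"):
    mean = sum(scores[metric] for scores in per_query.values()) / len(per_query)
    print(metric, round(mean, 4))
```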

So I'd like to know whether you tuned the model hyperparameters or processed the data in some other way, since the XML docs were really messy.

seanmacavaney commented 4 years ago

It's hard to know exactly where the difference could be. Could you try using the following checkpoints with your pre-processing and let me know if it helps? https://macavaney.us/cedr-models.tar (to be moved to a more permanent location later).
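
In case it helps, here is a minimal sketch of fetching the archive and peeking at a checkpoint (the file name inside the tar is a placeholder, and this assumes the checkpoints are plain PyTorch state dicts; for actual reranking, pass the weights to the repo's scripts as usual):

```python
import tarfile
import urllib.request

import torch

# Download and unpack the released checkpoints (same URL as above).
urllib.request.urlretrieve("https://macavaney.us/cedr-models.tar", "cedr-models.tar")
with tarfile.open("cedr-models.tar") as tar:
    tar.extractall("cedr-models")

# Inspect parameter names/shapes of one checkpoint; the file name below is a
# placeholder -- use whatever the archive actually contains, and this assumes
# the file is a plain state dict saved with torch.save().
state = torch.load("cedr-models/cedr-knrm.p", map_location="cpu")
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
```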

I'm planning to release a version soon that incorporates all of the indexing/preprocessing/etc., but I'm stuck waiting for approval from my institution, so I'm not sure when it'll be ready.

Albert-Ma commented 4 years ago

thx~