Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

CEDR for MARCO document ranking #44

Open caiyinqiong opened 2 years ago

caiyinqiong commented 2 years ago

Hello, have you ever run CEDR_KNRM on the MS MARCO document ranking task? I ran into problems when training CEDR_KNRM initialized from the fine-tuned BERT: performance almost stops improving, or even decreases. I wonder whether that's because the training settings used for Robust are not suitable for MARCO?

I look forward to any empirical guidance. Thank you.

seanmacavaney commented 2 years ago

I don't recall trying it, but in PARADE we identified some weirdness about the document ranking task that may explain what you're seeing. The dataset has a strong bias towards a "maximum passage", which means that more sophisticated aggregation techniques (perhaps like the KNRM aggregator employed by CEDR-KNRM) are less effective than simply taking the maximum passage score over the document. See Section 4.6 and Table 4 of the PARADE paper.
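To make "taking a maximum passage score" concrete, here is a minimal sketch of max-passage aggregation. It is not code from CEDR or PARADE: the names `split_into_passages`, `max_passage_score`, and `toy_overlap_scorer` are hypothetical, the window size and stride are arbitrary, and the toy scorer only stands in for whatever passage-level model (e.g. a fine-tuned BERT ranker) you actually use.

```python
from typing import Callable, List


def split_into_passages(tokens: List[str], size: int = 150, stride: int = 75) -> List[List[str]]:
    """Slide a fixed-size window over the document tokens (size/stride are assumptions)."""
    passages = []
    for start in range(0, len(tokens), stride):
        passages.append(tokens[start:start + size])
        # Stop once the current window already reaches the end of the document.
        if start + size >= len(tokens):
            break
    return passages


def max_passage_score(
    query: str,
    doc_tokens: List[str],
    score_passage: Callable[[str, List[str]], float],
    size: int = 150,
    stride: int = 75,
) -> float:
    """Score every passage independently and keep only the best one."""
    passages = split_into_passages(doc_tokens, size, stride)
    if not passages:
        return float("-inf")  # empty document: nothing to score
    return max(score_passage(query, p) for p in passages)


def toy_overlap_scorer(query: str, passage: List[str]) -> float:
    """Placeholder scorer: fraction of passage tokens that appear in the query."""
    query_terms = set(query.lower().split())
    return sum(1 for t in passage if t.lower() in query_terms) / max(len(passage), 1)


if __name__ == "__main__":
    doc = ("filler " * 6 + "neural ranking").split()
    print(max_passage_score("neural ranking", doc, toy_overlap_scorer, size=4, stride=2))
```

If a baseline like this already matches or beats the learned aggregator on the MARCO document task, that would be consistent with the maximum-passage bias described above rather than with a problem in your training settings.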

Hope this helps!