carriex / recomp

RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation.
MIT License

pre-processed training data #9

Open minamonaa opened 2 weeks ago

minamonaa commented 2 weeks ago

I have two questions about the pre-processed NQ training data. First, how can 'has_gold_answer' be False when 'em' is 1 and 'f1' is 1.0? Second, what criteria were used to select 'positive_ctxs'? For QA tasks, the context with the highest EM score is said to be chosen, but how were 'positive_ctxs' set when multiple sentences had an EM of 1?

carriex commented 2 weeks ago

Hi there,

has_gold_answer denotes whether the retrieved documents contain the gold answer, while EM and F1 measure whether the model outputs the correct answer. The model can output the correct answer even when that answer does not appear in the retrieved documents.
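To make the distinction concrete, here is a minimal sketch of how the two fields can disagree. It is not the repository's code: the function names, the SQuAD-style normalization, and the substring test for has_gold_answer are all assumptions made for illustration, and the example document and answer are invented.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def has_gold_answer(retrieved_docs, gold_answers):
    """Assumed definition: any gold answer string appears in any retrieved document."""
    docs = [normalize(d) for d in retrieved_docs]
    return any(normalize(a) in d for a in gold_answers for d in docs)


def exact_match(prediction, gold_answers):
    """1 if the model output equals any gold answer after normalization, else 0."""
    return int(any(normalize(prediction) == normalize(a) for a in gold_answers))


# Invented example: the retrieved text never mentions the answer,
# but the model still produces it (e.g. from its parametric knowledge).
docs = ["The Eiffel Tower was completed in 1889 for the World's Fair."]
gold = ["Paris"]
prediction = "Paris"

record = {
    "has_gold_answer": has_gold_answer(docs, gold),  # depends only on the documents
    "em": exact_match(prediction, gold),             # depends only on the model output
}
```

Because the first field looks only at the retrieved text and the second only at the model output, a record with has_gold_answer False and em 1 is consistent.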

When multiple sentences have an EM of 1, we select the one with the highest P(gold_answer | sentence) (i.e., the sentence that leads to the highest probability of the gold answer when prepended to the input).
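The tie-breaking rule can be sketched as follows. This is an illustration under stated assumptions, not the actual preprocessing script: the Candidate class, the gold_logprob field (standing in for log P(gold_answer | sentence)), and the choice to return None when no sentence reaches EM 1 are all hypothetical, and the scores are made-up numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    sentence: str
    em: int               # 1 if the model answers correctly with this sentence prepended
    gold_logprob: float   # hypothetical: log-probability of the gold answer given the sentence


def select_positive(candidates: List[Candidate]) -> Optional[Candidate]:
    """Among sentences with EM == 1, pick the one with the highest gold-answer probability."""
    correct = [c for c in candidates if c.em == 1]
    if not correct:
        # Assumption: the thread does not say what happens when no sentence has EM 1.
        return None
    return max(correct, key=lambda c: c.gold_logprob)


# Made-up scores for illustration only.
candidates = [
    Candidate("sentence A", em=1, gold_logprob=-2.3),
    Candidate("sentence B", em=1, gold_logprob=-0.7),
    Candidate("sentence C", em=0, gold_logprob=-0.1),
]
best = select_positive(candidates)
```

Here sentence C has the highest probability but is excluded because its EM is 0, so sentence B is selected over sentence A.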

Hope that helps!