sjk0825 opened this issue 1 month ago
Hi, thanks for the question. We used the evaluation setup appropriate to each dataset. For HotpotQA and 2WikiMultiHop, we evaluated on passage titles, since titles are unique there. For MuSiQue, we used the entire passage, since many passages share a title.
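A minimal sketch of why the evaluation key matters. It assumes each passage is a dict with hypothetical `title` and `text` fields and is not the project's own evaluation code. When two different passages share a title, as in MuSiQue, keying on the title alone can count a wrong passage as a hit. Keying on the full passage avoids that.

```python
from typing import Callable, Dict, List

# Hypothetical passage representation: {"title": ..., "text": ...}
Passage = Dict[str, str]


def recall_at_k(
    retrieved: List[Passage],
    gold: List[Passage],
    k: int,
    key: Callable[[Passage], str],
) -> float:
    """Fraction of gold passages whose key appears among the top-k retrieved."""
    top_keys = {key(p) for p in retrieved[:k]}
    gold_keys = {key(p) for p in gold}
    if not gold_keys:
        return 0.0  # design choice for this sketch: no gold passages -> 0 recall
    return len(gold_keys & top_keys) / len(gold_keys)


def by_title(p: Passage) -> str:
    # Title-level key: fine when every title identifies exactly one passage.
    return p["title"]


def by_passage(p: Passage) -> str:
    # Passage-level key: same-titled passages stay distinct.
    return p["title"] + "\n" + p["text"]


if __name__ == "__main__":
    gold = [{"title": "Paris", "text": "Paris is the capital of France."}]
    # A different passage that happens to share the gold passage's title.
    retrieved = [{"title": "Paris", "text": "Paris is a city in Texas."}]
    print(recall_at_k(retrieved, gold, k=1, key=by_title))    # 1.0
    print(recall_at_k(retrieved, gold, k=1, key=by_passage))  # 0.0
```

With unique titles, both keys give the same score. They only diverge when titles collide, which is why the choice of key depends on the dataset.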
Thank you for your kind answer. I have another question.
What does "passage" mean in your paper? For example, in 2WikiMultiHop, the title Teutberga has two passages under it, so the two passages share the same title and the title is not unique to a passage:
['Teutberga', ['Teutberga( died 11 November 875) was a queen of Lotharingia by marriage to Lothair II.', "She was a daughter of Bosonid Boso the Elder and sister of Hucbert, the lay- abbot of St. Maurice's Abbey."]]
So, does "passage" mean the concatenated text, or each sentence under the same title?
Right. For 2WikiMultiHop, we concatenate the sentences under a title to form a single passage, and we consider a passage relevant if it contains at least one supporting sentence.
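A minimal sketch of the passage construction described in the reply above. It assumes the HotpotQA-style layout the Teutberga example suggests: `context` is a list of `[title, [sentence, ...]]` entries and `supporting_facts` is a list of `[title, sentence_index]` pairs. The function name and the distractor entry are hypothetical, and this is not the project's own code.

```python
from typing import List, Sequence, Tuple


def build_passages(
    context: Sequence[Sequence],           # [[title, [sentence, ...]], ...]
    supporting_facts: Sequence[Sequence],  # [[title, sentence_index], ...]
) -> List[Tuple[str, str, bool]]:
    """Concatenate each title's sentences into one passage and label it
    relevant if it contains at least one supporting sentence."""
    support = {(title, idx) for title, idx in supporting_facts}
    passages = []
    for title, sentences in context:
        text = " ".join(s.strip() for s in sentences)
        relevant = any((title, i) in support for i in range(len(sentences)))
        passages.append((title, text, relevant))
    return passages


if __name__ == "__main__":
    context = [
        ["Teutberga", [
            "Teutberga( died 11 November 875) was a queen of Lotharingia "
            "by marriage to Lothair II.",
            "She was a daughter of Bosonid Boso the Elder and sister of "
            "Hucbert, the lay- abbot of St. Maurice's Abbey.",
        ]],
        # Hypothetical distractor entry for illustration only.
        ["Other Title", ["A hypothetical distractor sentence."]],
    ]
    supporting_facts = [["Teutberga", 0]]  # hypothetical label
    for title, text, relevant in build_passages(context, supporting_facts):
        print(title, relevant)
```

Both Teutberga sentences end up in one passage. That passage is relevant because sentence 0 is a supporting fact, while the distractor is not.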
icrot_hipporag.py includes code for computing recall.
I have a question about the evaluation process in the source code below. It appears to compute recall at the title level: whenever a supporting passage's (sp) title matches a retrieved title, the recall score increases.
Is the retrieval evaluation score in your project computed at the title level?