easonnie / semanticRetrievalMRS

This is the repo for the paper "Revealing the Importance of Semantic Retrieval for Machine Reading at Scale".
MIT License

term-based IR result #3

Closed: michaelmoju closed this issue 4 years ago

michaelmoju commented 4 years ago

Thanks for your contribution. How can I get the intermediate term-based IR results? Based on the README for your paragraph-level data, I assume the term-based results are in data/processed/content_selection_forward/hotpot_dev_p_level_unlabeled.jsonl. Is that right?

easonnie commented 4 years ago

Sorry about the confusion. The term-based IR results are actually in the data/p_hotpotqa/hotpotqa_paragraph_level/*.jsonl files. The qid field maps to the original id in the data/hotpotqa/*.json files, so grouping the JSON lines by qid gives you the term-based IR results for each HotpotQA query. The BERT model only assigns a score to each item that term-based retrieval has already returned. We will update the README later today to address this.
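The grouping step described above can be sketched in Python as follows. This is a minimal sketch, not code from the repo. The helper names (`group_by_qid`, `load_paragraph_level_file`, `index_hotpot_by_id`) are hypothetical. The only field it assumes in the paragraph-level records is `qid`, as stated above; all other fields are kept untouched. It also assumes `qid` holds the same string as the `_id` field of the original HotpotQA examples.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def group_by_qid(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group paragraph-level JSON lines by their 'qid' field.

    Each group holds the term-based IR candidates for one query,
    in the order they appear in the input.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        groups[record["qid"]].append(record)
    return dict(groups)


def load_paragraph_level_file(path: str) -> Dict[str, List[dict]]:
    """Read one *.jsonl file and group its records by qid."""
    with open(path, encoding="utf-8") as f:
        return group_by_qid(f)


def index_hotpot_by_id(path: str) -> Dict[str, dict]:
    """Index original HotpotQA examples (a JSON list) by their '_id'.

    Assumption: qid in the paragraph-level files equals this '_id'.
    """
    with open(path, encoding="utf-8") as f:
        return {ex["_id"]: ex for ex in json.load(f)}


if __name__ == "__main__":
    # Directory named in the thread; nothing runs if it is absent.
    root = Path("data/p_hotpotqa/hotpotqa_paragraph_level")
    if root.is_dir():
        for path in sorted(root.glob("*.jsonl")):
            groups = load_paragraph_level_file(str(path))
            print(f"{path.name}: {len(groups)} queries")
```

To pair each group with its question, load one of the data/hotpotqa/*.json files with `index_hotpot_by_id` and look up each qid in the resulting dictionary.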

easonnie commented 4 years ago

Let me know if anything else is unclear; otherwise, feel free to close this issue.

michaelmoju commented 4 years ago

You have totally answered my question. Thank you!