PaddlePaddle / RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.
Apache License 2.0

Titles in MS MARCO Passage Ranking #11

Closed okhat closed 2 years ago

okhat commented 2 years ago

Hi! Thanks for the great work and the helpful code release.

Does RocketQA[v2] use titles for MS MARCO Passage Ranking? That's my understanding of the released research/* scripts.

It might be helpful to document how the titles are obtained and whether they have a significant impact on quality. Perhaps you obtained the titles from the MS MARCO Document Ranking task?

ReyonRen commented 2 years ago

Hi @okhat,

Yes, we use titles of passages for MS MARCO Passage Ranking.

Specifically, for each passage, we link it to the corresponding document in the MS MARCO Document set and use the document title as the title of the passage.

Here is the link to download our processed MS MARCO dataset: https://rocketqa.bj.bcebos.com/corpus/marco.tar.gz
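
The linking step described above can be sketched roughly as follows. This is only an illustration, not the RocketQA preprocessing code: the thread does not say how passages are matched to documents, so the `passage_to_doc` mapping, the `attach_titles` helper, and the `"-"` placeholder for missing titles are all assumptions introduced here.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical placeholder for passages with no linked document or an empty title.
MISSING_TITLE = "-"


def attach_titles(
    passages: Iterable[Tuple[str, str]],
    passage_to_doc: Dict[str, str],
    doc_titles: Dict[str, str],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (passage_id, title, text), taking each title from the linked document.

    passages:       (passage_id, passage_text) pairs from the passage collection.
    passage_to_doc: assumed mapping from passage_id to document_id; how this
                    mapping is built is not described in the thread.
    doc_titles:     document_id -> title from the document collection.
    """
    for pid, text in passages:
        doc_id = passage_to_doc.get(pid)
        # Look up the title only when the passage is linked to a document.
        title = doc_titles.get(doc_id, "").strip() if doc_id is not None else ""
        yield pid, title or MISSING_TITLE, text
```

Passages that cannot be linked keep their text and receive the placeholder title, so the output stays aligned one-to-one with the input collection.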