THUIR / T2Ranking

T2Ranking: A large-scale Chinese benchmark for passage ranking.



T2Ranking is a large-scale Chinese benchmark for passage ranking. The details about T2Ranking are elaborated in this paper.

Passage ranking is an important and challenging topic for both academia and industry in the area of Information Retrieval (IR). The goal of passage ranking is to compile, from a large passage collection, a search result list ordered by relevance to the query. Typically, passage ranking involves two stages: passage retrieval and passage re-ranking.
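The two-stage structure described above can be sketched in a few lines. This is a toy illustration only: the scoring functions `char_overlap` and `jaccard`, and the helpers `retrieve`, `rerank`, and `rank`, are hypothetical stand-ins for a real retriever (e.g. BM25 or a dual-encoder) and a real re-ranker (e.g. a cross-encoder). They are not part of the T2Ranking code.

```python
from typing import Callable, Dict, List

Scorer = Callable[[str, str], float]


def char_overlap(query: str, passage: str) -> float:
    """Toy first-stage score: distinct query characters that occur in the passage."""
    return float(len(set(query) & set(passage)))


def jaccard(query: str, passage: str) -> float:
    """Toy second-stage score: character-level Jaccard similarity."""
    q, p = set(query), set(passage)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def retrieve(query: str, collection: Dict[str, str], scorer: Scorer, k: int) -> List[str]:
    """Stage 1: score every passage cheaply and keep the top-k passage ids."""
    scored = [(pid, scorer(query, text)) for pid, text in collection.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by pid
    return [pid for pid, _ in scored[:k]]


def rerank(query: str, candidates: List[str], collection: Dict[str, str], scorer: Scorer) -> List[str]:
    """Stage 2: reorder only the retrieved candidates with a second scorer."""
    return sorted(candidates, key=lambda pid: (-scorer(query, collection[pid]), pid))


def rank(query: str, collection: Dict[str, str], k: int = 100) -> List[str]:
    """Run retrieval followed by re-ranking."""
    candidates = retrieve(query, collection, char_overlap, k)
    return rerank(query, candidates, collection, jaccard)
```

The point of the split is cost: the first stage must touch the whole collection and therefore has to be cheap, while the second stage sees only `k` candidates and can afford a more expensive model. A real first stage would use an index rather than the linear scan shown here.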

Various benchmark datasets have been constructed to support passage ranking research. However, the commonly used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in data scale, lack fine-grained relevance annotation, and suffer from false negative issues.

To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Specifically, we sample question-based search queries from user logs of the Sogou search engine, a popular search system in China. For each query, we extract the content of corresponding documents from different search engines. After model-based passage segmentation and clustering-based passage de-duplication, a large-scale passage corpus is obtained. For a given query and its corresponding passages, we hire expert annotators to provide 4-level relevance judgments for each query-passage pair.

Table 1: The data statistics of datasets commonly used in passage ranking. FR (SR): first (second) stage of passage ranking, i.e., passage retrieval (re-ranking).

Compared with existing datasets, the T2Ranking dataset has the following characteristics and advantages:

Data Download

The full dataset is hosted on Hugging Face. The files and their formats are listed in the following table.

| Description | Filename | Num Records | Format |
|-------------|----------|------------:|-------:|
| Collection | collection.tsv | 2,303,643 | tsv: pid, passage |
| Queries Train | queries.train.tsv | 258,042 | tsv: qid, query |
| Queries Dev | | 24,832 | tsv: qid, query |
| Queries Test | queries.test.tsv | 24,832 | tsv: qid, query |
| Qrels Train for re-ranking | qrels.train.tsv | 1,613,421 | TREC qrels format |
| Qrels Dev for re-ranking | | 400,536 | TREC qrels format |
| Qrels Retrieval Train | qrels.retrieval.train.tsv | 744,663 | tsv: qid, pid |
| Qrels Retrieval Dev | | 118,933 | tsv: qid, pid |
| BM25 Negatives | train.bm25.tsv | 200,359,731 | tsv: qid, pid, index |
| Hard Negatives | train.mined.tsv | 200,376,001 | tsv: qid, pid, index |
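The formats in the table can be read with the standard library alone. The following is a minimal sketch, not the repository's loader: the function names are hypothetical, and whether each file starts with a header row is an assumption the caller must check (hence the required `has_header` argument). The qrels parser assumes the conventional four-column TREC layout `qid iteration pid relevance`.

```python
import csv
from typing import Dict, Iterator, List, TextIO


def read_tsv(handle: TextIO, *, has_header: bool) -> Iterator[List[str]]:
    """Yield non-empty rows of a tab-separated file, optionally skipping a header."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)
    for row in reader:
        if row:
            yield row


def load_id_text(handle: TextIO, *, has_header: bool) -> Dict[str, str]:
    """Load a two-column file such as `pid, passage` or `qid, query`."""
    return {row[0]: row[1] for row in read_tsv(handle, has_header=has_header)}


def load_trec_qrels(handle: TextIO, *, has_header: bool) -> Dict[str, Dict[str, int]]:
    """Load TREC-style qrels into {qid: {pid: relevance}}.

    Assumes four whitespace-separated columns: qid, iteration, pid, relevance.
    Lines with a different column count are skipped.
    """
    qrels: Dict[str, Dict[str, int]] = {}
    for line_no, line in enumerate(handle):
        if has_header and line_no == 0:
            continue
        parts = line.split()
        if len(parts) != 4:
            continue
        qid, _iteration, pid, rel = parts
        qrels.setdefault(qid, {})[pid] = int(rel)
    return qrels
```

With roughly 2.3M passages, `load_id_text` on `collection.tsv` holds the whole collection in memory; iterating over `read_tsv` directly avoids that when only a single pass is needed.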

You can download the dataset by running the following command:

git lfs install
git clone

After downloading, you can find the following files in the folder:

├── data
│   ├── collection.tsv
│   ├──
│   ├──
│   ├── qrels.retrieval.train.tsv
│   ├── qrels.train.tsv
│   ├──
│   ├── queries.test.tsv
│   ├── queries.train.tsv
│   ├── train.bm25.tsv
│   └── train.mined.tsv
├── script
│   ├──
│   └──
└── src

Training and Evaluation

The dual-encoder can be trained by running the following command:

sh script/

After training the model, you can evaluate the model by running the following command:

python src/ data/ output/res.top1000.step20

The cross-encoder can be trained by running the following command:

sh script/

After training the model, you can evaluate the model by running the following command:

python src/ output/res.step-20 && python src/ data/ output/res.step-20.trec && path_to/trec_eval -m ndcg_cut.5 data/ res.step-20.trec
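The command above passes a `.trec` file to `trec_eval`. For reference, a TREC run file has six whitespace-separated columns per line: query id, the literal `Q0`, document id, rank, score, and a run tag. The sketch below writes such a file from in-memory scores. It is not the repository's conversion script; `write_trec_run` and its input layout are hypothetical.

```python
from typing import Dict, List, TextIO, Tuple


def write_trec_run(
    scores: Dict[str, List[Tuple[str, float]]],
    handle: TextIO,
    run_tag: str = "run",
) -> None:
    """Write {qid: [(pid, score), ...]} as TREC run lines.

    Each line reads: qid Q0 pid rank score run_tag, with ranks starting at 1
    and passages ordered by descending score within each query.
    """
    for qid, scored in scores.items():
        ordered = sorted(scored, key=lambda item: -item[1])
        for rank, (pid, score) in enumerate(ordered, start=1):
            handle.write(f"{qid} Q0 {pid} {rank} {score:.6f} {run_tag}\n")
```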

We have uploaded some checkpoints to the Hugging Face Hub.

| Model | Description | Link |
|-------|-------------|------|
| dual-encoder 1 | dual-encoder trained with bm25 negatives | DE1 |
| dual-encoder 2 | dual-encoder trained with self-mined hard negatives | DE2 |
| cross-encoder | cross-encoder trained with self-mined hard negatives | CE |

BM25 on DEV set

MRR @10: 0.35894801237316354
QueriesRanked: 24831
recall@1: 0.05098711868967141
recall@1000: 0.7464097131133757
recall@50: 0.4942572226146033

DPR trained with BM25 negatives on DEV set

MRR @10: 0.4856112079562753
QueriesRanked: 24831
recall@1: 0.07367235058688999
recall@1000: 0.9082753169878586
recall@50: 0.7099350889583964

DPR trained with self-mined hard negatives on DEV set

MRR @10: 0.5166915171959451
QueriesRanked: 24831
recall@1: 0.08047455688965123
recall@1000: 0.9135220125786163
recall@50: 0.7327044025157232
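For readers who want to reproduce numbers of this kind, here is a minimal sketch of MRR@10 and recall@k over a retrieval run. It is not the repository's evaluation script, and the function names are hypothetical. It assumes recall is the fraction of a query's relevant passages found in the top k, averaged over queries; that reading is consistent with recall@1 being far below MRR@10 in the results above, but it is an inference rather than something the source states.

```python
from typing import Dict, List, Sequence, Set


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant passage in the top k, or 0 if none."""
    for rank, pid in enumerate(ranking[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def evaluate(
    run: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    ks: Sequence[int] = (1, 50, 1000),
) -> Dict[str, float]:
    """Average MRR@10 and recall@k over all queries with at least one relevant passage.

    Queries missing from `run` are scored as zero rather than dropped.
    """
    qids = [qid for qid, relevant in qrels.items() if relevant]
    if not qids:
        return {}
    n = len(qids)
    metrics = {
        "MRR@10": sum(reciprocal_rank(run.get(q, []), qrels[q]) for q in qids) / n
    }
    for k in ks:
        metrics[f"recall@{k}"] = sum(
            recall_at_k(run.get(q, []), qrels[q], k) for q in qids
        ) / n
    return metrics
```

Here `run` maps each qid to its ranked list of pids and `qrels` maps each qid to its set of relevant pids, which matches the two-column `qid, pid` retrieval qrels format.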

BM25 retrieved+CE reranked on DEV set

The reranked run file is available here.

MRR @10: 0.5188107959009376
QueriesRanked: 24831
recall@1: 0.08545219116806242
recall@1000: 0.7464097131133757
recall@50: 0.595298153566744
ndcg_cut_20             all     0.4405
ndcg_cut_100            all     0.4705

DPR retrieved+CE reranked on DEV set

The reranked run file is available here.

MRR @10: 0.5508822816845231
QueriesRanked: 24831
recall@1: 0.08903406988867588
recall@1000: 0.9135220125786163
recall@50: 0.7393720781623112
ndcg_cut_20             all     0.5131
ndcg_cut_100            all     0.5564
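The `ndcg_cut` figures above come from `trec_eval` and use the graded (4-level) relevance labels. As an illustration of what the metric measures, the sketch below computes NDCG@k with linear gain and a log2 rank discount. These conventions are an assumption made for this example; `trec_eval`'s exact definition should be checked against its own documentation before comparing numbers. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains))


def ndcg_at_k(ranking: Sequence[str], grades: Dict[str, int], k: int) -> float:
    """NDCG@k for one query.

    `grades` maps pid to its relevance level; unjudged passages count as 0.
    The ideal ranking sorts all judged passages by descending grade.
    """
    gains = [grades.get(pid, 0) for pid in ranking[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Unlike MRR and recall, NDCG rewards placing a highly relevant passage above a marginally relevant one, which is why it suits the re-ranking stage evaluated on fine-grained labels.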


The dataset is licensed under the Apache License 2.0.


If you use this dataset in your research, please cite our paper:

      title={T2Ranking: A large-scale Chinese Benchmark for Passage Ranking}, 
      author={Xiaohui Xie and Qian Dong and Bingning Wang and Feiyang Lv and Ting Yao and Weinan Gan and Zhijing Wu and Xiangsheng Li and Haitao Li and Yiqun Liu and Jin Ma},