Closed by yclzju 2 years ago
We are cleaning the data to produce a better version and will release it soon. Stay tuned!
Thanks.
Great! Will you first release your data preprocessing code, including the code for MSMARCO and DuReader?
@yclzju We have released DuReader_retrieval, a large-scale Chinese benchmark for passage retrieval. The dataset contains over 90K questions and 8M passages from Baidu Search.
paper: https://arxiv.org/abs/2203.10232
data: https://github.com/baidu/DuReader/tree/master/DuReader-Retrieval
baseline: https://github.com/PaddlePaddle/RocketQA/tree/main/research/DuReader-Retrieval-Baseline
Hi, great job! I see that you have released a Chinese retrieval model trained on DuReader. Could you please also release your preprocessing code or the processed datasets?