baidu / DuReader

Baseline Systems of DuReader Dataset
http://ai.baidu.com/broad/subordinate?dataset=dureader

What should I do if I want to use my own data? #50

Open JYZ122 opened 5 years ago

JYZ122 commented 5 years ago

I want to know what the various keys in the JSON dataset represent, for example ‘is_selected’, ‘answer_spans’, and ‘match_scores’. I also notice that these keys are not present in the raw data.
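For context on the keys mentioned above: `answer_spans` and `match_scores` appear to be computed during preprocessing from the segmented paragraphs and reference answers, not stored in the raw files. The sketch below illustrates that idea. It assumes the match score is token-level F1 and that the search is a brute-force scan over every span of a single paragraph. `token_f1` and `find_best_span` are hypothetical names, not functions from the repository, and the sample tokens are made up.

```python
from collections import Counter
from typing import List, Tuple


def token_f1(prediction: List[str], reference: List[str]) -> float:
    """Token-level F1 between two segmented strings (bag-of-tokens overlap)."""
    common = Counter(prediction) & Counter(reference)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def find_best_span(paragraph: List[str], answer: List[str]) -> Tuple[List[int], float]:
    """Return ([start, end], score) for the paragraph span that best matches the answer.

    `end` is inclusive. Brute force over all spans: O(n^2) F1 computations.
    """
    best_span, best_score = [0, 0], 0.0
    for start in range(len(paragraph)):
        for end in range(start, len(paragraph)):
            score = token_f1(paragraph[start:end + 1], answer)
            if score > best_score:  # keep the first span reaching the best score
                best_span, best_score = [start, end], score
    return best_span, best_score


# Hypothetical segmented paragraph and reference answer.
paragraph = ["DuReader", "is", "a", "Chinese", "reading", "comprehension", "dataset", "."]
answer = ["a", "Chinese", "reading", "comprehension", "dataset"]
span, score = find_best_span(paragraph, answer)
sample = {"answer_spans": [span], "match_scores": [score]}
print(sample)  # {'answer_spans': [[2, 6]], 'match_scores': [1.0]}
```

The cubic cost is acceptable for a sketch but can be slow on long paragraphs. Check the repository's preprocess.py for the exact metric and search it uses.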

menghuu commented 5 years ago
  1. Make your data look like dureader-raw (the raw DuReader format).
  2. Segment it into the segmented-xxx format.
  3. Use the preprocess.py script to generate the preprocessed data (preprocess-data).
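Steps 1 and 2 above can be sketched as follows. The field names (`question_id`, `question`, `question_type`, `answers`, `documents` with `title`, `paragraphs`, `is_selected`, plus the `segmented_*` fields) are assumed from the public DuReader files, so compare them against one line of the actual raw and segmented data before relying on them. `to_dureader_raw`, `segment_sample`, and `char_segment` are hypothetical helpers. `char_segment` is only a stand-in that splits text into characters; a real pipeline would use a Chinese word segmenter.

```python
import json
from typing import Callable, Dict, List


def to_dureader_raw(qid: int, question: str, answers: List[str],
                    docs: List[Dict[str, object]]) -> Dict[str, object]:
    """Step 1: wrap one custom QA record in a DuReader-raw-like dict.

    Field names are assumed from the public DuReader files; verify against your copy.
    """
    return {
        "question_id": qid,
        "question": question,
        "question_type": "DESCRIPTION",  # assumed placeholder; set the real type if known
        "answers": answers,
        "documents": [
            {"title": d["title"], "paragraphs": d["paragraphs"],
             "is_selected": d.get("is_selected", False)}
            for d in docs
        ],
    }


def segment_sample(sample: Dict[str, object],
                   segment: Callable[[str], List[str]]) -> Dict[str, object]:
    """Step 2: add segmented_* fields alongside the raw text fields."""
    out = dict(sample)
    out["segmented_question"] = segment(sample["question"])
    out["segmented_answers"] = [segment(a) for a in sample["answers"]]
    out["documents"] = [
        dict(doc,
             segmented_title=segment(doc["title"]),
             segmented_paragraphs=[segment(p) for p in doc["paragraphs"]])
        for doc in sample["documents"]
    ]
    return out


def char_segment(text: str) -> List[str]:
    """Hypothetical stand-in segmenter: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


raw = to_dureader_raw(
    qid=1,
    question="什么是阅读理解",
    answers=["阅读理解是回答问题"],
    docs=[{"title": "阅读理解", "paragraphs": ["阅读理解是回答问题的任务。"], "is_selected": True}],
)
segmented = segment_sample(raw, char_segment)
# Assumed output layout: one JSON object per line (JSON Lines).
line = json.dumps(segmented, ensure_ascii=False)
print(line)
```

After writing such lines to a file, step 3 is to run the repository's preprocess.py on it. Check the script itself for how it expects to be invoked.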