Closed SeekPoint closed 6 years ago
Please use the preprocessed version of the dataset; the raw version does not contain the 'segmented_paragraphs' field. Alternatively, you can segment the Chinese text into words yourself, or simply split it into characters.
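For the "just use char" route, here is a rough sketch of how the missing fields could be added to the raw data before it is piped into utils/preprocess.py. It is not part of the DuReader repo. The helper names (`char_segment`, `add_segmented_fields`, `convert_lines`) are made up for illustration, and the raw field names ('question', 'answers', 'documents', 'title', 'paragraphs') are assumed from the dataset layout. Only 'segmented_paragraphs' is confirmed by the traceback in this thread, so check the other names against your copy of the data.

```python
import json


def char_segment(text):
    """Split text into single characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def add_segmented_fields(sample):
    """Add character-level 'segmented_*' fields to one raw sample.

    Assumed raw field names: 'question', 'answers', 'documents',
    'title', 'paragraphs'. Fields that already exist are left untouched.
    """
    if 'question' in sample and 'segmented_question' not in sample:
        sample['segmented_question'] = char_segment(sample['question'])
    if 'answers' in sample and 'segmented_answers' not in sample:
        sample['segmented_answers'] = [char_segment(a) for a in sample['answers']]
    for doc in sample.get('documents', []):
        if 'title' in doc and 'segmented_title' not in doc:
            doc['segmented_title'] = char_segment(doc['title'])
        if 'segmented_paragraphs' not in doc:
            doc['segmented_paragraphs'] = [
                char_segment(p) for p in doc.get('paragraphs', [])
            ]
    return sample


def convert_lines(lines):
    """Convert an iterable of JSON lines, yielding one JSON string per sample."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sample = add_segmented_fields(json.loads(line))
        # ensure_ascii=False keeps the Chinese text readable in the output
        yield json.dumps(sample, ensure_ascii=False)
```

To use it, you could loop over `convert_lines(sys.stdin)` in a small script, print each result, and put that script between `cat` and `python utils/preprocess.py` in the pipeline. Character-level tokens are the simplest option; a real word segmenter would likely give results closer to the official preprocessed data.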
@lkliukai The preprocessed dir does not contain this file.
~/github/rasa_opensource/rasa_chinese/DuReader$ cat data/raw/trainset/search.train.json | python utils/preprocess.py > data/preprocessed/trainset/search.train.json
Traceback (most recent call last):
File "utils/preprocess.py", line 217, in <module>
mldl@mldlUB1604:~/ub16_prj/DuReader$ cat data/raw/trainset/search.train.json | python3 utils/preprocess.py > data/preprocessed/trainset/search.train.json
Traceback (most recent call last):
File "utils/preprocess.py", line 217, in <module>
find_fake_answer(sample)
File "utils/preprocess.py", line 158, in find_fake_answer
for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
KeyError: 'segmented_paragraphs'
mldl@mldlUB1604:~/ub16_prj/DuReader$