run cat data/raw/trainset/search.train.json | python utils/preprocess.py > data/preprocessed/trainset/search.train.json

baidu / DuReader

Baseline Systems of DuReader Dataset

http://ai.baidu.com/broad/subordinate?dataset=dureader


run cat data/raw/trainset/search.train.json | python utils/preprocess.py > data/preprocessed/trainset/search.train.json #49

Closed by Apollo2Mars 4 years ago

Apollo2Mars commented 5 years ago

Traceback (most recent call last):
  File "utils/preprocess.py", line 217, in <module>
    find_fake_answer(sample)
  File "utils/preprocess.py", line 158, in find_fake_answer
    for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
KeyError: 'segmented_paragraphs'

When I run the script from the README, this error occurs. Please check.

JYZ122 commented 5 years ago

I have the same problem. Are you good now?

Apollo2Mars commented 5 years ago

> I have the same problem. Are you good now?

I still have the problem, but I found that the download script can download the preprocessed data.

JYZ122 commented 5 years ago

Are you Chinese? You can add me on WeChat if you are. WeChat: JIMWIJ

lkliukai commented 5 years ago

Please refer to issue #15, or you can simply split the text into characters.
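A minimal sketch of the character-splitting workaround, in case it helps. It assumes each raw sample is one JSON object per line with a `documents` list whose entries have a `paragraphs` list of strings; those field names and the helper names `add_char_segmentation` and `segment_lines` are assumptions for illustration, not part of the repository. Only `segmented_paragraphs` is known from the traceback above.

```python
import json


def add_char_segmentation(sample):
    """Add a character-level 'segmented_paragraphs' list to every document."""
    # 'documents' and 'paragraphs' are assumed field names for the raw data.
    for doc in sample.get("documents", []):
        doc["segmented_paragraphs"] = [
            # One token per character, skipping whitespace.
            [ch for ch in paragraph if not ch.isspace()]
            for paragraph in doc.get("paragraphs", [])
        ]
    return sample


def segment_lines(lines):
    """Yield one JSON string per non-empty input line, with segmentation added."""
    for line in lines:
        if line.strip():
            segmented = add_char_segmentation(json.loads(line))
            yield json.dumps(segmented, ensure_ascii=False)
```

Feeding the lines of the raw file through `segment_lines` before `utils/preprocess.py` should get past the `KeyError` on `segmented_paragraphs`. The script may expect other `segmented_*` fields as well, so issue #15 is still worth checking.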

JYZ122 commented 5 years ago

What should I do if I want to convert the raw dataset into the preprocessed format?

JYZ122 commented 5 years ago

zhangyan@ubuntu:~/jyz/DuReader-master$ cat data/preprocessed/trainset/search.train.json | python utils/preprocess.py > data/preprocessed/trainset/search1.train.json
Traceback (most recent call last):
  File "utils/preprocess.py", line 218, in <module>
    print(json.dumps(sample, encoding='utf8', ensure_ascii=False))
  File "/home/haoyu/env/anaconda3/lib/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
TypeError: __init__() got an unexpected keyword argument 'encoding'

This problem occurred.
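The paths in this traceback show Python 3.6, where `json.dumps` does not accept an `encoding` keyword (it was a Python 2 argument), so the call on line 218 of `utils/preprocess.py` appears to be written for Python 2. A possible fix when running under Python 3 is to drop that argument, roughly as sketched here; `dump_sample` is a hypothetical helper name, not something in the script.

```python
import json


def dump_sample(sample):
    """Serialize one sample as a single JSON line, keeping non-ASCII text readable."""
    # Python 3's json.dumps has no 'encoding' argument; str is already Unicode.
    return json.dumps(sample, ensure_ascii=False)


print(dump_sample({"question": "百度"}))
```

Running the script with a Python 2 interpreter instead would be the other option, since the original call is valid there.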

Apollo2Mars commented 5 years ago