Closed Apollo2Mars closed 4 years ago
I have the same problem. Did you solve it?
I have the same problem, did you solve it? I still have the problem, but I found that the download script can download the preprocessed data.
Are you Chinese? If so, you can add my WeChat. WeChat: JIMWIJ
Please refer to issue #15, or you can simply split words by characters.
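A minimal sketch of the character-level splitting suggested above, as a stand-in for a word segmenter. The field names here are assumptions based on the `KeyError` reported in this thread, not the actual DuReader schema:

```python
def segment_by_chars(text):
    """Split a string into single characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]

def add_segmented_fields(doc):
    """Add a 'segmented_paragraphs' field (assumed name, taken from the
    KeyError in this thread) by char-splitting each raw paragraph."""
    doc['segmented_paragraphs'] = [segment_by_chars(p)
                                   for p in doc.get('paragraphs', [])]
    return doc
```

For Chinese text, each character is treated as one token, which is the crude but workable fallback issue #15 describes.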
What should I do if I want to convert the raw dataset into the preprocessed format?
zhangyan@ubuntu:~/jyz/DuReader-master$ cat data/preprocessed/trainset/search.train.json | python utils/preprocess.py > data/preprocessed/trainset/search1.train.json
Traceback (most recent call last):
File "utils/preprocess.py", line 218, in
This error occurred.
OK, I'm Chinese; I'll add you on WeChat.
run cat data/raw/trainset/search.zhidao.json | python utils/preprocess.py > data/preprocessed/trainset/zhidao.train.json
It may succeed.
This needs to be run with Python 2. Python 3's json has no encoding parameter. You can add a 2 after every "python" in run.sh (if the default is Python 3).
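The incompatibility described above can also be fixed in the script itself. A hedged sketch (the exact call site is in `utils/preprocess.py`; assuming it passes `encoding='utf8'` to `json.loads`, which Python 2 accepted but Python 3.9+ rejects with a TypeError):

```python
import json

def load_sample(line):
    """Parse one JSON line in a way that works on both Python 2 and 3:
    decode bytes explicitly, then call json.loads without the removed
    `encoding` keyword argument."""
    if isinstance(line, bytes):      # py2 str / py3 bytes from stdin
        line = line.decode('utf-8')
    return json.loads(line)
```

With a change like this, the pipeline could stay on the default Python 3 instead of switching run.sh to Python 2.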
Why are some questions in the trainset not used during training? For example, my training set has 1000 questions, but it shows that only 900 can be used. Why is that?
This needs to be run with Python 2. Python 3's json has no encoding parameter. You can add a 2 after every "python" in run.sh (if the default is Python 3).
Thanks, I'll give it a try.
Hi, was this error resolved in the end? If so, how did you solve it?
So this problem is still unsolved.
Traceback (most recent call last):
File "utils/preprocess.py", line 217, in
find_fake_answer(sample)
File "utils/preprocess.py", line 158, in find_fake_answer
for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
KeyError: 'segmented_paragraphs'
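The KeyError above suggests the input samples lack the segmented fields, i.e. raw data was piped in where preprocessed data was expected. One defensive option, sketched here as an assumption (the real `find_fake_answer` body is in the repo; this only reproduces the loop shown in the traceback, with a character-level fallback):

```python
def iter_segmented_paragraphs(doc):
    """Yield (index, tokens) for each paragraph of a document, falling
    back to character-level splitting when 'segmented_paragraphs' is
    missing (i.e. when raw rather than preprocessed data is piped in)."""
    paragraphs = doc.get('segmented_paragraphs')
    if paragraphs is None:
        # Fallback: char-split the raw 'paragraphs' field (assumed name).
        paragraphs = [list(p) for p in doc.get('paragraphs', [])]
    for p_idx, para_tokens in enumerate(paragraphs):
        yield p_idx, para_tokens
```

A guard like this would turn the crash into a graceful fallback, though running the proper segmentation step first is still the intended workflow.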
When I run the script in the README, this error occurs. Please check.