Open zsp1993 opened 5 years ago
Is the tokenizer misconfigured somewhere? I have been stuck on this for a long time and still can't figure it out.
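My Baidu crawler may be at fault, and the encoding looks wrong as well. To test the tokenizer, print the command it builds and first check whether that command runs on the command line.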
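On the encoding concern raised above: one common cause of unreadable crawler output is decoding a GBK/GB18030 page as UTF-8. Whether that is what happens in this crawler is an assumption, not something the thread confirms. A minimal sketch of a fallback decoder follows; `decode_page` and its `candidates` parameter are hypothetical names, not part of DrQA_cn.

```python
def decode_page(raw, candidates=('utf-8', 'gb18030')):
    """Try each candidate encoding strictly and return (text, encoding_used).

    Hypothetical helper: the candidate list is an assumption about which
    encodings the crawled pages use.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to the first candidate and mark
    # undecodable bytes with U+FFFD instead of raising.
    return raw.decode(candidates[0], errors='replace'), candidates[0]


# Bytes as a GBK-encoded page would deliver them.
gbk_bytes = '你好'.encode('gbk')
text, used = decode_page(gbk_bytes)
print(text, used)
```

Strict decoding is tried first so that a wrong guess fails loudly rather than silently producing replacement characters.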
Is this what you mean by printing and running on the command line?
from drqa.tokenizers import CoreNLPTokenizer
tok = CoreNLPTokenizer()
tok.tokenize('hello world').words()
Zh_tokenizer.py
cmd = ['java', '-mx' + self.mem, '-cp', '\'%s\'' % self.classpath,
'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-props',
'StanfordCoreNLP-chinese.properties',
'-annotators', annotators, '-tokenize.options', options,
'-outputFormat', 'json', '-prettyPrint', 'false']
# print(cmd)
There is also one path line in this file that needs to be changed.
Hello, I have already changed this path (and nothing else):
self.classpath = '/Users/zhangshaopeng/pyproject/github/DrQA_cn/data/corenlp/*'
To test it, am I supposed to run this file?
This is what gets printed after uncommenting the print:
['java', '-mx2g', '-cp', "'/home/zhangshaopeng/github/DrQA_cn/data/corenlp/*'", 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-props', 'StanfordCoreNLP-chinese.properties', '-annotators', 'tokenize,ssplit,pos,lemma,ner', '-tokenize.options', 'untokenizable=noneDelete,invertible=true', '-outputFormat', 'json', '-prettyPrint', 'false']
Pasting it directly into the command line gives:
bash: [java,: command not found
After removing the punctuation and brackets, running it fails as follows:
java -mx2g -cp /home/zhangshaopeng/github/DrQA_cn/data/corenlp/* edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-chinese.properties -annotators tokenize ssplit pos lemma ner -tokenize.options untokenizable=noneDelete invertible=true -outputFormat json -prettyPrint false
Error: Could not find or load main class .home.zhangshaopeng.github.DrQA_cn.data.corenlp.javax.json-api-1.0-sources.jar
The folder /home/zhangshaopeng/github/DrQA_cn/data/corenlp/ contains the following files: ejml-0.23.jar jollyday-0.4.9-sources.jar stanford-chinese-corenlp-2017-06-09-models.jar xom-1.2.10-src.jar javax.json-api-1.0-sources.jar jollyday.jar stanford-corenlp-3.8.0.jar xom.jar javax.json.jar protobuf.jar stanford-corenlp-3.8.0-javadoc.jar joda-time-2.9-sources.jar slf4j-api.jar stanford-corenlp-3.8.0-models.jar joda-time.jar slf4j-simple.jar stanford-corenlp-3.8.0-sources.jar
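The two command-line failures above have a shell-level explanation. A printed Python list is not shell syntax, which is why bash tried to run `[java,`. Once the quotes were stripped, bash expanded the `*` in the classpath into the individual jar names, so Java took the first jar as the classpath and the next one (javax.json-api-1.0-sources.jar) as the main class. Removing the commas also split the `-annotators` and `-tokenize.options` values into separate arguments. A minimal sketch of turning the argument list into a line that can be pasted safely follows; `to_shell_command` is a hypothetical helper, and the list holds the values printed in this thread, without the extra embedded quotes around the classpath.

```python
import shlex


def to_shell_command(args):
    """Render an argument list as one shell-safe command line (hypothetical helper)."""
    return ' '.join(shlex.quote(arg) for arg in args)


# Values copied from the printed list in this thread. The classpath is kept
# unquoted here because shlex.quote adds the quoting itself.
classpath = '/home/zhangshaopeng/github/DrQA_cn/data/corenlp/*'
cmd = ['java', '-mx2g', '-cp', classpath,
       'edu.stanford.nlp.pipeline.StanfordCoreNLP',
       '-props', 'StanfordCoreNLP-chinese.properties',
       '-annotators', 'tokenize,ssplit,pos,lemma,ner',
       '-tokenize.options', 'untokenizable=noneDelete,invertible=true',
       '-outputFormat', 'json', '-prettyPrint', 'false']

shell_line = to_shell_command(cmd)
print(shell_line)
```

The printed line keeps the wildcard inside single quotes, so bash passes it to Java untouched and Java expands it as a classpath wildcard.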
Hello, I downloaded the Chinese jar and put it in the corresponding tokenizer folder of Facebook's open-source DrQA, then copied the whole folder into the corresponding directory of your open-source DrQA_cn. Processing fails with the following error:
process('江泽明是谁?', doc_n=1, pred_n=1, net_n=1)
01/10/2019 10:51:16 AM: [ [question after filting : 江泽明是谁? ] ]
01/10/2019 10:51:17 AM: [ [retreive from net : 1 | expect : 1] ]
=================raw text==================
[the raw text is unreadable, mis-decoded output; the only legible fragments include "ICP" and "©2019Baidu"]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 99, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/usr/local/lib/python3.5/dist-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "scripts/pipeline/sinteractive.py", line 67, in process
docTopN=doc_n, netTopN=net_n)
File "/home/zhangshaopeng/github/DrQA_cn/drqa/pipeline/simpleDrQA.py", line 50, in predict
ans.extend(process(text))
File "/home/zhangshaopeng/github/DrQA_cn/drqa/pipeline/simpleDrQA.py", line 33, in process
line, query, candidates=None, top_n=qasTopN)
File "/home/zhangshaopeng/github/DrQA_cn/drqa/reader/predictor.py", line 86, in predict
results = self.predict_batch([(document, question, candidates,)], top_n)
File "/home/zhangshaopeng/github/DrQA_cn/drqa/reader/predictor.py", line 105, in predict_batch
q_tokens = list(map(self.tokenizer.tokenize, questions))
File "/home/zhangshaopeng/github/DrQA_cn/drqa/tokenizers/Zh_tokenizer.py", line 105, in tokenize
self.corenlp.expect_exact('NLP>', searchwindowsize=100)
File "/usr/local/lib/python3.5/dist-packages/pexpect/spawnbase.py", line 390, in expect_exact
return exp.expect_loop(timeout)
File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7feb66984128>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'-437: /home/zhangshaopeng/github/DrQA_cn\x07root@kml-dtmachine-437:/home/zhangshaopeng/github/DrQA_cn#'
before (last 100 chars): b'-437: /home/zhangshaopeng/github/DrQA_cn\x07root@kml-dtmachine-437:/home/zhangshaopeng/github/DrQA_cn#'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 15919
child_fd: 12
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
0: "b'NLP>'"
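In the dump above, the spawned command is /bin/bash, and the buffer holds only the shell prompt when the 60-second timeout expires, so the `NLP>` prompt never appeared. That would be consistent with the java command exiting right away, although the dump alone does not prove it. One way to see the underlying error is to run the same argument list once outside pexpect and read its exit code and stderr. A minimal sketch follows; `run_and_report` is a hypothetical helper, and the demo uses a stand-in child process rather than the real java command.

```python
import subprocess
import sys


def run_and_report(cmd, stdin_text='', timeout=60):
    """Run cmd once without pexpect and return (exit_code, stdout, stderr).

    Hypothetical helper for diagnosis only; it is not part of DrQA_cn.
    """
    proc = subprocess.run(cmd, input=stdin_text,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for the java command: a child that writes to stderr and fails.
demo_cmd = [sys.executable, '-c',
            "import sys; sys.stderr.write('boom'); sys.exit(3)"]
code, out, err = run_and_report(demo_cmd)
print(code, repr(err))
```

With the real argument list substituted for `demo_cmd`, a non-zero exit code together with the stderr text would show why the tokenizer process never reaches its prompt.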