facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

pexpect.exceptions.TIMEOUT: Timeout. #81

Open xingzhoupy opened 6 years ago

xingzhoupy commented 6 years ago

Hi, I uploaded this file and successfully generated the TF-IDF file. Now I want to run format A on my QA data:

python generate.py /path/to/dataset/dir dataset /path/to/output/dir

The following error occurred:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/innovate/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/innovate/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/innovate/anaconda3/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "generate.py", line 48, in init
    PROCESS_TOK = tokenizer_class(**tokenizer_opts)
  File "/tmp/yuyide/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
    self._launch()
  File "/tmp/yuyide/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
    self.corenlp.expect_exact('NLP>', searchwindowsize=100)
  File "/home/innovate/anaconda3/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
    return exp.expect_loop(timeout)
  File "/home/innovate/anaconda3/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
    return self.timeout(e)
  File "/home/innovate/anaconda3/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.

<pexpect.pty_spawn.spawn object at 0x7f80a43510f0>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'innovate@xiaoi-gy-93:/tmp/yuyide/DrQA/scripts/distant\x07[innovate@xiaoi-gy-93 distant]$ '
before (last 100 chars): b'innovate@xiaoi-gy-93:/tmp/yuyide/DrQA/scripts/distant\x07[innovate@xiaoi-gy-93 distant]$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 13589
child_fd: 39
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: "b'NLP>'"

Process ForkPoolWorker-32:
Traceback (most recent call last):
  File "/home/innovate/anaconda3/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/home/innovate/anaconda3/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

What is the problem, and how can I solve it? Thanks!

ajfisch commented 6 years ago

Have you verified that the CoreNLPTokenizer is properly setup and works independently?

xingzhoupy commented 6 years ago

Yeah, I tried:

>>> from drqa.tokenizers import CoreNLPTokenizer
>>> tok = CoreNLPTokenizer()

The error is the same as last time.

ajfisch commented 6 years ago

Sounds like it's a problem with your setup then -- have you followed the instructions for setting up CoreNLP?

ajfisch commented 6 years ago

Also see related discussion in #61 and #42

xingzhoupy commented 6 years ago

Yeah, I set it up, and it works great with English, but I want to change the language. I downloaded the CoreNLP Chinese models and moved them into the corenlp directory.

ajfisch commented 6 years ago

Hm. The tokenizers were only developed/tested to work with English, so you might have to do some digging. I'd start with directly working with the CoreNLP command line interface (java) and seeing if the errors are thrown on that end. The tokenizer in DrQA is just wrapping that command line with pexpect -- unfortunately if something crashes on the java side pexpect will timeout.
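To make that failure mode concrete, here is a minimal pexpect sketch of what such wrapping looks like. The '/bin/bash' command, the echoed prompt, and the 300-second timeout are illustrative stand-ins, not DrQA's actual launch command or defaults:

```python
import pexpect

# Minimal sketch of a pexpect wrapper in the style of a CoreNLP launcher.
# '/bin/bash' and the echoed 'NLP>' stand in for the real java command and
# the CoreNLP interactive prompt; timeout=300 is a deliberately generous limit.
child = pexpect.spawn('/bin/bash', timeout=300, maxread=100000)
child.sendline("echo 'NLP>'")  # CoreNLP prints 'NLP>' once it is ready
# If the child never prints the prompt (e.g. the java side crashed on
# startup), this call raises pexpect.exceptions.TIMEOUT after `timeout`
# seconds -- the same exception shown in the traceback above.
child.expect_exact('NLP>', searchwindowsize=100)
print('prompt seen')
child.close()
```

Since the wrapper only sees the prompt string, any startup failure on the java side (bad classpath, missing models, out of memory) surfaces as this generic timeout rather than the underlying java error.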

xingzhoupy commented 6 years ago

Oh, thanks, I'll try it again.

xingzhoupy commented 6 years ago

Hi, I solved that problem: my CoreNLP classpath was wrong, and after fixing it the tokenizer works. But when I run:

python3 generate.py /tmp/yuyide/DrQA/data/formatA/ qa.txt /tmp/yuyide/DrQA/data/formatA/

I get this error:

01/15/2018 05:01:15 PM: [ Processing 36181 question answer pairs... ]
01/15/2018 05:01:15 PM: [ Will save to /tmp/yuyide/DrQA/data/formatA/qa.dstrain and /tmp/yuyide/DrQA/data/formatA/qa.dsdev ]
01/15/2018 05:01:15 PM: [ Loading /tmp/yuyide/DrQA/data/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
01/15/2018 05:01:16 PM: [ Ranking documents (top 5 per question)... ]
01/15/2018 05:01:56 PM: [ Pre-tokenizing questions... ]
01/15/2018 05:02:42 PM: [ Searching documents... ]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/innovate/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "generate.py", line 169, in search_docs
    for j, paragraph in enumerate(re.split(r'\n+', fetch_text(doc_id))):
  File "/home/innovate/anaconda3/lib/python3.6/site-packages/regex-2017.12.12-py3.6-linux-x86_64.egg/regex.py", line 319, in split
    return _compile(pattern, flags, kwargs).split(string, maxsplit, concurrent)
TypeError: expected string or buffer
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "generate.py", line 312, in <module>
    process(questions, answers, outfile, opts)
  File "generate.py", line 214, in process
    for res in workers.imap_unordered(search_fn, inputs):
  File "/home/innovate/anaconda3/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
TypeError: expected string or buffer

And I did successfully create sample.db and the TF-IDF file. So what is the problem, and how can I solve it? Thanks.

ajfisch commented 6 years ago

You're doing Chinese, correct? Again, I'm not familiar with the extent of the incompatibilities. The regex is failing, and it looks like you might have one of a few possible problems.
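For context, a `TypeError: expected string or buffer` from a regex split usually means the value passed in was `None` rather than a string -- for example, a document lookup that found nothing for the given id. A hedged sketch of that failure and a guard against it (the `fetch_text` stub and document store below are illustrative, not DrQA's actual database code):

```python
import re

# Toy stand-in for a document store; a missing id returns None,
# which is one plausible way re.split ends up with a non-string.
DOCS = {'doc1': 'para one\n\npara two'}

def fetch_text(doc_id):
    return DOCS.get(doc_id)  # None when doc_id is absent

def split_paragraphs(doc_id):
    text = fetch_text(doc_id)
    if text is None:
        # Guard instead of crashing with "expected string or buffer".
        return []
    return re.split(r'\n+', text)

print(split_paragraphs('doc1'))     # ['para one', 'para two']
print(split_paragraphs('missing'))  # []
```

If the doc ids in the TF-IDF index were built with a different tokenizer or database than the one being queried, every lookup can miss in exactly this way.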

ajfisch commented 6 years ago

See related issue in #77

xingzhoupy commented 6 years ago

Hi, I used CoreNLP to generate a new TF-IDF file, but I don't know why generate.py now stalls during distant supervision with no error:

[innovate@xiaoi-gy-93 distant]$ python3 generate.py /tmp/yuyide/DrQA/data/formatA/ qa1.txt /tmp/yuyide/DrQA/data/formatA/
01/17/2018 10:15:00 AM: [ Processing 36181 question answer pairs... ]
01/17/2018 10:15:00 AM: [ Will save to /tmp/yuyide/DrQA/data/formatA/qa1.dstrain and /tmp/yuyide/DrQA/data/formatA/qa1.dsdev ]
01/17/2018 10:15:00 AM: [ Loading /tmp/yuyide/DrQA/data/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]

Thanks.

hoogang commented 6 years ago

I tried “from drqa.tokenizers import CoreNLPTokenizer; tok = CoreNLPTokenizer()” and ran this script:

“ python scripts/reader/preprocess.py data/datasets data/datasets --split SQuAD-v1.1-train --tokenizer corenlp ”

Everything works.

I then tested the CoreNLPTokenizer on Chinese word segmentation:

>>> from drqa.tokenizers import CoreNLPTokenizer
>>> tok = CoreNLPTokenizer()
[init tokenizer done]
>>> tok.tokenize('hello world 湖北省武汉市公共交通系统').words()
['hello', 'world', '湖北省', '武汉市', '公共', '交通', '系统']

The CoreNLP command line is also OK, as follows:

hugang@server-white:~$ java   -mx3g  -cp    "/home/hugang/DrQA/data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP    -annotators tokenize,ssplit,pos,lemma,ner -props StanfordCoreNLP-chinese.properties
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Loading Chinese dictionaries from 1 file:
[main] INFO edu.stanford.nlp.wordseg.ChineseDictionary -   edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
[main] INFO edu.stanford.nlp.wordseg.ChineseDictionary - Done. Unique words in ChineseDictionary is: 423200.
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/segmenter/chinese/ctb.gz ... done [12.3 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger ... done [2.9 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz ... done [3.8 sec].

Entering interactive shell. Type q RETURN or EOF to quit.

NLP> 湖北省武安市 今天天气很不错 可以出去郊游
Sentence #1 (9 tokens):
湖北省武安市 今天天气很不错 可以出去郊游
[Text=湖北省 CharacterOffsetBegin=0 CharacterOffsetEnd=3 PartOfSpeech=NR Lemma=湖北省 NamedEntityTag=GPE]
[Text=武安市 CharacterOffsetBegin=3 CharacterOffsetEnd=6 PartOfSpeech=NR Lemma=武安市 NamedEntityTag=GPE]
[Text=今天 CharacterOffsetBegin=7 CharacterOffsetEnd=9 PartOfSpeech=NT Lemma=今天 NamedEntityTag=DATE NormalizedNamedEntityTag=XXXX-XX-XX]
[Text=天气 CharacterOffsetBegin=9 CharacterOffsetEnd=11 PartOfSpeech=NN Lemma=天气 NamedEntityTag=O]
[Text=很 CharacterOffsetBegin=11 CharacterOffsetEnd=12 PartOfSpeech=AD Lemma=很 NamedEntityTag=O]
[Text=不错 CharacterOffsetBegin=12 CharacterOffsetEnd=14 PartOfSpeech=VA Lemma=不错 NamedEntityTag=O]
[Text=可以 CharacterOffsetBegin=15 CharacterOffsetEnd=17 PartOfSpeech=VV Lemma=可以 NamedEntityTag=O]
[Text=出去 CharacterOffsetBegin=17 CharacterOffsetEnd=19 PartOfSpeech=VV Lemma=出去 NamedEntityTag=O]
[Text=郊游 CharacterOffsetBegin=19 CharacterOffsetEnd=21 PartOfSpeech=VV Lemma=郊游 NamedEntityTag=O]

but when I run this script

“python scripts/reader/preprocess.py data/datasets data/datasets --split webqa-test --tokenizer corenlp”

  "webqa-test"    is test set for Chinese reading comprehension

Traceback (most recent call last):
  File "/home/hugang/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/hugang/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/hugang/.pyenv/versions/3.6.0/lib/python3.6/multiprocessing/pool.py", line 103, in worker
    initializer(*initargs)
  File "scripts/reader/preprocess.py", line 29, in init
    TOK = tokenizer_class(**options)
  File "/home/hugang/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 37, in __init__
    self._launch()
  File "/home/hugang/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 68, in _launch
    self.corenlp.expect_exact('NLP>', searchwindowsize=100)
  File "/home/hugang/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
    return exp.expect_loop(timeout)
  File "/home/hugang/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
    return self.timeout(e)
  File "/home/hugang/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f19c6d072b0>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'er-white:~/DrQA$ [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\r\n'
before (last 100 chars): b'er-white:~/DrQA$ [main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize\r\n'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 15458
child_fd: 21
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: "b'NLP>'"

The error is the same as last time. I have tried lots of methods and still can't solve this problem, which leaves me confused. Please help me.

mazzzystar commented 5 years ago

@hoogang The same problem with you. Have you solved it ?

niimi1996 commented 5 years ago

What should I do in order to generate long or lengthy answers?

hoogang commented 5 years ago

> @hoogang The same problem with you. Have you solved it ?

You can find the solution here: https://github.com/AmoseKang/DrQA_cn/issues/5

leomrocha commented 4 years ago

Same issue here, but with the DrQA setup from golden retriever.

shreyas-badiger commented 4 years ago

I also faced the timeout issue. However, in my case it was because I hadn't set the correct CLASSPATH.
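For anyone hitting the same thing: the tokenizer needs the CoreNLP jars on the classpath handed to java, and a wrong directory makes the 'NLP>' prompt never appear, so pexpect times out. A hedged sketch of a sanity check, using a scratch directory with a placeholder jar in place of a real CoreNLP install:

```python
import glob
import os
import tempfile

# Illustrative only: simulate a CoreNLP jar directory with a scratch
# folder and a placeholder jar, then verify the classpath glob would
# actually match something before launching java.
corenlp_dir = tempfile.mkdtemp()
open(os.path.join(corenlp_dir, 'stanford-corenlp-placeholder.jar'), 'w').close()

# This is the kind of value you would export as CLASSPATH.
classpath = os.path.join(corenlp_dir, '*')

jars = glob.glob(os.path.join(corenlp_dir, '*.jar'))
if jars:
    print('found %d jar(s); classpath looks usable' % len(jars))
else:
    print('no jars found: expect the NLP> prompt to time out')
```

Pointing the same glob at your real CoreNLP directory is a quick way to catch a bad classpath before it surfaces as an opaque pexpect timeout.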