facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

`TIMEOUT: Timeout exceeded` error trying `tok = CoreNLPTokenizer()` #23

Closed RitwikGopi closed 7 years ago

RitwikGopi commented 7 years ago

When I try:

>>> from drqa.tokenizers import CoreNLPTokenizer
>>> tok = CoreNLPTokenizer()
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ritwik/rd/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
    self._launch()
  File "/home/ritwik/rd/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
    self.corenlp.expect_exact('NLP>', searchwindowsize=100)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/spawnbase.py", line 390, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 107, in expect_loop
    return self.timeout(e)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 70, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7ff89a70f128>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'@stagwiki: ~/rd/DrQA/data/corenlp\x07\x1b[01;32mritwik@stagwiki\x1b[00m:\x1b[01;34m~/rd/DrQA/data/corenlp\x1b[00m$ '
before (last 100 chars): b'@stagwiki: ~/rd/DrQA/data/corenlp\x07\x1b[01;32mritwik@stagwiki\x1b[00m:\x1b[01;34m~/rd/DrQA/data/corenlp\x1b[00m$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 17048
child_fd: 5
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: "b'NLP>'"

The CLASSPATH is set properly:

corenlp$ echo $CLASSPATH
/home/ritwik/rd/DrQA/data/corenlp/ejml-0.23.jar /home/ritwik/rd/DrQA/data/corenlp/javax.json-api-1.0-sources.jar /home/ritwik/rd/DrQA/data/corenlp/javax.json.jar /home/ritwik/rd/DrQA/data/corenlp/joda-time-2.9-sources.jar /home/ritwik/rd/DrQA/data/corenlp/joda-time.jar /home/ritwik/rd/DrQA/data/corenlp/jollyday-0.4.9-sources.jar /home/ritwik/rd/DrQA/data/corenlp/jollyday.jar /home/ritwik/rd/DrQA/data/corenlp/protobuf.jar /home/ritwik/rd/DrQA/data/corenlp/slf4j-api.jar /home/ritwik/rd/DrQA/data/corenlp/slf4j-simple.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-javadoc.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-models.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-sources.jar /home/ritwik/rd/DrQA/data/corenlp/xom-1.2.10-src.jar /home/ritwik/rd/DrQA/data/corenlp/xom.jar
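
What the traceback shows: DrQA's `CoreNLPTokenizer._launch` (`corenlp_tokenizer.py`, line 61) spawns `/bin/bash` through pexpect (per the dump's `command:` line), presumably launches CoreNLP from that shell, and calls `expect_exact('NLP>')` to wait for CoreNLP's interactive prompt. The dump reports `timeout: 60`, so pexpect gave up after 60 seconds. The buffer ends in a bash prompt rather than Java output, which suggests CoreNLP never started (later in this thread it turns out Java was not installed). The stdlib-only sketch below reproduces that wait-for-prompt check with `subprocess` and a reader thread instead of a pseudo-terminal, so the CoreNLP command can be tested outside DrQA. `wait_for_prompt` is a hypothetical helper, not part of DrQA or pexpect.

```python
import subprocess
import threading


def wait_for_prompt(cmd, prompt="NLP>", timeout=60.0):
    """Start `cmd` and wait up to `timeout` seconds for `prompt` on its output.

    Returns (found, tail): whether the prompt appeared before the deadline,
    and the last 100 characters of output (cf. pexpect's "buffer" field).
    Uses plain pipes, not a pseudo-terminal as pexpect does.
    """
    output = []
    state = {"found": False}
    done = threading.Event()

    with subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so Java errors show up in the tail
        universal_newlines=True,
    ) as proc:

        def reader():
            # Read one character at a time: the prompt has no trailing newline.
            while True:
                ch = proc.stdout.read(1)
                if not ch:  # EOF: the process exited without showing the prompt
                    break
                output.append(ch)
                if "".join(output[-len(prompt):]) == prompt:
                    state["found"] = True
                    break
            done.set()

        thread = threading.Thread(target=reader, daemon=True)
        thread.start()
        done.wait(timeout)  # analogous to pexpect's expect_exact(..., timeout)
        if proc.poll() is None:
            proc.kill()  # stop the child whether or not the prompt appeared
        thread.join(timeout=5)

    return state["found"], "".join(output)[-100:]
```

For example, `wait_for_prompt(["java", "-cp", "/path/to/corenlp/*", "edu.stanford.nlp.pipeline.StanfordCoreNLP", "-annotators", "tokenize,ssplit"])` should return `(True, ...)` once Java and the jars are set up; `/path/to/corenlp` is a placeholder.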

ajfisch commented 7 years ago

Hi,

This is not how you specify the CLASSPATH in Java. You need a colon (:) as the separator between jars. Instead of adding each jar explicitly, just do export CLASSPATH=/home/ritwik/rd/DrQA/data/corenlp/*.
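
To make the separator rule concrete: Java splits classpath entries on `:` (Linux/macOS) or `;` (Windows), which Python exposes as `os.pathsep`. A space-separated value like the one printed above would be read by Java as a single, nonexistent path. The sketch below builds the explicit form; the shorter `dir/*` form suggested above is expanded by Java itself to every `.jar` in the directory. `build_classpath` and `wildcard_classpath` are hypothetical helpers.

```python
import glob
import os


def build_classpath(corenlp_dir):
    """Join every .jar in corenlp_dir into a Java classpath string.

    Entries are separated by os.pathsep (":" on Linux/macOS, ";" on Windows),
    which is what Java expects; spaces are NOT valid separators.
    """
    jars = sorted(glob.glob(os.path.join(corenlp_dir, "*.jar")))
    return os.pathsep.join(jars)


def wildcard_classpath(corenlp_dir):
    """The shorter form: Java itself expands "dir/*" to all jars in dir."""
    return os.path.join(corenlp_dir, "*")
```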

RitwikGopi commented 7 years ago

I did exactly that.

ajfisch commented 7 years ago

Hm. Are you sure?

[afisch/~]$ export CLASSPATH=/home/ritwik/rd/DrQA/data/corenlp/*
[afisch/~]$ echo $CLASSPATH
/home/ritwik/rd/DrQA/data/corenlp/*

RitwikGopi commented 7 years ago

ritwik@stagwiki:~$ export CLASSPATH=/home/ritwik/rd/DrQA/data/corenlp/*
ritwik@stagwiki:~$ echo $CLASSPATH
/home/ritwik/rd/DrQA/data/corenlp/ejml-0.23.jar /home/ritwik/rd/DrQA/data/corenlp/javax.json-api-1.0-sources.jar /home/ritwik/rd/DrQA/data/corenlp/javax.json.jar /home/ritwik/rd/DrQA/data/corenlp/joda-time-2.9-sources.jar /home/ritwik/rd/DrQA/data/corenlp/joda-time.jar /home/ritwik/rd/DrQA/data/corenlp/jollyday-0.4.9-sources.jar /home/ritwik/rd/DrQA/data/corenlp/jollyday.jar /home/ritwik/rd/DrQA/data/corenlp/protobuf.jar /home/ritwik/rd/DrQA/data/corenlp/slf4j-api.jar /home/ritwik/rd/DrQA/data/corenlp/slf4j-simple.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-javadoc.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-models.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-sources.jar /home/ritwik/rd/DrQA/data/corenlp/xom-1.2.10-src.jar /home/ritwik/rd/DrQA/data/corenlp/xom.jar

ajfisch commented 7 years ago

Weird. What shell are you using? Bash? Try quoting the path: export CLASSPATH="/home/ritwik/rd/DrQA/data/corenlp/*"

RitwikGopi commented 7 years ago

I am using bash. The quoted version gives the same result.
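
One likely reason the printed value differs from what was exported: in `echo $CLASSPATH` without quotes, bash applies pathname expansion to the variable's value, so a literal `.../corenlp/*` is expanded into a space-separated list of files at print time (on ajfisch's machine that directory does not exist, so the `*` is printed unchanged). `echo "$CLASSPATH"` shows the raw value, as does reading it from Python, where no shell is involved. The sketch below (hypothetical helper `classpath_entries`) splits a classpath on `os.pathsep` and flags entries that look like mistakes; a space is only a heuristic warning, since real paths can contain spaces.

```python
import os


def classpath_entries(value):
    """Split a classpath on os.pathsep and flag entries that look like mistakes."""
    entries = value.split(os.pathsep)
    problems = []
    for entry in entries:
        if entry == "":
            # e.g. a leading ":" from appending to an initially empty CLASSPATH
            problems.append("empty entry")
        elif " " in entry:
            # Java does not split on spaces, so this may be several paths glued together
            problems.append("contains a space (wrong separator?): %r" % entry)
    return entries, problems


# No shell is involved here, so "*" is printed exactly as stored.
raw = os.environ.get("CLASSPATH", "")
print("CLASSPATH =", repr(raw))
entries, problems = classpath_entries(raw)
```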

RitwikGopi commented 7 years ago

I got the CLASSPATH corrected, but the error still occurs. Since those are JAR files, do I need to set up Java to run this?

ritwik@stagwiki:~/rd/DrQA$ for d in ~/rd/DrQA/data/corenlp/*;do export CLASSPATH=$CLASSPATH:$d;done
ritwik@stagwiki:~/rd/DrQA$ echo $CLASSPATH
:/home/ritwik/rd/DrQA/data/corenlp/ejml-0.23.jar:/home/ritwik/rd/DrQA/data/corenlp/javax.json-api-1.0-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/javax.json.jar:/home/ritwik/rd/DrQA/data/corenlp/joda-time-2.9-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/joda-time.jar:/home/ritwik/rd/DrQA/data/corenlp/jollyday-0.4.9-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/jollyday.jar:/home/ritwik/rd/DrQA/data/corenlp/protobuf.jar:/home/ritwik/rd/DrQA/data/corenlp/slf4j-api.jar:/home/ritwik/rd/DrQA/data/corenlp/slf4j-simple.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-javadoc.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-models.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/xom-1.2.10-src.jar:/home/ritwik/rd/DrQA/data/corenlp/xom.jar

ajfisch commented 7 years ago

Yes, you need Java 8.
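
A quick way to confirm from the same Python environment that runs DrQA that Java is installed, and which version it is. Two details worth knowing: `java -version` prints to stderr, and Java 8 reports itself as `1.8.x` while later releases use `11.x`, `17`, and so on. `parse_java_major` and `java_major_version` are hypothetical helpers, not part of DrQA.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major version from `java -version` output.

    Java 8 and earlier report "1.8.0_xxx" (major = 8); Java 9+ report "11.0.2" etc.
    Returns None if no quoted version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # "1.8" -> 8
    return first


def java_major_version():
    """Return the installed Java major version, or None if `java` is not on PATH."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    # The version banner goes to stderr; include stdout just in case.
    return parse_java_major(result.stderr + result.stdout)
```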

ajfisch commented 7 years ago

As a more direct test, see whether you can get java -cp "/home/ritwik/rd/DrQA/data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit to work.

RitwikGopi commented 7 years ago

I didn't have Java installed; installing it made it work in Python. The command above still threw an error, though:

ritwik@stagwiki:~/rd/DrQA$ java -cp /home/ritwik/rd/DrQA/data/corenlp/*
Error: Could not find or load main class .home.ritwik.rd.DrQA.data.corenlp.javax.json-api-1.0-sources.jar

Anyway, it is working now. I think it would be a good idea to add Java to the dependencies list.
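
The error above comes from the command as typed: the `*` is unquoted, so bash expands it into separate arguments before Java runs. The first jar (`ejml-0.23.jar`) becomes the `-cp` value and the second is taken as the main class name, hence `javax.json-api-1.0-sources.jar` in the message; the `edu.stanford.nlp.pipeline.StanfordCoreNLP` class and `-annotators` flag were also left off. Passing an argument list to `subprocess` avoids the shell entirely, so Java receives the literal `dir/*` and expands it itself. A sketch, with `corenlp_test_command` and `run_corenlp_once` as hypothetical helpers; it assumes CoreNLP's interactive shell behaves the same with piped input as in a terminal.

```python
import os
import shutil
import subprocess


def corenlp_test_command(corenlp_dir, annotators="tokenize,ssplit"):
    """Build the argv for the direct CoreNLP test suggested in this thread.

    No shell is involved, so the "*" reaches Java unexpanded and Java's own
    classpath wildcard picks up every .jar in corenlp_dir.
    """
    return [
        "java", "-cp", os.path.join(corenlp_dir, "*"),
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", annotators,
    ]


def run_corenlp_once(corenlp_dir, text="hello world", timeout=120):
    """Send one line to CoreNLP's interactive shell; EOF on stdin makes it quit."""
    if shutil.which("java") is None:
        raise RuntimeError("java not found on PATH; install a JDK/JRE first")
    result = subprocess.run(corenlp_test_command(corenlp_dir), input=text + "\n",
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True, timeout=timeout)
    return result.stdout
```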

netsafe commented 6 years ago

The classpath works in my case, and so does the Java test command, but the problem is:

09/17/2018 01:52:46 PM: [ Running on CPU only. ]
09/17/2018 01:52:46 PM: [ Initializing model... ]
09/17/2018 01:52:46 PM: [ Loading model /usr/work/DrQA/data/reader/single.mdl ]
09/17/2018 01:53:02 PM: [ Initializing tokenizer... ]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scripts/reader/interactive.py", line 53, in <module>
    normalize=not args.no_normalize)
  File "/usr/work/DrQA/drqa/reader/predictor.py", line 84, in __init__
    self.tokenizer = tokenizer_class(annotators=annotators)
  File "/usr/work/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
    self._launch()
  File "/usr/work/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
    self.corenlp.expect_exact('NLP>', searchwindowsize=100)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/spawnbase.py", line 390, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 107, in expect_loop
    return self.timeout(e)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 70, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0xb38a4850>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'sifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.2 sec].\r\n'
before (last 100 chars): b'sifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.2 sec].\r\n'
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 29943
child_fd: 5
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: "b'NLP>'"

(I am logged in via SSH to a Docker container.)

ajfisch commented 6 years ago

Oy. Which version of CoreNLP and/or pexpect are you using? The NER module in versions after 2017-06-09 loads several gazetteers, which breaks this implementation. If you can't get it to work, I'd recommend using the spaCy tokenizer. (This should be replaced with a more robust way of using CoreNLP efficiently.)

mazzzystar commented 5 years ago

@RitwikGopi I have the same issue as you, and I followed your steps up to running:

java -cp "/home/ritwik/rd/DrQA/data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit

I found that it runs fine from the command line:

(dl) testMacBook-Pro:DrQA ke$ java -cp "/Users/test/Documents/QA/DrQA/data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit

Entering interactive shell. Type q RETURN or EOF to quit.
NLP> hello world
Sentence #1 (2 tokens):
hello world
[Text=hello CharacterOffsetBegin=0 CharacterOffsetEnd=5]
[Text=world CharacterOffsetBegin=6 CharacterOffsetEnd=11]
NLP> how are you ?
Sentence #1 (4 tokens):
how are you ?
[Text=how CharacterOffsetBegin=0 CharacterOffsetEnd=3]
[Text=are CharacterOffsetBegin=4 CharacterOffsetEnd=7]
[Text=you CharacterOffsetBegin=8 CharacterOffsetEnd=11]
[Text=? CharacterOffsetBegin=12 CharacterOffsetEnd=13]
NLP> 

But when I run a Python script as @ajfisch suggested:

from drqa.tokenizers import CoreNLPTokenizer
tok = CoreNLPTokenizer()
tok.tokenize('hello world').words()  # Should complete immediately

Everything is still the same: after a long wait, it fails with:

Traceback (most recent call last):
  File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ke/Documents/QA/DrQA/test.py", line 2, in <module>
    tok = CoreNLPTokenizer()
  File "/Users/ke/Documents/QA/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
    self._launch()
  File "/Users/ke/Documents/QA/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
    self.corenlp.expect_exact('NLP>', searchwindowsize=100)
  File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
    return exp.expect_loop(timeout)
  File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
    return self.timeout(e)
  File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
    raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x10087dac8>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'\r\nCaused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP\r\nbash-3.2$ '
before (last 100 chars): b'\r\nCaused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP\r\nbash-3.2$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 6773
child_fd: 7
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
    0: "b'NLP>'"
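
In this traceback the buffer ends with `java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP`, so Java did start but could not see the CoreNLP jars. The working command line passed them with `-cp`, which applies only to that one command; the tokenizer instead relies on the classpath configured as discussed earlier in this thread (the `CLASSPATH` variable), and a variable exported in one terminal is not visible to a Python process started elsewhere (for example from an IDE). One possible workaround, sketched below as an assumption rather than DrQA's documented method, is to set `CLASSPATH` inside the Python process before the tokenizer is created, since child processes inherit `os.environ`. `ensure_corenlp_classpath` is a hypothetical helper.

```python
import glob
import os


def ensure_corenlp_classpath(corenlp_dir):
    """Point CLASSPATH at the CoreNLP jars for this process and its children.

    Call it before importing/creating the tokenizer: the Java process is
    launched (via a bash subprocess) when the tokenizer is constructed and
    inherits os.environ at that moment. Setting it before the import also
    covers the case (an assumption) that the default is read at import time.
    """
    if not glob.glob(os.path.join(corenlp_dir, "stanford-corenlp-*.jar")):
        raise FileNotFoundError("no stanford-corenlp-*.jar in %s" % corenlp_dir)
    os.environ["CLASSPATH"] = os.path.join(corenlp_dir, "*")
    return os.environ["CLASSPATH"]


# Usage (example path; adjust to your checkout):
# ensure_corenlp_classpath("/Users/ke/Documents/QA/DrQA/data/corenlp")
# from drqa.tokenizers import CoreNLPTokenizer
# tok = CoreNLPTokenizer()
```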

nikhileshp commented 4 years ago

If you want to install Java directly from a Jupyter notebook (for example on Google Colab or another cloud notebook):


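import os  # used to set the JAVA_HOME environment variable

# "!" runs a shell command from a Jupyter/Colab cell (IPython syntax, not plain Python)
!sudo apt-get update
def install_java():
  # Install a headless OpenJDK 8 quietly
  !sudo apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
  # Point JAVA_HOME at the default OpenJDK 8 location on Ubuntu/Debian (amd64)
  os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
  # Check that java runs and print its version
  !java -version
install_java()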

hinaayousaf commented 3 years ago

@ajfisch Can you please guide me on how to replace the CoreNLP tokenizer with the spaCy one in DrQA? I'm having a lot of trouble doing all this.