Closed mpandeydev closed 5 years ago
from drqa.tokenizers import CoreNLPTokenizer
tok = CoreNLPTokenizer() # this line was causing the problem
tok.tokenize('hello world').words() # this one was never executed
To fix this, I had to debug the _launch method in corenlp_tokenizer.py using ipdb and print(self.corenlp.read()) to see that Java was not able to load StanfordCoreNLP properly:
buffer (last 100 chars): b'ad main class edu.stanford.nlp.pipeline.StanfordCoreNLP\r\n
...even though I was able to run it from my terminal using the command from print(' '.join(cmd)):
$ java -mx2g -cp ":data/corenlp/*:data/corenlp/*:data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -tokenize.options untokenizable=noneDelete,invertible=true -outputFormat json -prettyPrint false
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
Entering interactive shell. Type q RETURN or EOF to quit.
NLP> q
In the end, I fixed the issue in a custom (and ugly) way like this:
self.corenlp = pexpect.spawn('/bin/bash', maxread=100000, timeout=60)
self.corenlp.setecho(False)
# Set up the shell environment before launching the Java process
self.corenlp.sendline('source ~/.bashrc')
self.corenlp.sendline('cd $HOME/doc/DrQA')
self.corenlp.sendline('module load java/jdk/1.8.0_131')
# Disable canonical input mode so the long command line is not truncated by the tty
self.corenlp.sendline('stty -icanon')
self.corenlp.sendline(' '.join(cmd))
self.corenlp.delaybeforesend = 0
self.corenlp.delayafterread = 0
self.corenlp.expect_exact('NLP>', searchwindowsize=-1)
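A possibly cleaner variant of the same idea (a sketch, not tested against DrQA): pass an explicit environment to pexpect.spawn instead of sourcing ~/.bashrc inside the spawned shell, so the Java process inherits the right CLASSPATH directly. The spawn line is left commented out because it needs DrQA's cmd in scope.

```python
import os

# Build an explicit environment for the spawned shell instead of
# sourcing ~/.bashrc after the fact (assumes the CoreNLP jars live
# under data/corenlp/, as in the DrQA README).
env = dict(os.environ)
env['CLASSPATH'] = 'data/corenlp/*'

# Inside _launch, one could then spawn with that environment:
# self.corenlp = pexpect.spawn('/bin/bash', env=env, maxread=100000, timeout=60)
print(env['CLASSPATH'])
```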
Good luck to all!!!
I plan on moving to a more robust tokenizer soon. In the meantime, sorry for the struggle with pexpect.
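Until then, one way to sidestep the Java/pexpect machinery entirely is a pure-Python regex tokenizer in the spirit of DrQA's SimpleTokenizer. This is only a minimal sketch, not DrQA's actual class (which also tracks character offsets):

```python
import re

# Minimal regex tokenizer: alphanumeric runs become tokens, and each
# remaining non-space character becomes its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)

def simple_tokenize(text):
    return TOKEN_RE.findall(text)

print(simple_tokenize('hello world'))  # -> ['hello', 'world']
```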
When I run python scripts/reader/interactive.py I get the following:
04/18/2019 04:00:43 PM: [ CUDA enabled (GPU -1) ]
04/18/2019 04:00:43 PM: [ Initializing model... ]
04/18/2019 04:00:43 PM: [ Loading model /home/mpandey/ServiceNet/drqa_dev/DrQA/data/reader/single.mdl ]
04/18/2019 04:00:43 PM: [ Initializing tokenizer... ]
Traceback (most recent call last):
  File "/home/mp/ServiceNet/mindstone_env/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/home/mp/ServiceNet/mindstone_env/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "scripts/reader/interactive.py", line 53, in <module>
    normalize=not args.no_normalize)
  File "/home/mp/ServiceNet/drqa_dev/DrQA/drqa/reader/predic
I followed the instructions as described in the README file. My echo $CLASSPATH gives:
/home/mp/ServiceNet/drqa_dev/DrQA/data/corenlp/::data/corenlp/
Can you suggest how to debug this issue?
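One thing worth checking in the output above: Java treats a bare directory on the classpath as a place to look for .class files only, so jars are picked up only with an explicit wildcard entry such as data/corenlp/*, and the CLASSPATH shown also contains a double colon (an empty entry). A small stdlib sketch (a hypothetical helper, not part of DrQA) that flags both problems:

```python
def check_classpath(classpath):
    """Flag common problems in a ':'-separated Java CLASSPATH string."""
    problems = []
    for entry in classpath.split(':'):
        if not entry:
            problems.append('empty entry (stray double colon)')
        elif entry.endswith('/'):
            # Java only loads .class files from a bare directory entry;
            # jars need an explicit wildcard such as 'data/corenlp/*'.
            problems.append(entry + ': missing /* -- jars will be ignored')
    return problems

for p in check_classpath('/home/mp/ServiceNet/drqa_dev/DrQA/data/corenlp/::data/corenlp/'):
    print(p)
```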