Closed mpandeydev closed 5 years ago
from drqa.tokenizers import CoreNLPTokenizer
tok = CoreNLPTokenizer() # this line was causing the problem
tok.tokenize('hello world').words() # this one was never executed
To fix this, I had to debug the _launch method in corenlp_tokenizer.py using ipdb and print(self.corenlp.read()) to see that Java was not able to load StanfordCoreNLP properly:
buffer (last 100 chars): b'ad main class edu.stanford.nlp.pipeline.StanfordCoreNLP\r\n
...even though I was able to run it from my terminal using the command from print(' '.join(cmd)):
$ java -mx2g -cp ":data/corenlp/*:data/corenlp/*:data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -tokenize.options untokenizable=noneDelete,invertible=true -outputFormat json -prettyPrint false
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
Entering interactive shell. Type q RETURN or EOF to quit.
NLP> q
In the end, I fixed the issue in a custom (and ugly) way like this:
self.corenlp = pexpect.spawn('/bin/bash', maxread=100000, timeout=60)
self.corenlp.setecho(False)
# Set up the shell environment before launching the Java process
self.corenlp.sendline('source ~/.bashrc')
self.corenlp.sendline('cd $HOME/doc/DrQA')
self.corenlp.sendline('module load java/jdk/1.8.0_131')
# Disable canonical input mode so the long command line is not truncated by the tty
self.corenlp.sendline('stty -icanon')
self.corenlp.sendline(' '.join(cmd))
self.corenlp.delaybeforesend = 0
self.corenlp.delayafterread = 0
self.corenlp.expect_exact('NLP>', searchwindowsize=-1)
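A possibly cleaner variant of the same idea (a sketch, not tested against DrQA): pass an explicit environment to pexpect.spawn instead of sourcing ~/.bashrc inside the spawned shell, so the Java process inherits the right CLASSPATH directly. The spawn line is left commented out because it needs DrQA's cmd in scope.

```python
import os

# Build an explicit environment for the spawned shell instead of
# sourcing ~/.bashrc after the fact (assumes the CoreNLP jars live
# under data/corenlp/, as in the DrQA README).
env = dict(os.environ)
env['CLASSPATH'] = 'data/corenlp/*'

# Inside _launch, one could then spawn with that environment:
# self.corenlp = pexpect.spawn('/bin/bash', env=env, maxread=100000, timeout=60)
print(env['CLASSPATH'])
```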
Good luck to all!!!
I plan on moving to a more robust tokenizer soon. In the meantime, sorry for the struggle with pexpect.
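Until then, one way to sidestep the Java/pexpect machinery entirely is a pure-Python regex tokenizer in the spirit of DrQA's SimpleTokenizer. This is only a minimal sketch, not DrQA's actual class (which also tracks character offsets):

```python
import re

# Minimal regex tokenizer: alphanumeric runs become tokens, and each
# remaining non-space character becomes its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]", re.UNICODE)

def simple_tokenize(text):
    return TOKEN_RE.findall(text)

print(simple_tokenize('hello world'))  # -> ['hello', 'world']
```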
When I run python scripts/reader/interactive.py I get the following:
04/18/2019 04:00:43 PM: [ CUDA enabled (GPU -1) ]
04/18/2019 04:00:43 PM: [ Initializing model... ]
04/18/2019 04:00:43 PM: [ Loading model /home/mpandey/ServiceNet/drqa_dev/DrQA/data/reader/single.mdl ]
04/18/2019 04:00:43 PM: [ Initializing tokenizer... ]
Traceback (most recent call last):
  File "/home/mp/ServiceNet/mindstone_env/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/home/mp/ServiceNet/mindstone_env/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "scripts/reader/interactive.py", line 53, in <module>
    normalize=not args.no_normalize)
  File "/home/mp/ServiceNet/drqa_dev/DrQA/drqa/reader/predic
I followed the instructions as described in the README file. My echo $CLASSPATH gives:
/home/mp/ServiceNet/drqa_dev/DrQA/data/corenlp/::data/corenlp/
Can you suggest how to debug this issue?
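One thing worth checking in the output above: Java treats a bare directory on the classpath as a place to look for .class files only, so jars are picked up only with an explicit wildcard entry such as data/corenlp/*, and the CLASSPATH shown also contains a double colon (an empty entry). A small stdlib sketch (a hypothetical helper, not part of DrQA) that flags both problems:

```python
def check_classpath(classpath):
    """Flag common problems in a ':'-separated Java CLASSPATH string."""
    problems = []
    for entry in classpath.split(':'):
        if not entry:
            problems.append('empty entry (stray double colon)')
        elif entry.endswith('/'):
            # Java only loads .class files from a bare directory entry;
            # jars need an explicit wildcard such as 'data/corenlp/*'.
            problems.append(entry + ': missing /* -- jars will be ignored')
    return problems

for p in check_classpath('/home/mp/ServiceNet/drqa_dev/DrQA/data/corenlp/::data/corenlp/'):
    print(p)
```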