Closed RitwikGopi closed 7 years ago
Hi,
This is not how you specify the CLASSPATH in Java. You need : as the separator between jars. Instead of adding each jar explicitly, just do export CLASSPATH=/home/ritwik/rd/DrQA/data/corenlp/*.
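If the shell keeps expanding the glob on you, one workaround is to set the variable from Python before importing the tokenizer. This is a sketch: the path is an example, and it assumes your copy of DrQA reads CLASSPATH from the environment (upstream drqa/tokenizers reads it at import time, so set it first).

```python
import os

# Example path: substitute your own corenlp directory.
# Java expands the '*' inside a classpath entry itself, so the
# literal glob is exactly what we want to store here.
os.environ['CLASSPATH'] = '/home/ritwik/rd/DrQA/data/corenlp/*'
```

Do this before `from drqa.tokenizers import CoreNLPTokenizer`, since the default classpath is picked up when the module is imported.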
That is exactly what I did.
Hm. Are you sure?
[afisch/~]$ export CLASSPATH=/home/ritwik/rd/DrQA/data/corenlp/*
[afisch/~]$ echo $CLASSPATH
/home/ritwik/rd/DrQA/data/corenlp/*
ritwik@stagwiki:~$ export CLASSPATH=/home/ritwik/rd/DrQA/data/corenlp/*
ritwik@stagwiki:~$ echo $CLASSPATH
/home/ritwik/rd/DrQA/data/corenlp/ejml-0.23.jar /home/ritwik/rd/DrQA/data/corenlp/javax.json-api-1.0-sources.jar /home/ritwik/rd/DrQA/data/corenlp/javax.json.jar /home/ritwik/rd/DrQA/data/corenlp/joda-time-2.9-sources.jar /home/ritwik/rd/DrQA/data/corenlp/joda-time.jar /home/ritwik/rd/DrQA/data/corenlp/jollyday-0.4.9-sources.jar /home/ritwik/rd/DrQA/data/corenlp/jollyday.jar /home/ritwik/rd/DrQA/data/corenlp/protobuf.jar /home/ritwik/rd/DrQA/data/corenlp/slf4j-api.jar /home/ritwik/rd/DrQA/data/corenlp/slf4j-simple.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-javadoc.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-models.jar /home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-sources.jar /home/ritwik/rd/DrQA/data/corenlp/xom-1.2.10-src.jar /home/ritwik/rd/DrQA/data/corenlp/xom.jar
Weird. What shell are you using? bash? Try quoting it so the shell doesn't expand the glob: export CLASSPATH="/home/ritwik/rd/DrQA/data/corenlp/*"
I am using bash. The above gives the same result.
I got the CLASSPATH corrected, but the error still occurs. Since those are jar files, do I need to set up Java to run this?
ritwik@stagwiki:~/rd/DrQA$ for d in ~/rd/DrQA/data/corenlp/*;do export CLASSPATH=$CLASSPATH:$d;done
ritwik@stagwiki:~/rd/DrQA$ echo $CLASSPATH
:/home/ritwik/rd/DrQA/data/corenlp/ejml-0.23.jar:/home/ritwik/rd/DrQA/data/corenlp/javax.json-api-1.0-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/javax.json.jar:/home/ritwik/rd/DrQA/data/corenlp/joda-time-2.9-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/joda-time.jar:/home/ritwik/rd/DrQA/data/corenlp/jollyday-0.4.9-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/jollyday.jar:/home/ritwik/rd/DrQA/data/corenlp/protobuf.jar:/home/ritwik/rd/DrQA/data/corenlp/slf4j-api.jar:/home/ritwik/rd/DrQA/data/corenlp/slf4j-simple.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-javadoc.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-models.jar:/home/ritwik/rd/DrQA/data/corenlp/stanford-corenlp-3.8.0-sources.jar:/home/ritwik/rd/DrQA/data/corenlp/xom-1.2.10-src.jar:/home/ritwik/rd/DrQA/data/corenlp/xom.jar
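The loop above works, but its first iteration leaves a leading : (an empty classpath entry, which Java treats as the current directory). For comparison, here is a sketch of the same join done in Python with explicit jar paths and no empty entry; the directory is an example.

```python
import glob
import os

# Example directory: substitute your own.
jars = sorted(glob.glob('/home/ritwik/rd/DrQA/data/corenlp/*.jar'))

# os.pathsep is ':' on Linux/macOS and ';' on Windows, matching
# what java expects as the classpath separator on each platform.
classpath = os.pathsep.join(jars)
```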
Yes, you need Java 8
As a more direct test, see if you can get
java -cp "/home/ritwik/rd/DrQA/data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit
to work.
I didn't have Java installed; installing it made it work in Python. However, the command above threw an error:
ritwik@stagwiki:~/rd/DrQA$ java -cp /home/ritwik/rd/DrQA/data/corenlp/*
Error: Could not find or load main class .home.ritwik.rd.DrQA.data.corenlp.javax.json-api-1.0-sources.jar
Anyway, it is working now. I think it would be a good idea to add Java to the dependencies list.
The classpath is working in my case, java test string too - but the problem is:
09/17/2018 01:52:46 PM: [ Running on CPU only. ]
09/17/2018 01:52:46 PM: [ Initializing model... ]
09/17/2018 01:52:46 PM: [ Loading model /usr/work/DrQA/data/reader/single.mdl ]
09/17/2018 01:53:02 PM: [ Initializing tokenizer... ]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/usr/local/lib/python3.5/dist-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
    raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "scripts/reader/interactive.py", line 53, in
I am logged in through SSH into a docker container.
Oy. Which version of CoreNLP and/or pexpect are you using? The NER module of the versions past 2017-06-09 loads several gazetteers, which breaks this implementation. If you can't get it to work, I'd recommend using the spaCy tokenizer. (This should be replaced with a more robust way of using CoreNLP efficiently.)
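For anyone who wants to try that suggestion: a minimal sketch, assuming the upstream repo's SpacyTokenizer (drqa/tokenizers/spacy_tokenizer.py) and an installed spaCy English model. The import is guarded so the function degrades to None when either is missing.

```python
def spacy_words(text):
    """Tokenize with DrQA's SpacyTokenizer if importable; else return None."""
    try:
        # Fails cleanly when DrQA (or spaCy underneath it) is not installed.
        from drqa.tokenizers import SpacyTokenizer
    except ImportError:
        return None
    # Same Tokens API as CoreNLPTokenizer, but no Java subprocess to hang.
    return SpacyTokenizer().tokenize(text).words()

words = spacy_words('hello world')
```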
@RitwikGopi I have the same issue as you, and I followed your steps until running
java -cp "/home/ritwik/rd/DrQA/data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit
I find I can run it on the command line:
(dl) testMacBook-Pro:DrQA ke$ java -cp "/Users/test/Documents/QA/DrQA/data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
Entering interactive shell. Type q RETURN or EOF to quit.
NLP> hello world
Sentence #1 (2 tokens):
hello world
[Text=hello CharacterOffsetBegin=0 CharacterOffsetEnd=5]
[Text=world CharacterOffsetBegin=6 CharacterOffsetEnd=11]
NLP> how are you ?
Sentence #1 (4 tokens):
how are you ?
[Text=how CharacterOffsetBegin=0 CharacterOffsetEnd=3]
[Text=are CharacterOffsetBegin=4 CharacterOffsetEnd=7]
[Text=you CharacterOffsetBegin=8 CharacterOffsetEnd=11]
[Text=? CharacterOffsetBegin=12 CharacterOffsetEnd=13]
NLP>
But when I run a Python script as @ajfisch mentioned:
from drqa.tokenizers import CoreNLPTokenizer
tok = CoreNLPTokenizer()
tok.tokenize('hello world').words() # Should complete immediately
Everything is still the same: after a long, long while, it fails with:
Traceback (most recent call last):
File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/ke/Documents/QA/DrQA/test.py", line 2, in <module>
tok = CoreNLPTokenizer()
File "/Users/ke/Documents/QA/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
self._launch()
File "/Users/ke/Documents/QA/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
self.corenlp.expect_exact('NLP>', searchwindowsize=100)
File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
return exp.expect_loop(timeout)
File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/Users/ke/miniconda3/envs/dl/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x10087dac8>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'\r\nCaused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP\r\nbash-3.2$ '
before (last 100 chars): b'\r\nCaused by: java.lang.ClassNotFoundException: edu.stanford.nlp.pipeline.StanfordCoreNLP\r\nbash-3.2$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 6773
child_fd: 7
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
0: "b'NLP>'"
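The buffer line in that dump shows the actual failure: the bash child that pexpect spawned hit java.lang.ClassNotFoundException for StanfordCoreNLP, meaning the jars were not on the classpath of that child process (a CLASSPATH set in one interactive shell or conda env is not automatically visible there). One way around it is to hand the classpath to the tokenizer directly. The sketch below assumes the upstream CoreNLPTokenizer accepts a classpath keyword, which the repo's corenlp_tokenizer.py does at the time of writing; verify against your copy.

```python
def corenlp_args(jar_dir):
    """Keyword arguments for CoreNLPTokenizer(...) carrying an explicit
    classpath, so nothing depends on CLASSPATH being visible in the
    environment of the bash child that pexpect spawns.

    Usage (path is an example):
        tok = CoreNLPTokenizer(**corenlp_args('/Users/ke/Documents/QA/DrQA/data/corenlp'))
    """
    # java expands the '*' in a -cp entry itself; no shell is needed.
    return {'classpath': jar_dir.rstrip('/') + '/*'}
```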
If you want to install Java directly from your Jupyter notebook (e.g. if you are using Google Colab or a cloud notebook):
import os  # used to set the JAVA_HOME environment variable

!sudo apt-get update

def install_java():
    !sudo apt-get install -y openjdk-8-jdk-headless -qq > /dev/null  # install OpenJDK 8
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"    # point JAVA_HOME at it
    !java -version  # check the Java version

install_java()
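Outside a notebook, the same dependency check can be scripted. A small sketch that probes for a working java binary on PATH:

```python
import shutil
import subprocess

def java_available():
    """Return True if a `java` binary is on PATH and runs successfully."""
    if shutil.which('java') is None:
        return False
    # `java -version` prints to stderr and exits 0 when Java works.
    return subprocess.run(['java', '-version'],
                          capture_output=True).returncode == 0
```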
@ajfisch Can you please guide me on how to replace the CoreNLP tokenizer with spaCy in DrQA? I'm having a lot of trouble with all of this.
When I try
CLASSPATH is set properly