Lynten / stanford-corenlp

Python wrapper for Stanford CoreNLP.
MIT License

Problems with NER #16

Closed TwinkleChow closed 6 years ago

TwinkleChow commented 6 years ago

When I tried to run the demo code for NER:

print 'Named Entities:', nlp.ner(sentence)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 146, in ner
    r_dict = self._request('ner', sentence)
  File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 171, in _request
    r_dict = json.loads(r.text)
  File "/usr/lib/pypy/lib-python/2.7/json/__init__.py", line 347, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 363, in decode
    obj, end = self.raw_decode(s, idx=WHITESPACE.match(s, 0).end())
  File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 381, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Does anyone know how to solve this problem? I'm using Stanford CoreNLP version 3.7.0.

Lynten commented 6 years ago

When you encounter an error, add quiet=False and logging_level=logging.DEBUG so the wrapper prints the server output:

import logging
from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)
nlp.close()

Then check the log output to see what happened.

The-Gupta commented 6 years ago

@Lynten I have the same error as #17. I'm using CoreNLP 3.9.1; the error does not occur with CoreNLP 3.7.0. I'm adding quiet and logging_level:

nlp = StanfordCoreNLP(r'D:\SDS\1_MachineLearning\stanford-corenlp-full-2018-02-27\\', 
                      quiet=False, logging_level=logging.DEBUG)

ERROR:

DEBUG:urllib3.connectionpool:http://localhost:9000 "POST /?properties=%7B%27annotators%27%3A+%27coref%27%2C+%27pinelineLanguage%27%3A+%27en%27%7D HTTP/1.1" 500 57
Traceback (most recent call last):

  File "<ipython-input-9-1877aa015047>", line 104, in <module>
    keywords = extract_phrases(df.iloc[:,1].tolist())

  File "<ipython-input-9-1877aa015047>", line 51, in extract_phrases
    result = json.loads(nlp.annotate(doc[i].replace('\n', ' ').replace('\r', ' '), properties= {'annotators': 'coref', 'pinelineLanguage': 'en'}))

  File "D:\Anaconda3\envs\traceability\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)

  File "D:\Anaconda3\envs\traceability\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())

  File "D:\Anaconda3\envs\traceability\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting value

Shall I post it on Stanford CoreNLP's GitHub issues as well?

The-Gupta commented 6 years ago

@DANNALI35 Adding 'timeout': '500000' to the properties worked for me. The nlp.annotate method takes longer to execute with CoreNLP 3.9.0 and 3.9.1 than with 3.7.0 (at least for the coref and depparse annotators I'm using), so for larger strings the server timed out and returned nothing, which caused the JSON error.

json.loads(nlp.annotate(doc[i].replace('\n', ' ').replace('\r', ' '),
                        properties={'timeout': '500000', 'annotators': 'coref', 'pipelineLanguage': 'en'}))
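To make this failure mode visible instead of crashing inside json.loads: when the server times out it returns a plain-text error body, which json.loads then rejects with "Expecting value". A small wrapper (hypothetical helper names build_props and safe_annotate, not part of this package) can raise an error that includes the raw server reply:

```python
import json

def build_props(annotators, timeout_ms=500000):
    # CoreNLP expects the timeout value as a string, in milliseconds.
    return {'timeout': str(timeout_ms),
            'annotators': annotators,
            'pipelineLanguage': 'en'}

def safe_annotate(nlp, text, annotators='coref'):
    """Annotate, but fail with the raw server reply instead of a bare JSON error."""
    raw = nlp.annotate(text.replace('\n', ' ').replace('\r', ' '),
                       properties=build_props(annotators))
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        raise RuntimeError('Server did not return JSON: %r' % raw[:200])
```

With this, a timed-out request surfaces the server's actual message (e.g. "CoreNLP request timed out") rather than an opaque JSONDecodeError.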

SophiaChen23 commented 6 years ago

@The-Gupta I am new to NLP and still wonder where I should add 'timeout': '500000'. Where are these properties? In /dist-packages/stanfordcorenlp/corenlp.py? Thanks

The-Gupta commented 6 years ago

@SophiaChen23, nlp.annotate() has a properties parameter, where you need to specify 'timeout': '500000' as shown in my previous comment.

SophiaChen23 commented 6 years ago

@The-Gupta Sorry, my error message is a little different from yours, and I can't find a solution to this problem online. Can you help me figure it out?

  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/stanfordcorenlp/corenlp.py", line 195, in ner
    r_dict = self._request('ner', sentence)
  File "/usr/local/lib/python2.7/dist-packages/stanfordcorenlp/corenlp.py", line 239, in _request
    r_dict = json.loads(r.text)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Thank you so much

The-Gupta commented 6 years ago

@SophiaChen23, I'm not sure what the problem is. It's hard to reproduce the same error without seeing your code.

XUEZIJIAN commented 6 years ago

@Lynten I have the same problem as #32, but I can start the server with 'java --add-modules java.se.ee -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000'. I can open the web interface, but it is very slow and there is no result after I type the text.

java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

Here's the ERROR:

pydev debugger: starting

pydev debugger: New process is launching (breakpoints won't work in the new process). pydev debugger: To debug that process please enable 'Attach to subprocess automatically while debugging?' option in the debugger settings.

INFO:root:Initializing native server...
INFO:root:java -Xmx4g -cp "D:\Stanford CoreNLP\stanford-corenlp-full-2018-02-27*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
INFO:root:Server shell PID: 17432
[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - setting default constituency parser
[main] INFO CoreNLP - warning: cannot find edu/stanford/nlp/models/srparser/englishSR.ser.gz
[main] INFO CoreNLP - using: edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz instead
[main] INFO CoreNLP - to use shift reduce parser download English models jar from:
[main] INFO CoreNLP - http://stanfordnlp.github.io/CoreNLP/download.html
[main] INFO CoreNLP - Threads: 8
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
INFO:root:The server is available.
INFO:root:{'properties': "{'annotators': 'ssplit,tokenize', 'outputFormat': 'json'}", 'pipelineLanguage': 'en'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
[pool-1-thread-2] INFO CoreNLP - [/0:0:0:0:0:0:0:1:59428] API call w/annotators tokenize,ssplit
Guangdong University of Foreign Studies is located in Guangzhou.
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-2] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
DEBUG:urllib3.connectionpool:http://localhost:9000 "POST /?properties=%7B%27annotators%27%3A+%27ssplit%2Ctokenize%27%2C+%27outputFormat%27%3A+%27json%27%7D&pipelineLanguage=en HTTP/1.1" 200 1295
Tokenize: ['Guangdong', 'University', 'of', 'Foreign', 'Studies', 'is', 'located', 'in', 'Guangzhou', '.']
INFO:root:{'properties': "{'annotators': 'pos', 'outputFormat': 'json'}", 'pipelineLanguage': 'en'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
[pool-1-thread-3] INFO CoreNLP - [/0:0:0:0:0:0:0:1:59429] API call w/annotators tokenize,ssplit,pos
Guangdong University of Foreign Studies is located in Guangzhou.
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-3] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-3] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.1 sec].
DEBUG:urllib3.connectionpool:http://localhost:9000 "POST /?properties=%7B%27annotators%27%3A+%27pos%27%2C+%27outputFormat%27%3A+%27json%27%7D&pipelineLanguage=en HTTP/1.1" 200 1411
Part of Speech: [('Guangdong', 'NNP'), ('University', 'NNP'), ('of', 'IN'), ('Foreign', 'NNP'), ('Studies', 'NNPS'), ('is', 'VBZ'), ('located', 'JJ'), ('in', 'IN'), ('Guangzhou', 'NNP'), ('.', '.')]
INFO:root:{'properties': "{'annotators': 'ner', 'outputFormat': 'json'}", 'pipelineLanguage': 'en'}
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
[pool-1-thread-4] INFO CoreNLP - [/0:0:0:0:0:0:0:1:59431] API call w/annotators tokenize,ssplit,pos,lemma,ner
Guangdong University of Foreign Studies is located in Guangzhou.
[pool-1-thread-4] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-4] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-4] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-4] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-4] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-4] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.5 sec].
[pool-1-thread-4] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.9 sec].
[pool-1-thread-4] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.8 sec].
[pool-1-thread-4] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
edu.stanford.nlp.util.ReflectionLoading$ReflectionLoadingException: Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl
  at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:38)
  at edu.stanford.nlp.time.TimeExpressionExtractorFactory.create(TimeExpressionExtractorFactory.java:60)
  at edu.stanford.nlp.time.TimeExpressionExtractorFactory.createExtractor(TimeExpressionExtractorFactory.java:43)
  at edu.stanford.nlp.ie.regexp.NumberSequenceClassifier.<init>(NumberSequenceClassifier.java:86)
  at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:135)
  at edu.stanford.nlp.pipeline.NERCombinerAnnotator.<init>(NERCombinerAnnotator.java:131)
  at edu.stanford.nlp.pipeline.AnnotatorImplementations.ner(AnnotatorImplementations.java:68)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$getNamedAnnotators$44(StanfordCoreNLP.java:546)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$69(StanfordCoreNLP.java:625)
  at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
  at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
  at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181)
  at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.mkStanfordCoreNLP(StanfordCoreNLPServer.java:366)
  at edu.stanford.nlp.pipeline.StanfordCoreNLPServer.access$800(StanfordCoreNLPServer.java:50)
  at edu.stanford.nlp.pipeline.StanfordCoreNLPServer$CoreNLPHandler.handle(StanfordCoreNLPServer.java:851)
  at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
  at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(Unknown Source)
  at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
  at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(Unknown Source)
  at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Unknown Source)
  at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(Unknown Source)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.base/java.lang.Thread.run(Unknown Source)
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public edu.stanford.nlp.time.TimeExpressionExtractorImpl(java.lang.String,java.util.Properties) with args [sutime, {}]
  at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
  at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:382)
  at edu.stanford.nlp.util.ReflectionLoading.loadByReflection(ReflectionLoading.java:36)
  ... 27 more
Caused by: java.lang.reflect.InvocationTargetException
  at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
  at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
  at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
  at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
  ... 29 more
Caused by: java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException
  at de.jollyday.util.CalendarUtil.<init>(CalendarUtil.java:42)
  at de.jollyday.HolidayManager.<init>(HolidayManager.java:66)
  at de.jollyday.impl.DefaultHolidayManager.<init>(DefaultHolidayManager.java:46)
  at edu.stanford.nlp.time.JollyDayHolidays$MyXMLManager.<init>(JollyDayHolidays.java:148)
  at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
  at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
  at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
  at java.base/java.lang.Class.newInstance(Unknown Source)
  at de.jollyday.caching.HolidayManagerValueHandler.instantiateManagerImpl(HolidayManagerValueHandler.java:60)
  at de.jollyday.caching.HolidayManagerValueHandler.createValue(HolidayManagerValueHandler.java:41)
  at de.jollyday.caching.HolidayManagerValueHandler.createValue(HolidayManagerValueHandler.java:13)
  at de.jollyday.util.Cache.get(Cache.java:51)
  at de.jollyday.HolidayManager.createManager(HolidayManager.java:168)
  at de.jollyday.HolidayManager.getInstance(HolidayManager.java:148)
  at edu.stanford.nlp.time.JollyDayHolidays.init(JollyDayHolidays.java:57)
  at edu.stanford.nlp.time.Options.<init>(Options.java:119)
  at edu.stanford.nlp.time.TimeExpressionExtractorImpl.init(TimeExpressionExtractorImpl.java:44)
  at edu.stanford.nlp.time.TimeExpressionExtractorImpl.<init>(TimeExpressionExtractorImpl.java:39)
  ... 34 more
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.JAXBException
  at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
  at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
  at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
  ... 53 more
DEBUG:urllib3.connectionpool:http://localhost:9000 "POST /?properties=%7B%27annotators%27%3A+%27ner%27%2C+%27outputFormat%27%3A+%27json%27%7D&pipelineLanguage=en HTTP/1.1" 500 132
Traceback (most recent call last):
  File "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\ptvsd_launcher.py", line 111, in <module>
    vspd.debug(filename, port_num, debug_id, debug_options, run_as)
  File "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\Packages\ptvsd\debugger.py", line 36, in debug
    run(address, filename, *args, **kwargs)
  File "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\Packages\ptvsd\__main__.py", line 47, in run_file
    run(argv, addr, **kwargs)
  File "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\Packages\ptvsd\__main__.py", line 98, in _run
    _pydevd.main()
  File "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\Packages\ptvsd\pydevd\pydevd.py", line 1628, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\Packages\ptvsd\pydevd\pydevd.py", line 1035, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\Extensions\Microsoft\Python\Core\Packages\ptvsd\pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:\Project\AI2NLP\AI2NLP\AI2NLP.py", line 9, in <module>
    print ('Named Entities:', nlp.ner(sentence))
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\stanfordcorenlp\corenlp.py", line 195, in ner
    r_dict = self._request('ner', sentence)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\stanfordcorenlp\corenlp.py", line 239, in _request
    r_dict = json.loads(r.text)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

zhuzixiao commented 6 years ago

This is an issue related to the Java version. The root cause is java.lang.ClassNotFoundException: javax.xml.bind.JAXBException: JAXB is no longer included in the Java SE JDK since Java 9. For more information, please look up this JAXB exception.

So we have to add the --add-modules java.se.ee argument when starting the NLP server. If you use this wrapper, I suggest adding the following code to corenlp.py, right after the line java_args = "-Xmx{}".format(self.memory). It is compatible with all Java versions.

# Work around the Java version problem: JAXB is no longer included in Java SE
# since Java 9 (java.lang.ClassNotFoundException: javax.xml.bind.JAXBException),
# so pass --add-modules java.se.ee for Java versions newer than 1.8.
java_version = subprocess.check_output(
    ['java', '-version'], stderr=subprocess.STDOUT).decode('ascii')
# Example output:
# java version "x.x.x"
# Java(TM) SE Runtime Environment (build x.x.x_xxx-xx)
# Java HotSpot(TM) 64-Bit Server VM (build xxx, mixed mode)
start = java_version.find('"') + 1
end = java_version.find('"', start)
java_version = java_version[start:end]
add_modules = True
if java_version.startswith('1.'):
    # Old versioning scheme (1.6, 1.7, 1.8): 1.8 and below need no modules flag.
    if int(java_version[2:3]) <= 8:
        add_modules = False
if add_modules:
    java_args += ' --add-modules java.se.ee'

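The version check above can also be pulled out into a standalone, testable helper. Below is a sketch with a hypothetical function name (needs_add_modules, not part of the wrapper); one caveat worth knowing is that the java.se.ee module was itself removed in JDK 11, so on very recent JDKs the JAXB jars must be added to the classpath instead of using this flag.

```python
import re

def needs_add_modules(java_version_output):
    """Return True when `java -version` output indicates Java 9 or newer,
    i.e. when --add-modules java.se.ee is needed to make JAXB available."""
    match = re.search(r'"([^"]+)"', java_version_output)
    if match is None:
        return True  # unknown format; assume a modern JVM to be safe
    version = match.group(1)
    if version.startswith('1.'):
        # Old versioning scheme: "1.6.0", "1.7.0_80", "1.8.0_181", ...
        return int(version.split('.')[1]) > 8
    # New scheme since Java 9: "9", "10.0.2", "11.0.2", ...
    return True
```

Usage: feed it the decoded output of subprocess.check_output(['java', '-version'], stderr=subprocess.STDOUT) and append the flag to java_args only when it returns True.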
shankha117 commented 3 years ago

@SophiaChen23 did you find a solution? I am facing the same issue.