fhamborg / Giveme5W1H

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
Apache License 2.0

Connection aborted #35

Closed. cruperman closed this issue 5 years ago

cruperman commented 5 years ago

Describe the bug: The CoreNLP server is killed when I try to parse an article.

[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-1] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [2.0 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-1] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/srparser/englishSR.ser.gz ... done [10.2 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [3.7 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [48.4 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator depparse
[pool-1-thread-1] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model file: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...
[pool-1-thread-1] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 99996, Elapsed Time: 12.837 (s)
[pool-1-thread-1] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [25.6 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator mention
Killed

The script then returns this error:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

To Reproduce: I ran this code: https://github.com/fhamborg/Giveme5W1H/blob/master/Giveme5W1H/examples/extracting/parse_from_newsplease.py

The CoreNLP server runs in a Docker Ubuntu 18.04 image. The example page at http://localhost:9000 works, but when I try to use it as a library, it breaks.
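
Roughly, the failing call path looks like this (a minimal sketch modeled on the usage shown in the project README; the text and date below are placeholders, and the default CoreNLP server address localhost:9000 is assumed):

from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor

# MasterExtractor is assumed to talk to the CoreNLP server at localhost:9000 by default.
extractor = MasterExtractor()

# Placeholder text and publish date; the real run fed in articles via news-please.
doc = Document.from_text('Some article text.', '2019-01-01')

# The ConnectionError above is raised during this call, while the CoreNLP
# server process itself dies with "Killed".
doc = extractor.parse(doc)

print(doc.get_top_answer('who').get_parts_as_text())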

Versions (please complete the following information):

fhamborg commented 5 years ago

How did you install CoreNLP, which version is it, and how did you start it?

cruperman commented 5 years ago

My Dockerfile:
FROM ubuntu:18.04

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update
RUN apt-get install -y --no-install-recommends apt-utils
RUN apt-get install -y python3.7 python3.7-dev python3-pip
RUN apt-get install -y openjdk-8-jdk openjdk-8-jre wget unzip

RUN python3.7 -m pip install pip
RUN python3.7 -m pip install giveme5w1h

RUN giveme5w1h-corenlp install
ENTRYPOINT [ "giveme5w1h-corenlp" ]

I installed it with the command giveme5w1h-corenlp install. As I understand it, this installs the version specified in your file. I run the server with the command giveme5w1h-corenlp.

I also tried starting the CoreNLP server manually from the Giveme5W-runtime-resources directory with --timeout 150000:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 150000 -props edu/stanford/nlp/coref/properties/neural-english.properties

This didn't help.
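
To narrow this down, a small request can be sent directly to the CoreNLP server over its standard HTTP API (a diagnostic sketch, independent of Giveme5W1H); if the server already dies on this, the problem is on the server/container side rather than in the library:

import json
import requests

# Request a lightweight annotation directly from the CoreNLP server.
props = {"annotators": "tokenize,ssplit,pos", "outputFormat": "json"}
resp = requests.post(
    "http://localhost:9000",
    params={"properties": json.dumps(props)},
    data="The quick brown fox jumps over the lazy dog.".encode("utf-8"),
    timeout=60,
)
resp.raise_for_status()
print(list(resp.json().keys()))  # expect something like ['sentences']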

fhamborg commented 5 years ago

The more I look into this, the less it looks like a Giveme5W1H-related issue; I rather suspect it is caused by the Docker image or the resources of the machine. I doubt that the CoreNLP server is killed just by sending it a text for analysis. So I suggest that you open an issue with the CoreNLP project.
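
A quick way to check the resource hypothesis (a diagnostic sketch, not specific to Giveme5W1H): a bare "Killed" line is what the Linux OOM killer typically leaves behind, and the SR parser plus the neural dependency parser need several gigabytes of heap (the manual start above already uses -mx4g), so it is worth checking how much RAM is actually available where the server runs:

import os

# Total and currently available physical RAM as reported by the kernel.
# Note: this reflects the host/VM RAM, not a Docker cgroup memory limit.
page = os.sysconf("SC_PAGE_SIZE")
total_gib = page * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3
avail_gib = page * os.sysconf("SC_AVPHYS_PAGES") / 1024 ** 3
print(f"total RAM: {total_gib:.1f} GiB, available: {avail_gib:.1f} GiB")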