Wordseer / stanford-corenlp-python

Python wrapper for Stanford CoreNLP tools
GNU General Public License v2.0

How do you get named entities? #7

Open dgonzo opened 9 years ago

dgonzo commented 9 years ago

In the default.properties file I see that there are options for applying ner as one of the annotators. But when I try something like

annotators = tokenize, ssplit, pos, lemma, depparse, regexner

I don't get a named entity annotation. I also see that the ner models in corenlp can be selected, but they are commented out further down in the file:

annotators = tokenize, ssplit, pos, lemma, depparse

# specify Stanford Dependencies format for backwards compatibility
# (new default is Universal Dependencies in 3.5.2)
depparse.model = edu/stanford/nlp/models/parser/nndep/english_SD.gz

# A true-casing annotator is also available (see below)
#annotators = tokenize, ssplit, pos, lemma, truecase

...

#
# None of these paths are necessary anymore: we load all models from the JAR file
#

...

#ner.model.3class = /u/nlp/data/ner/goodClassifiers/all.3class.distsim.crf.ser.gz
#ner.model.7class = /u/nlp/data/ner/goodClassifiers/muc.distsim.crf.ser.gz
#ner.model.MISCclass = /u/nlp/data/ner/goodClassifiers/conll.distsim.crf.ser.gz

How do I return a named entity annotation?

How do I select from the ner models?

mmihaltz commented 9 years ago

Hi,

I'm just another user of stanford-corenlp-python, but perhaps I can answer your question.

To use Stanford NER in the CoreNLP pipeline, try adding it to the annotators:

annotators = tokenize, ssplit, pos, lemma, ner, depparse

To specify models, use the ner.model property: "NER model(s) in a comma separated list to use instead of the default models. By default, the models used will be the 3class, 7class, and MISCclass models, in that order." (source)
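A minimal sketch of both steps, in case it helps: writing a properties file that lists ner among the annotators, then pulling the tagged tokens out of a parse result. The helper names write_properties and named_entities are made up for illustration. The result layout (a "sentences" list whose "words" entries are [token, attributes] pairs carrying a "NamedEntityTag") is an assumption about what the wrapper returns, and the sample dict is hand-written, not real CoreNLP output.

```python
import os
import tempfile


def write_properties(path, annotators, ner_models=None):
    """Write a minimal CoreNLP properties file (hypothetical helper)."""
    lines = ["annotators = " + ", ".join(annotators)]
    if ner_models:
        # ner.model takes a comma-separated list of model paths.
        lines.append("ner.model = " + ",".join(ner_models))
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


def named_entities(parsed):
    """Collect (token, tag) pairs whose tag is not "O" (hypothetical helper).

    Assumes each sentence has a "words" list of [token, attributes] pairs
    and that the attributes include "NamedEntityTag".
    """
    found = []
    for sentence in parsed.get("sentences", []):
        for token, attrs in sentence.get("words", []):
            tag = attrs.get("NamedEntityTag", "O")
            if tag != "O":
                found.append((token, tag))
    return found


# Hand-written stand-in for a parse result, not real CoreNLP output.
sample = {"sentences": [{"words": [
    ["Stanford", {"NamedEntityTag": "ORGANIZATION"}],
    ["is", {"NamedEntityTag": "O"}],
    ["in", {"NamedEntityTag": "O"}],
    ["California", {"NamedEntityTag": "LOCATION"}],
]}]}

props_path = os.path.join(tempfile.mkdtemp(), "ner.properties")
props_lines = write_properties(
    props_path, ["tokenize", "ssplit", "pos", "lemma", "ner", "depparse"])
entities = named_entities(sample)
print(props_lines[0])
print(entities)
```

Passing ner_models is optional; when it is left out, no ner.model line is written and the default models quoted above would apply.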

dgonzo commented 9 years ago

@mmihaltz That's exactly what I tried, to no avail, including:

annotators = tokenize, ssplit, pos, lemma, depparse, regexner

and

annotators = tokenize, ssplit, pos, lemma, depparse, ner

farhan0581 commented 8 years ago

In my case, when I add ner to default.properties as:

annotators = tokenize, ssplit, pos, lemma, depparse , ner

it gives me this error:

Traceback (most recent call last):
  File "corenlp/corenlp.py", line 515, in <module>
    nlp = StanfordCoreNLP(options.corenlp, properties=options.properties, serving=True)
  File "corenlp/corenlp.py", line 347, in __init__
    self._spawn_corenlp()
  File "corenlp/corenlp.py", line 336, in _spawn_corenlp
    self.corenlp.expect("\nNLP> ")
  File "/usr/lib/python2.7/dist-packages/pexpect/__init__.py", line 1418, in expect
    timeout, searchwindowsize)
  File "/usr/lib/python2.7/dist-packages/pexpect/__init__.py", line 1433, in expect_list
    timeout, searchwindowsize)
  File "/usr/lib/python2.7/dist-packages/pexpect/__init__.py", line 1535, in expect_loop
    raise TIMEOUT(str(err) + '\n' + str(self))
pexpect.TIMEOUT: Timeout exceeded.
<pexpect.spawn object at 0x7f1aa637b350>
version: 3.1
command: /usr/bin/java
args: ['/usr/bin/java', '-Xmx3g', '-cp', '/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/stanford-corenlp-3.6.0-sources.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/stanford-corenlp-3.6.0-models.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/jollyday-0.4.7-sources.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/stanford-corenlp-3.6.0-javadoc.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/slf4j-api.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/javax.json.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/xom.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/jollyday.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/joda-time-2.9-sources.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/javax.json-api-1.0-sources.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/slf4j-simple.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/protobuf.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/ejml-0.23.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/xom-1.2.10-src.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/stanford-corenlp-3.6.0.jar:/home/farhan/Recommendation-system/stanford-corenlp-full-2015-12-09/joda-time.jar', 'edu.stanford.nlp.pipeline.StanfordCoreNLP', '-props', '/home/farhan/Recommendation-system/stanford-corenlp-python/corenlp/default.properties']
searcher: <pexpect.searcher_re object at 0x7f1aa637b3d0>
buffer (last 100 chars): 'tor depparse\r\nLoading depparse model file: edu/stanford/nlp/models/parser/nndep/english_SD.gz ... \r\n'
before (last 100 chars): 'tor depparse\r\nLoading depparse model file: edu/stanford/nlp/models/parser/nndep/english_SD.gz ... \r\n'
after: <class 'pexpect.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 4607
child_fd: 4
closed: False
timeout: 30
delimiter: <class 'pexpect.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 8192
ignorecase: False
searchwindowsize: 80
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1