eXist-db / exist-stanford-nlp

XQuery wrapper around the Stanford CoreNLP pipeline
GNU Lesser General Public License v2.1

Many annotators invoked when only 2 were specified #3

Closed: joewiz closed this issue 4 years ago

joewiz commented 4 years ago

Using v0.5.2, I would expect the following code to invoke only the two specified annotators, tokenize and ssplit:

xquery version "3.1";

import module namespace nlp="http://exist-db.org/xquery/stanford-nlp";

nlp:parse(
    "Hello World!",
    map {
        "annotators" : "tokenize, ssplit",
        "tokenize.language" : "en"
    }
)

But judging by the logs, it also invokes pos, lemma, ner, depparse, and coref:

2020-02-04 05:47:08,947 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Searching for resource: StanfordCoreNLP.properties ... found. 
2020-02-04 05:47:08,962 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Adding annotator tokenize 
2020-02-04 05:47:08,977 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Adding annotator ssplit 
2020-02-04 05:47:08,982 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Adding annotator pos 
2020-02-04 05:47:09,910 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec]. 
2020-02-04 05:47:09,910 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Adding annotator lemma 
2020-02-04 05:47:09,912 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Adding annotator ner 
2020-02-04 05:47:09,988 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - encoding=utf-8 
2020-02-04 05:47:13,506 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [3.4 sec]. 
2020-02-04 05:47:14,189 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec]. 
2020-02-04 05:47:15,543 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.4 sec]. 
2020-02-04 05:47:15,550 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1. 
2020-02-04 05:47:15,797 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt 
2020-02-04 05:47:23,126 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - ner.fine.regexner: Read 580704 unique entries out of 581863 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns. 
2020-02-04 05:47:23,141 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns. 
2020-02-04 05:47:23,142 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - ner.fine.regexner: Read 585573 unique entries from 2 files 
2020-02-04 05:47:49,838 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Adding annotator depparse 
2020-02-04 05:47:50,110 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ...  
2020-02-04 05:48:14,095 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - PreComputed 99996, Elapsed Time: 17.01 (s) 
2020-02-04 05:48:14,096 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Initializing dependency parser ... done [24.0 sec]. 
2020-02-04 05:48:14,101 [qtp709874091-44] INFO  (SLF4JHandler.java [print]:88) - Adding annotator coref 

(Even with 4 GB of memory allocated to eXist, the query still had not returned a result 30 minutes after it was submitted; the iMac's 8-core CPU is pegged, and eXist is unresponsive.)

joewiz commented 4 years ago

@lcahlander suggested that, after uninstalling an old version (e.g., v0.5.1), I restart eXist to ensure that the old version is completely uninstalled. Doing this before installing the new version (e.g., v0.5.2) fixed the issue.