FraBle / python-sutime

Python wrapper for Stanford CoreNLP's SUTime
GNU General Public License v3.0

It's so slow! #24

Open alabrashJr opened 5 years ago

alabrashJr commented 5 years ago

Why is the library running so slowly?

https://github.com/emerging-welfare/nextflow_test/blob/0884930d98df1f78b56d1a4c60a5e96289ead9f8/bin/temporalTagger.py#L1-L34

CPU times: user 2.55 s, sys: 614 ms, total: 3.17 s
Wall time: 3min 11s

The output is as follows:
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Registering annotator sutime with class edu.stanford.nlp.time.TimeAnnotator
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.9 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - sutime.includeRange=false
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.includeRange|
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - sutime.language=english
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - sutime.markTimeRanges=true
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.markTimeRanges|
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.includeRange|
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.markTimeRanges|
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.4 sec].
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.includeRange|
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.markTimeRanges|
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.4 sec].
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.includeRange|
[main] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - Unknown property: |sutime.markTimeRanges|
[main] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.7 sec].
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580704 unique entries out of 581863 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4869 unique entries out of 4869 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[main] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585573 unique entries from 2 files
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator sutime
[main] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[main] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
feng-1985 commented 5 years ago

I am not very familiar with Java. This package uses JPype to start the JVM. I previously thought the model could be pre-loaded, but after testing (I built a SUTime API in Django), the model could not be pre-loaded in memory. Is there any other solution?

BandeepSingh commented 4 years ago

@bifeng Did you find a way to pre-load the model in memory?
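
One thing that may be worth trying, assuming most of the wall time goes into pipeline startup (the log above shows the POS tagger, three NER classifiers, and the regexner gazetteers being loaded): build the SUTime object once per process and reuse it for every call, rather than constructing it per request or per document. Below is a minimal sketch of that pattern. `LazySingleton`, `build_sutime`, and `tag` are hypothetical names made up for illustration, not part of python-sutime. The `SUTime(mark_time_ranges=True)` call is an assumption about the installed version; older releases may also need a `jars` argument, so check the README of the version you have.

```python
import threading


class LazySingleton:
    """Build an expensive object at most once per process, then reuse it."""

    def __init__(self, factory):
        self._factory = factory      # zero-argument callable that builds the object
        self._instance = None
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking: only the first caller pays the startup cost,
        # and concurrent first calls do not build the object twice.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


def build_sutime():
    # Imported lazily so that importing this module does not start the JVM.
    # ASSUMPTION: constructor arguments depend on the python-sutime version.
    from sutime import SUTime
    return SUTime(mark_time_ranges=True)


# Module-level holder (hypothetical name); nothing is loaded until first use.
TAGGER = LazySingleton(build_sutime)


def tag(text):
    """Parse text with the shared SUTime instance (hypothetical helper)."""
    return TAGGER.get().parse(text)
```

With this layout, the first call to `tag()` in a process still pays the full model-loading time, and later calls in the same process reuse the loaded pipeline. It does not help if every document is handled by a fresh Python process, and a web server that runs several worker processes would load the models once per worker.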