emorynlp / nlp4j-old

NLP tools developed by Emory University.

POS tagger model takes time to load #33

Status: Open. damzC opened this issue 8 years ago.

damzC commented 8 years ago

Hi, I was trying to use your POS tagger through the NLPDecodeRaw class. I am developing an app in Python where I need to POS tag one sentence at a time, so every time I call the Java class it loads the model, which takes around 10 seconds and is far too long for a real-time scenario. I tried serializing the decoder object so I could reuse one loaded copy of the model, but the NLPDecodeRaw class is not serializable. Can you please suggest a way to POS tag on a sentence-by-sentence basis (not a file) without loading the model every time, or any other way to reduce the turnaround time?

benson-basis commented 8 years ago

Create a single decoder object and use it over and over again. To do this from Python, you need to use something like pygenus or a web service to keep a JVM alive. Starting a new JVM every time is a losing proposition.
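For illustration, here is a minimal sketch of that pattern: a long-lived Java process that loads the decoder once and then tags one sentence per line from stdin, so a Python script can talk to it over a pipe instead of restarting the JVM. The class and method names (`NLPDecoder`, `decode`, `getWordForm`, `getPartOfSpeechTag`) and the package paths are assumptions based on the main nlp4j API and may differ in this repository.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import edu.emory.mathcs.nlp.component.template.node.NLPNode;
import edu.emory.mathcs.nlp.decode.NLPDecoder;

public class PosTagServer
{
    public static void main(String[] args) throws Exception
    {
        // Load the models exactly once, when the JVM starts.
        // args[0] is the path to the decode configuration XML.
        NLPDecoder decoder = new NLPDecoder(new FileInputStream(args[0]));

        // Keep the JVM alive: read one sentence per line from stdin,
        // tag it, and write word/POS pairs back to stdout.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;

        while ((line = in.readLine()) != null)
        {
            // decode(String) is assumed to tokenize and tag the raw sentence;
            // depending on the API version, index 0 may be an artificial root node.
            NLPNode[] nodes = decoder.decode(line);
            StringBuilder sb = new StringBuilder();

            for (NLPNode node : nodes)
                sb.append(node.getWordForm()).append('/').append(node.getPartOfSpeechTag()).append(' ');

            System.out.println(sb.toString().trim());
        }
    }
}
```

A Python app can then start this process once (for example via subprocess with pipes) and stream sentences to it as they arrive, so the model load is paid only at start-up.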

jdchoi77 commented 8 years ago

If you only want to run the POS tagger, you can turn off loading the other models. Please try this configuration and let me know if it serves your purpose. Thanks.

https://github.com/emorynlp/nlp4j/blob/master/src/main/resources/edu/emory/mathcs/nlp/configuration/config-decode-pos.xml
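As a hedged sketch of how that configuration might be plugged in: the `NLPDecoder` constructor and the `NLPNode` getters are the same assumptions as in the sketch above, and the file path assumes you are running from a checkout of the repository.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import edu.emory.mathcs.nlp.component.template.node.NLPNode;
import edu.emory.mathcs.nlp.decode.NLPDecoder;

public class PosOnlyExample
{
    public static void main(String[] args) throws Exception
    {
        // Point the decoder at the POS-only configuration; with the other
        // models switched off, start-up should be much faster.
        InputStream config = new FileInputStream(
            "src/main/resources/edu/emory/mathcs/nlp/configuration/config-decode-pos.xml");

        NLPDecoder decoder = new NLPDecoder(config);

        // Reuse the same decoder object for every sentence.
        NLPNode[] nodes = decoder.decode("The models are loaded only once.");

        for (NLPNode node : nodes)
            System.out.println(node.getWordForm() + "\t" + node.getPartOfSpeechTag());
    }
}
```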