dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

StanfordNLP MaxentTagger (e.g. StanfordPosTagger) cannot load models #890

Closed (pasky closed this issue 8 years ago)

pasky commented 8 years ago

Hi! When we try to use StanfordPosTagger on a Spanish-language (es) CAS, DKPro crashes while loading the model resource, both with 1.8.0 and with master.

The resource seems to be provisioned fine:

INFO ResourceObjectProviderBase - Producing resource from [jar:file:/root/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-upstream-tagger-es-distsim/jars/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-upstream-tagger-es-distsim-20150108.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/tagger-es-distsim.ser.gz] redirected from [jar:file:/root/.ivy2/cache/de.tudarmstadt.ukp.dkpro.core/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-model-tagger-es-distsim/jars/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-model-tagger-es-distsim-20150108.1.jar!/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/tagger-es-distsim.properties]

but StanfordNLP cannot open it:

Caused by: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file)
     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:770)
     at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:298)
     at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordPosTagger$1.produceResource(StanfordPosTagger.java:175)
     at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordPosTagger$1.produceResource(StanfordPosTagger.java:162)
     at de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase.loadResource(ResourceObjectProviderBase.java:710)
     at de.tudarmstadt.ukp.dkpro.core.api.resources.ResourceObjectProviderBase.configure(ResourceObjectProviderBase.java:576)
     at de.tudarmstadt.ukp.dkpro.core.api.resources.CasConfigurableProviderBase.configure(CasConfigurableProviderBase.java:36)
     at de.tudarmstadt.ukp.dkpro.core.resources.ModelProviderBase.configure(ModelProviderBase.java:78)
     at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordPosTagger.process(StanfordPosTagger.java:203)
     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
     ... 8 more
Caused by: java.io.IOException: Unable to open "de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/tagger-es-distsim.ser.gz" as class path, filename or URL
     at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(IOUtils.java:485)
     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:765)
     ... 18 more

This seems to be caused by the hack catering to older StanfordParser versions at StanfordPosTagger.java line 170, which appears to break model loading as things stand now. Commenting out that if(){} fixes the issue, but I'd rather have someone who understands the code judge whether it is okay to remove (and maybe it needs to be removed from the other Stanford wrappers too?).
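To illustrate the suspected failure mode, here is a minimal sketch. It assumes the workaround strips the `jar:...!/` prefix from the provisioned URL, which is what the relative path in the IOException above suggests; the actual DKPro code may differ. The class name, method names, and the `/tmp/hypothetical-model.jar` location are made up for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class JarUrlStrippingSketch {

    // Hypothetical stand-in for the workaround (an assumption, not the DKPro code):
    // keep only the entry path that follows "!/" in a jar: URL.
    static String stripJarPrefix(String location) {
        int sep = location.indexOf("!/");
        if (location.startsWith("jar:") && sep >= 0) {
            return location.substring(sep + 2);
        }
        return location;
    }

    // True if the string can be parsed as a URL with a known protocol.
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated on recent JDKs
    static boolean isUrl(String location) {
        try {
            new URL(location);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical JAR location; the entry path is the one from the log above.
        String jarUrl = "jar:file:/tmp/hypothetical-model.jar!/"
                + "de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/tagger-es-distsim.ser.gz";
        String stripped = stripJarPrefix(jarUrl);

        System.out.println("original is a URL: " + isUrl(jarUrl));
        System.out.println("stripped location: " + stripped);
        System.out.println("stripped is a URL: " + isUrl(stripped));
    }
}
```

Once the prefix is gone, the string is neither a URL nor (normally) a file name, so it can only be resolved if the model JAR happens to be on the classpath. That would match the "Unable to open ... as class path, filename or URL" message when the JAR was fetched at runtime instead.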

reckart commented 8 years ago

We can remove the workaround now. However, it puzzles me why it would work for you without the workaround. We have unit tests that load the Spanish POS tagger model from a JAR and they work nicely. Looks like you are using auto-loading. Does it work when you add a direct dependency on the model to your POM?
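A quick way to tell the two situations apart (model JAR on the classpath versus auto-loaded at runtime) might be a check like the following. This is only a sketch: the class and method names are hypothetical, the resource path is taken from the log above, and it looks at the system class loader only, so it may not reflect setups with custom class loaders.

```java
import java.net.URL;
import java.util.Optional;

final class ModelClasspathCheck {

    // Returns the URL of the resource if the system class loader can see it.
    static Optional<URL> locate(String resourcePath) {
        return Optional.ofNullable(ClassLoader.getSystemResource(resourcePath));
    }

    public static void main(String[] args) {
        // Properties file of the Spanish tagger model, as named in the log above.
        String path =
                "de/tudarmstadt/ukp/dkpro/core/stanfordnlp/lib/tagger-es-distsim.properties";

        Optional<URL> found = locate(path);
        System.out.println(found
                .map(url -> "on classpath: " + url)
                .orElse("not on classpath: " + path));
    }
}
```

If this reports "not on classpath" in the failing setup, the model is presumably being resolved through auto-loading, and any code path that falls back to a classpath lookup would not find it.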

pasky commented 8 years ago

(Sorry, testing that question isn't a simple matter for me right now as I'm only consulting on that project.)