endu50 / clearnlp

Automatically exported from code.google.com/p/clearnlp

Latest models seem to be broken #6

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Check out the latest Git master from Google Code.
2. Download the latest models (e.g., https://bitbucket.org/jdchoi77/models/downloads/ontonotes-en-pos-1.3.0.tgz).
3. Parse using, e.g.:
 mvn exec:java -Dexec.mainClass=com.googlecode.clearnlp.demo.DemoDEPParser  -Dexec.args="model/dictionary-1.2.0.zip model/ontonotes-en-pos-1.3.0.tgz model/ontonotes-en-dep-1.3.0.tgz src/main/resources/sample/iphone5.txt src/main/resources/sample/iphone5.txt.newparsed"

What is the expected output? What do you see instead?
Instead of parse output, we get a NullPointerException.

What version of the product are you using? On what operating system?
Git master 6fb797d1ad2a49946fcf907c77045136940936e3 (version 1.3.0)

Please provide any additional information below.
Parsing works fine with the old models. It looks like the new models are out of
sync with the code in Git master.

Original issue reported on code.google.com by admac...@gmail.com on 25 Jan 2013 at 1:16

GoogleCodeExporter commented 8 years ago
I have a similar issue. I believe the problem is that EngineGetter expects to
find files with names like 'CONFIGURATION', 'FEATURE', etc. (see, e.g.,
EngineGetter.java, line 154), but in the new models the zipped files have names
like 'pos_CONFIGURATION' and 'posFEATURE0'. Since the EngineGetter functions
look up file names using strict equality, functions like getPosTaggers()
(EngineGetter.java, line 138) loop without finding anything, and most fields
stay null. To fix this, either the model files should be renamed, or
EngineGetter.java should use a looser form of file-name matching.

Original comment by swisema...@gmail.com on 9 Feb 2013 at 4:22
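The second fix suggested above (looser file-name matching) could look roughly like the following minimal sketch. It is a hypothetical helper, not part of ClearNLP's API. The class name `EntryNameMatcher`, the rule that a prefix is lowercase letters with an optional underscore, and the rule that a key may be followed by a numeric index are all assumptions generalized from the two example names in the comment ('pos_CONFIGURATION', 'posFEATURE0'), not from any documented model format.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of ClearNLP). Decides whether an archive entry
 * name refers to a given model component key. Assumed naming rules: an optional
 * lowercase prefix with an optional underscore ("pos", "pos_"), then the exact
 * key, then optional trailing digits (the "0" in "posFEATURE0").
 */
public final class EntryNameMatcher {

    private EntryNameMatcher() {}

    public static boolean matches(String entryName, String key) {
        // Drop any directory part, assuming entries might be nested inside folders.
        String base = entryName.substring(entryName.lastIndexOf('/') + 1);
        // Pattern.quote makes the key match literally; matches() anchors the whole string.
        Pattern p = Pattern.compile("(?:[a-z]+_?)?" + Pattern.quote(key) + "\\d*");
        return p.matcher(base).matches();
    }

    public static void main(String[] args) {
        List<String> names = List.of("CONFIGURATION", "pos_CONFIGURATION", "posFEATURE0", "pos_MODEL");
        for (String name : names) {
            System.out.println(name
                    + " CONFIGURATION=" + matches(name, "CONFIGURATION")
                    + " FEATURE=" + matches(name, "FEATURE"));
        }
    }
}
```

Because the key still has to match exactly, old-style names such as 'CONFIGURATION' keep matching under this rule. Whether the trailing index also has to be parsed (for example, to load several FEATURE entries in order) depends on how EngineGetter consumes those entries, which this thread does not show.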