emorynlp / nlp4j

NLP framework for JVM languages.
http://emorynlp.github.io/nlp4j/
Other
149 stars 33 forks source link

EnglishC2DConverter generates ArrayIndexOutOfBoundsException #24

Open reckart opened 7 years ago

reckart commented 7 years ago

I am unable to get the EnglishC2DConverter working. The following lines reproduce the problem.

        // This is an example from "src/test/resources/constituent/functionTags.parse"
        String pennTree = "(TOP (S (NP-SBJ (NP (CC both) (NNP Bush) (CC and) (NNP Rice))) (VP "
                + "(VBP have) (VP (VBN delivered) (NP (NP (NNS speeches)) (, ,) (SBAR (WHNP-1 "
                + "(WDT which)) (S (NP-SBJ (-NONE- *T*-1)) (VP (VBP are) (ADJP-PRD (RB very) "
                + "(JJ clear))))))))))";

        CTReader reader = new CTReader();
        reader.open(new StringInputStream(pennTree), "UTF-8");
        List<CTTree> trees = reader.getTreeList();
        System.out.println(trees);

        C2DConverter c2d = new EnglishC2DConverter(new HeadRuleMap(getClass()
                .getResourceAsStream("/edu/emory/mathcs/nlp/conversion/headrule_en_stanford.txt")));

        c2d.toDependencyGraph(trees.get(0));

This code prints an ArrayIndexOutOfBoundsException:

java.lang.ArrayIndexOutOfBoundsException: 12
    at edu.emory.mathcs.nlp.conversion.C2DConverter.initDEPTree(C2DConverter.java:195)
    at edu.emory.mathcs.nlp.conversion.EnglishC2DConverter.getDEPTree(EnglishC2DConverter.java:1129)
    at edu.emory.mathcs.nlp.conversion.EnglishC2DConverter.toDependencyGraph(EnglishC2DConverter.java:136)

Is the converter broken?

As a side note, the converter "swallows" the generated exception in EnglishC2DConverter.toDependencyGraph(CTTree) - i.e. it prints a stack trace and doesn't forward the exception to the calling code. IMHO it would be better if it did not print the stack trace and instead let the exception bubble up to the calling code.

jdchoi77 commented 7 years ago

Richard,

It's because we were in the middle of fixing the code. I temporarily made a separate project to provide the conversion script so please take a look:

http://github.com/emorynlp/ddr

Please let me know if you have more questions. Thanks.

reckart commented 7 years ago

I stumbled across this C2DConverter when I was looking where the semantic and secondary heads in the NLPNode come from - and then I was looking into wrapping that conversion as a new NLP4J UIMA component in DKPro Core. However, if it is known to be broken in the last NLP4J release, then I'd better wait for the next NLP4J release on Maven Central which then probably (?) will also include the new converter?