coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.
Apache License 2.0

CoreNLP NER crash #76

Closed: namednil closed this issue 5 years ago

namednil commented 5 years ago

While I was running the AMR pipeline, ToAMConll crashed on the training data because of a bug in the CoreNLP-based handling of named entities.

This is the program output, followed by the error message:

0
1000
[crickets]
[<, crickets, >]
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at de.saar.coli.amrtagging.formalisms.amr.tools.preproc.StanfordNamedEntityRecognizer.tag(StanfordNamedEntityRecognizer.java:40)
        at de.saar.coli.amrtagging.formalisms.amr.tools.ToAMConll.main(ToAMConll.java:198)

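There is indeed a training sentence that reads < crickets >, which could be a problem for code that does replaceAll("[<>]", ""). Presumably, once the brackets are removed only one token ([crickets]) is left, while the original sentence has three ([<, crickets, >]), so looking up index 1 in the shorter list fails.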
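To make the suspected failure concrete, here is a minimal sketch. It is not the actual StanfordNamedEntityRecognizer code or the fix that closed this issue. It assumes the tagger joins the tokens, strips angle brackets with replaceAll("[<>]", ""), re-splits on whitespace, and then indexes the result using the original token positions. The class name AngleBracketTokens, the two helper methods, and the "_" placeholder are all hypothetical. The per-token variant shows one way to keep indices aligned with the original tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AngleBracketTokens {

    // Hypothetical naive approach: strip '<' and '>' from the whole sentence,
    // then re-split on whitespace. Tokens made only of brackets disappear,
    // so the result can be shorter than the input.
    static List<String> stripThenSplit(List<String> tokens) {
        String joined = String.join(" ", tokens).replaceAll("[<>]", "");
        List<String> result = new ArrayList<>();
        for (String t : joined.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Hypothetical position-preserving alternative: clean each token on its own
    // and substitute a placeholder when a token becomes empty, so that index i
    // in the output still corresponds to index i in the input.
    static List<String> stripPerToken(List<String> tokens, String placeholder) {
        List<String> result = new ArrayList<>();
        for (String t : tokens) {
            String cleaned = t.replaceAll("[<>]", "");
            result.add(cleaned.isEmpty() ? placeholder : cleaned);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("<", "crickets", ">");

        List<String> naive = stripThenSplit(original);
        System.out.println(naive); // [crickets]

        try {
            // Walking the original indices over the shorter list reproduces the crash.
            for (int i = 0; i < original.size(); i++) {
                naive.get(i);
            }
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }

        List<String> aligned = stripPerToken(original, "_");
        System.out.println(aligned); // [_, crickets, _]
    }
}
```

Keeping one output token per input token means any per-token annotations (such as NER tags) can be mapped back by position. Whether a placeholder or simply skipping the lookup is preferable depends on how the real tagger consumes the tokens.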

@alexanderkoller, since you wrote the code: is that an easy fix for you?

alexanderkoller commented 5 years ago

I can try to fix it, but I don't think this is my code. @jgroschwitz can you say why the angle brackets need to be deleted here?

alexanderkoller commented 5 years ago

@namednil can you check if this fixed it?

namednil commented 5 years ago

No crash, output looks good.