dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Exception while executing pipeline with BerkeleyParser #261

Closed reckart closed 9 years ago

reckart commented 9 years ago
I'm getting an exception while executing my pipeline with BerkeleyParser:
  [java] Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
...
     [java]     at de.tudarmstadt.ukp.dkpro.core.berkeleyparser.BerkeleyParser.process(BerkeleyParser.java:267)
     [java]     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)

It seems that the list created by List<Annotation> childAnnotations = new ArrayList<Annotation>(); is still empty
when the code tries to access its first element on line 337.
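
For illustration, the failure mode described above can be reproduced in isolation. This is only a minimal sketch and not the actual BerkeleyParser wrapper code: the class FirstChildGuard and the helper firstOrNull are hypothetical names, and String stands in for the Annotation type.

```java
import java.util.ArrayList;
import java.util.List;

final class FirstChildGuard {
    // Hypothetical helper: returns the first element, or null when the list
    // is empty, instead of letting get(0) throw IndexOutOfBoundsException.
    static <T> T firstOrNull(List<T> items) {
        return items.isEmpty() ? null : items.get(0);
    }

    public static void main(String[] args) {
        // Stand-in for the childAnnotations list; it is never filled.
        List<String> childAnnotations = new ArrayList<String>();
        try {
            childAnnotations.get(0); // same kind of access that fails in the stack trace
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unguarded access failed: " + e.getMessage());
        }
        System.out.println("Guarded access returned: " + firstOrNull(childAnnotations));
    }
}
```

Whether returning null is the right reaction depends on the caller; the point is only that the empty case has to be checked before the first element is read.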

What version of the product are you using? On what operating system?
I'm using dkpro-gpl 1.5.0 on OS X / Ubuntu

Original issue reported on code.google.com by vovk.artem on 2013-10-08 14:07:11

reckart commented 9 years ago
Can you provide the sentence on which this fails?

Original issue reported on code.google.com by richard.eckart on 2013-10-08 15:50:09

reckart commented 9 years ago
Der These des Philologenverbands, dass die Abschaffung des Sitzenbleibens vor allem
auf Kosten der Qualität der Leistungsergebnisse bei mittlerer Reife und Abitur gehen
wird, stimmten 73 Prozent der Befragten zu. jon Mehr auf SPIEGEL ONLINE: Streitthema
Ehrenrunde "Sitzenbleiben ist peinlich" (20.02.2013) http://www.spiegel.de/schulspiegel/wissen/0,1518,884286,00.html
Fotostrecke Prominente Sitzenbleiber http://www.spiegel.de/fotostrecke/fotostrecke-93367.html
Sitzenbleiber als Spitzenpolitiker "Edmund, du bist faul!

Original issue reported on code.google.com by vovk.artem on 2013-10-08 17:47:43

reckart commented 9 years ago
That's pretty long for one sentence and it's in fact not even a sentence at all. What
segmenter do you use?

Original issue reported on code.google.com by richard.eckart on 2013-10-08 20:26:20

reckart commented 9 years ago
This text was extracted from HTML, which is why it looks like this. I am just using the
simple BreakIteratorSegmenter.

P.S. I split this sentence into two separate ones (on this position: "...zu. jon...")
and it works now without exception.
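
To check where sentence boundaries end up in a text like this, the splitting can be inspected with plain java.text.BreakIterator. This is a rough sketch under the assumption that BreakIteratorSegmenter delegates to java.text.BreakIterator, as its name suggests; the class SentenceBoundaryCheck and the method split are hypothetical names, and the boundaries found may vary with the locale and the JDK version.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SentenceBoundaryCheck {
    // Splits text into the sentence spans reported by the JDK's BreakIterator.
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<String>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Excerpt around the position where the manual split was made.
        String text = "stimmten 73 Prozent der Befragten zu. "
                + "jon Mehr auf SPIEGEL ONLINE: Streitthema Ehrenrunde";
        for (String sentence : split(text, Locale.GERMAN)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

If the excerpt is printed as a single span, the segmenter did not see a boundary at "zu. jon", which would match the behaviour described above.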

Original issue reported on code.google.com by vovk.artem on 2013-10-08 20:36:37

reckart commented 9 years ago
I can reproduce the exception with the BreakIteratorSegmenter. It looks like the Berkeley
parser is not parsing this at all. It returns a tree consisting only of the ROOT node
with no children. No idea why this happens. 

For the time being let's keep this open as a known issue. A workaround may be to use
a smarter segmenter, e.g. the LanguageToolSegmenter or the StanfordSegmenter.

Maybe there is a way to extract a parse from the parser anyway; otherwise the wrapper needs
to be changed to simply skip such sentences.
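
Skipping such sentences could look roughly like the following. This is a sketch under assumptions, not the DKPro Core wrapper: Node is a hypothetical stand-in for the parser's tree type, and EmptyParseSkipper, isEmptyParse and keepUsable are made-up names. It treats a result as unusable when the root node has no children, which is the symptom observed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EmptyParseSkipper {
    // Hypothetical stand-in for a parse tree node; not the Berkeley parser's own class.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // A parse counts as failed when there is no tree or the root has no children.
    static boolean isEmptyParse(Node root) {
        return root == null || root.children.isEmpty();
    }

    // Returns only the usable parses and logs each sentence that is skipped.
    static List<Node> keepUsable(List<Node> parses) {
        List<Node> usable = new ArrayList<Node>();
        for (Node parse : parses) {
            if (isEmptyParse(parse)) {
                System.err.println("Skipping sentence: parser returned an empty tree");
                continue;
            }
            usable.add(parse);
        }
        return usable;
    }
}
```

In a real annotator the check would sit right after the parser call for each sentence, before any child of the root is accessed.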

Original issue reported on code.google.com by richard.eckart on 2013-10-08 21:09:51

reckart commented 9 years ago
The BerkeleyParser cannot parse some sentences. In such cases, it returns an empty
tree consisting only of a root node. The DKPro Core component cannot handle this.

The parser logs a message like this before generating the empty result:

Warning: no symbol can generate the span from 0 to 88.
The score is -Infinity and the state is supposed to be ROOT
The insideScores are [4.9E-324] and the outsideScores are [1.0]
The maxcScore is -Infinity

Original issue reported on code.google.com by richard.eckart on 2014-02-28 09:41:53

reckart commented 9 years ago
Issue 350 has been merged into this issue.

Original issue reported on code.google.com by richard.eckart on 2014-02-28 09:42:07

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2014-02-28 09:53:25

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2014-03-26 10:51:56