Closed by reckart 9 years ago
Can you provide the sentence on which this fails?
Original issue reported on code.google.com by richard.eckart
on 2013-10-08 15:50:09
Der These des Philologenverbands, dass die Abschaffung des Sitzenbleibens vor allem
auf Kosten der Qualität der Leistungsergebnisse bei mittlerer Reife und Abitur gehen
wird, stimmten 73 Prozent der Befragten zu. jon Mehr auf SPIEGEL ONLINE: Streitthema
Ehrenrunde "Sitzenbleiben ist peinlich" (20.02.2013) http://www.spiegel.de/schulspiegel/wissen/0,1518,884286,00.html
Fotostrecke Prominente Sitzenbleiber http://www.spiegel.de/fotostrecke/fotostrecke-93367.html
Sitzenbleiber als Spitzenpolitiker "Edmund, du bist faul!
Original issue reported on code.google.com by vovk.artem
on 2013-10-08 17:47:43
That's pretty long for one sentence, and in fact it's not even a sentence at all. Which
segmenter are you using?
Original issue reported on code.google.com by richard.eckart
on 2013-10-08 20:26:20
This text was extracted from HTML, which is why it looks like this. I am just using the
simple BreakIteratorSegmenter.
P.S. I split this sentence into two separate ones (at this position: "...zu. jon...")
and it now works without an exception.
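
To see how such a span gets segmented, here is a minimal sketch that uses the JDK's `java.text.BreakIterator` directly. It assumes that BreakIteratorSegmenter delegates to this JDK class, which the name suggests but this thread does not confirm. The class name `SentenceSplitSketch` and the method `split` are made up for illustration, and the input is a fragment of the text quoted above. One possible reason for the missing split, offered as an assumption and not a confirmed diagnosis, is that the sentence rules do not break after a period that is followed by a lowercase word such as "jon".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of DKPro Core.
class SentenceSplitSketch {

    // Splits the text into sentence spans using the JDK's locale-aware rules.
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Fragment of the HTML-extracted text from this issue.
        String text = "wird, stimmten 73 Prozent der Befragten zu. jon Mehr auf "
                + "SPIEGEL ONLINE: Streitthema Ehrenrunde";
        // Print each span in brackets so the boundaries are visible.
        for (String sentence : split(text, Locale.GERMAN)) {
            System.out.println("[" + sentence.trim() + "]");
        }
    }
}
```

Running this on the real input shows whether a boundary is placed at "zu. jon" before the text ever reaches the parser.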
Original issue reported on code.google.com by vovk.artem
on 2013-10-08 20:36:37
I can reproduce the exception with the BreakIteratorSegmenter. It looks like the Berkeley
parser is not parsing this at all. It returns a tree consisting only of the ROOT node
with no children. No idea why this happens.
For the time being let's keep this open as a known issue. A workaround may be to use
a smarter segmenter, e.g. the LanguageToolSegmenter or the StanfordSegmenter.
Maybe there is a way to extract a parse from the parser; otherwise, the wrapper needs
to be changed to simply skip such sentences.
Original issue reported on code.google.com by richard.eckart
on 2013-10-08 21:09:51
The BerkeleyParser cannot parse some sentences. In such cases, it returns an empty
tree consisting only of a root node. The DKPro Core component cannot handle this.
The parser logs a message like this before generating the empty result:
Warning: no symbol can generate the span from 0 to 88.
The score is -Infinity and the state is supposed to be ROOT
The insideScores are [4.9E-324] and the outsideScores are [1.0]
The maxcScore is -Infinity
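
A wrapper-side guard for this case could look roughly like the sketch below. It is only an illustration of the "skip such sentences" idea: `ParseNode`, `isEmptyParse`, and `keepUsable` are hypothetical names, and `ParseNode` is a stand-in for the parser's real tree type, not the Berkeley parser or DKPro Core API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical guard, not the actual DKPro Core wrapper code.
class EmptyParseGuard {

    // Minimal stand-in for a parse tree node (assumption for illustration).
    static final class ParseNode {
        final String label;
        final List<ParseNode> children = new ArrayList<>();

        ParseNode(String label, ParseNode... kids) {
            this.label = label;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    // A parse counts as empty if it is missing or its root has no children.
    static boolean isEmptyParse(ParseNode root) {
        return root == null || root.children.isEmpty();
    }

    // Keeps only usable parses and reports the skipped ones on stderr.
    static List<ParseNode> keepUsable(List<ParseNode> parses) {
        List<ParseNode> usable = new ArrayList<>();
        for (ParseNode parse : parses) {
            if (isEmptyParse(parse)) {
                System.err.println("Skipping sentence: parser returned no parse");
                continue;
            }
            usable.add(parse);
        }
        return usable;
    }

    public static void main(String[] args) {
        ParseNode failed = new ParseNode("ROOT"); // ROOT node with no children
        ParseNode ok = new ParseNode("ROOT", new ParseNode("S", new ParseNode("NN")));
        System.out.println(keepUsable(Arrays.asList(failed, ok)).size()); // prints 1
    }
}
```

Skipping with a logged warning keeps the rest of the document processable, at the cost of leaving the affected sentence without constituent annotations.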
Original issue reported on code.google.com by richard.eckart
on 2014-02-28 09:41:53
Issue 350 has been merged into this issue.
Original issue reported on code.google.com by richard.eckart
on 2014-02-28 09:42:07
(No text was entered with this change)
Original issue reported on code.google.com by richard.eckart
on 2014-02-28 09:53:25
(No text was entered with this change)
Original issue reported on code.google.com by richard.eckart
on 2014-03-26 10:51:56
Original issue reported on code.google.com by vovk.artem
on 2013-10-08 14:07:11