BLLIP / bllip-parser

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.
http://bllip.cs.brown.edu/

Maximum sentence length is not really the maximum sentence length. #32

Open dmcc opened 9 years ago

dmcc commented 9 years ago

There appear to be (at least) two off-by-one errors in the maximum sentence length calculations:

shell% ./parseIt -l399 ../DATA/EN 398.sgml
<doesn't crash, gives dummy parse>
shell% ./parseIt -l399 ../DATA/EN 399.sgml
parseIt: GotIter.C:73: void LeftRightGotIter::makelrgi(Edge*): Assertion `i < 400' failed.
<segfaults>
shell% ./parseIt -l399 ../DATA/EN 400.sgml
<doesn't crash, sentence is "skipped" and dummy parse is printed instead>

The obvious workaround is to parse only sentences that are at least two tokens shorter than the maximum sentence length, which is unlikely to be much of an issue in practice.
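As a rough illustration of that workaround, here is a minimal Python sketch that filters sentences by length before they are handed to the parser. It does not call the parser or the bllipparser module at all. The names `ASSUMED_MAX_SENTENCE_LENGTH`, `SAFETY_MARGIN`, `is_safe_to_parse`, and `split_by_length` are hypothetical, and the value 400 is an assumption taken from the `i < 400` assertion in the transcript above, not a documented constant.

```python
# Assumption: 400 mirrors the bound in the assertion `i < 400` shown above.
ASSUMED_MAX_SENTENCE_LENGTH = 400
# "Two fewer than the maximum", as suggested by the workaround.
SAFETY_MARGIN = 2


def is_safe_to_parse(tokens, max_length=ASSUMED_MAX_SENTENCE_LENGTH,
                     margin=SAFETY_MARGIN):
    """Return True if the token list is short enough to avoid the edge cases."""
    return len(tokens) <= max_length - margin


def split_by_length(sentences, max_length=ASSUMED_MAX_SENTENCE_LENGTH,
                    margin=SAFETY_MARGIN):
    """Split tokenized sentences into (safe, skipped) lists, preserving order."""
    safe, skipped = [], []
    for tokens in sentences:
        if is_safe_to_parse(tokens, max_length, margin):
            safe.append(tokens)
        else:
            skipped.append(tokens)
    return safe, skipped


if __name__ == "__main__":
    sentences = [["w"] * n for n in (10, 398, 399, 400)]
    safe, skipped = split_by_length(sentences)
    print([len(s) for s in safe])
    print([len(s) for s in skipped])
```

With these assumed values, a 398-token sentence is kept and the 399- and 400-token sentences are set aside, which matches the three cases in the transcript. Skipped sentences would still need separate handling, for example by emitting a dummy parse for them.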