BLLIP / bllip-parser

BLLIP reranking parser (also known as Charniak-Johnson parser, Charniak parser, Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.
http://bllip.cs.brown.edu/

Error when using tag() function with the adverb "where" #36

Closed mallihee closed 9 years ago

mallihee commented 9 years ago

In [3]: rrp.tag('where')

IndexError                                Traceback (most recent call last)
/home/DataSet/ in ()
----> 1 rrp.tag('where')

/usr/local/lib/python2.7/dist-packages/bllipparser/RerankingParser.py in tag(self, text_or_tokens)
    538         text_or_tokens can be either a string or a sequence of tokens."""
    539         parses = self.parse(text_or_tokens)
--> 540         return parses[0].ptb_parse.tokens_and_tags()
    541
    542     def _find_bad_tag_and_raise_error(self, tags):

IndexError: list index out of range


However, when I tag a phrase like 'where is', I get the expected result:

In [5]: rrp.tag('where is')
Out[5]: [('where', 'WRB'), ('is', 'VBZ')]

dmcc commented 9 years ago

Thanks for the report. I was able to reproduce this with some (but not all) parsing models. The bug comes down to the fact that some parsing models can't parse the sentence "where" (though, for technical reasons, they can parse "where is"). The tag() function simply parses the text as if it were a full sentence and reads the tags off the first tree, so you'll get the best accuracy by giving it complete sentences. When the parse fails, the list of parses is empty, which is why parses[0] raised the IndexError. I've made it so that the tag() method falls back to the most frequent POS tags in these cases, and added a flag (allow_failures=True) to disable this behavior (that is, to get an error when the parse fails, in case that should be handled differently).
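
For readers who want to see the shape of the fallback described above, here is a minimal, self-contained sketch. It does not use bllipparser and is not the library's implementation: `tag_with_fallback`, `toy_parse`, and `MOST_FREQUENT` are hypothetical names invented for illustration, the parse is simplified to a list of (token, tag) pairs, and the `allow_failures` argument only mirrors the behavior as worded in this comment (the exact semantics in bllipparser may differ).

```python
def tag_with_fallback(parse, tokens, most_frequent_tag, allow_failures=False):
    """Tag tokens via a parser, falling back to most-frequent tags.

    `parse` is a hypothetical callable that returns a list of parses,
    each simplified to a list of (token, tag) pairs. An empty list
    means the parser could not parse the input.
    """
    parses = parse(tokens)
    if parses:
        # Normal case: read the tags off the first (best) parse.
        return parses[0]
    if allow_failures:
        # Assumption: the flag surfaces the failure instead of hiding it.
        raise ValueError("parse failed for %r" % (tokens,))
    # Fallback: most frequent tag per word ("NN" for unknown words is
    # an arbitrary default chosen for this sketch).
    return [(tok, most_frequent_tag.get(tok, "NN")) for tok in tokens]


# Hypothetical lookup table of each word's most frequent tag.
MOST_FREQUENT = {"where": "WRB", "is": "VBZ"}


def toy_parse(tokens):
    """Stand-in for a parsing model that cannot parse the sentence 'where'."""
    if tokens == ["where"]:
        return []  # parse failure: no trees returned
    return [[(tok, MOST_FREQUENT.get(tok, "NN")) for tok in tokens]]


print(tag_with_fallback(toy_parse, ["where", "is"], MOST_FREQUENT))
print(tag_with_fallback(toy_parse, ["where"], MOST_FREQUENT))
```

The point of the guard is that an empty parse list is checked before indexing, so a failed parse either produces fallback tags or an explicit error rather than an IndexError.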