Closed halfak closed 7 years ago
Thanks for the report!
My suspicion is that the bridge between the parser and reranker is not handling the space token correctly:
>>> rp.parse(["I", "am", "a", "little", "teapot", ".", " ", "What", "?"], rerank=False)[0]
ScoredParse('(S1 (S (S (NP (PRP I)) (VP (VBP am) (NP (DT a) (JJ little) (NN teapot))) (. .)) (VP (VBZ ) (NP (WP What))) (. ?)))', parser_score=-128.58688297955965, reranker_score=None)
>>> rp.parse(["I", "am", "a", "little", "teapot", ".", "What", "?"])[0]
ScoredParse('(S1 (S (NP (PRP I)) (VP (VBP am) (FRAG (NP (DT a) (JJ little) (NN teapot)) (. .) (WHNP (WP What)))) (. ?)))', parser_score=-105.83222024880615, reranker_score=-30.588081364935757)
>>> rp.parse(["a", " ", "b"])
zsh: segmentation fault (core dumped) python
I'll add an input validator to avoid future crashes, but as a workaround, I recommend removing any tokens that are purely whitespace when you're using the pre-tokenized mode (it's not clear what the part of speech for whitespace is, or the overall parse for that matter).
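For anyone hitting this before the validator lands, here is a minimal sketch of that workaround. The helper name `drop_whitespace_tokens` is hypothetical and not part of the parser's API; it only assumes the input is a plain list of strings, as in the session above.

```python
def drop_whitespace_tokens(tokens):
    """Return the tokens with empty and whitespace-only entries removed.

    Hypothetical helper for illustration; not part of the parser's API.
    """
    # str.strip() returns "" for whitespace-only strings, which is falsy.
    return [tok for tok in tokens if tok.strip()]


tokens = ["I", "am", "a", "little", "teapot", ".", " ", "What", "?"]
cleaned = drop_whitespace_tokens(tokens)
```

The cleaned list can then be passed to `rp.parse(cleaned)` in place of the original tokens. Note that this changes token positions, so any offsets you keep alongside the tokens would need to be remapped.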
Seems like an easy enough workaround. Thanks.
(Finally) added an input validator: f01ade870c39054a116531bf07c35d78ae46cedd
I get a segmentation fault when parsing
["I", "am", "a", "little", "teapot", ".", " ", "What", "?"]
using WSJ-PTB3. See the REPL paste below. I'm running Ubuntu 16.04 64-bit.
I found this in /var/log/syslog: