Closed: keien closed this issue 10 years ago.
Email Aditi about this error.
Email sent.
Ah, probably one of the sentences is too long and it runs out of memory. If I remember correctly, parsing is something like an O(N^3) memory operation, where N is the length of the sentence.
I also notice that you're parsing and sentence-tokenizing in the same pass.
What you could do is make two separate passes over the input text: one to split it into sentences, and a second to parse. Then you can fail more gracefully on the sentences that are too long.
In the old pipeline, that's how I used to do it. The Stanford parser in Java used to have a flag you could set that would make it fail gracefully on sentences like this. I used to set the maximum length to something like 40 words (which is pretty generous, as the average English sentence is closer to 15 words). In this case, I would try the following (once you have split the text into sentences).
There is no great solution, though, because dropping the long sentences misses any long-range dependencies they contain.
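For illustration, here is a minimal sketch of the length cap with graceful failure in Python. The 40-word cap, the `parse_safely` name, and the `parse_sentence` callable are assumptions for the sketch, not the project's actual API:

```python
MAX_WORDS = 40  # generous cap; the average English sentence runs closer to 15 words

def parse_safely(sentence, parse_sentence):
    """Parse one sentence, failing gracefully instead of crashing the run.

    `parse_sentence` is a hypothetical stand-in for whatever parser
    wrapper the pipeline actually uses.
    """
    if len(sentence.split()) > MAX_WORDS:
        return None  # too long: skip rather than risk running out of memory
    try:
        return parse_sentence(sentence)
    except MemoryError:
        return None  # parser blew up anyway; record a miss and move on
```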
> What you could do is make two separate passes over the input text: one to split it into sentences, and a second to parse. Then you can fail more gracefully on the sentences that are too long.
I'm not sure that the Python package we're using to interface with the Java library would let us do that.
Suggestions one and two seem suitable, though... we'll check with Professor Hearst to see what she thinks.
@silverasm how would we do two passes over the text, where we split it into sentences in one pass and parse in the other?
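One minimal sketch of that two-pass structure, assuming NLTK's punkt sentence tokenizer is available (any sentence splitter would do); `parse_sentence` is again a hypothetical stand-in for the real parser wrapper:

```python
import nltk  # requires a one-time nltk.download('punkt') for sent_tokenize

MAX_WORDS = 40  # same generous cap as in the sketch above

def two_pass_parse(text, parse_sentence):
    """Pass 1: split the text into sentences. Pass 2: parse each one alone."""
    # First pass never touches the parser, so it cannot run out of memory.
    sentences = nltk.sent_tokenize(text)
    parses = []
    # Second pass parses sentence by sentence; one oversized sentence is
    # recorded as a miss instead of taking down the whole document.
    for sentence in sentences:
        if len(sentence.split()) > MAX_WORDS:
            parses.append(None)
        else:
            parses.append(parse_sentence(sentence))
    return parses
```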
Marking this as closed since we've found a solution.
Error: