digitalheir / java-probabilistic-earley-parser

🎲 Efficient Java implementation of the probabilistic Earley algorithm to parse Stochastic Context Free Grammars (SCFGs)
MIT License
36 stars 12 forks source link

Implement ε-rules (empty rules) #6

Open digitalheir opened 7 years ago

digitalheir commented 7 years ago

The parser currently can't handle rules of the form

X → ε            (p)

where ε is the empty string.

See section 4.7 Null Productions on page 19 of Stolcke's paper.

We have the choice of extending prediction and completion to work with ε-rules, but this is a bit complicated. Another possibility is to rewrite the grammar to eliminate these productions, described at the end of page 20, 4.7.4 Eliminating null productions.

Best to implement the simpler solution first, and implement the philosphically correct version later.

nlpguyz commented 6 years ago

Here is an implementation of Earley that handles Epsilon transition: https://github.com/tomerfiliba/tau/blob/master/Earley.java

Is it possible to port the feature here?

digitalheir commented 6 years ago

It's possible, but not as easily as in the example you give. Dealing with probabilities complicates the issue considerably. Stolcke describes how to it in his paper.

I welcome any work on it, but am not planning to implement it myself.