digitalheir / java-probabilistic-earley-parser

🎲 Efficient Java implementation of the probabilistic Earley algorithm to parse Stochastic Context Free Grammars (SCFGs)
MIT License
36 stars 12 forks source link

Writing cf-gammars without probabilities #13

Open roddar92 opened 4 years ago

roddar92 commented 4 years ago

Dear colleagues,

I'm exploring a good Earley parsers for writing cf-grammars, and this one seems to be friendly for me. Could you tell please, does this parser allow to write cf-grammars without probabilities setting?

P.S. I need Java parser like Lark (Python) for directly rule writing.

Thanks, Daria

digitalheir commented 4 years ago

Dashka,

It is possible if you set all rule probabilities equal to each other, or just ignore the computed probabilities. The parse trees are the same.

If you just need to parse a little and performance is not critical, this library should work just fine. However, this software does not implement all optimizations that are possible for non-probabilistic CFGs, because the probabilities pose some limitations on that. So I cannot make certain efficiency guarantees that a project like Marpa does. As far as I've seen, Marpa is the best non-probabilistic CFG parser out there in terms of efficiency but not possible to use with Java (sadly).

Maarten

digitalheir commented 4 years ago

Just a heads up in case you run into a strange error about a singular matrix.

You (maybe) need to make sure that the sum of probabilities for any given goal does not exceed 1.0, or you might end up with an error. Strictly speaking, this is an error on the side of the grammar authour, but my software could be more friendly about it. I intended the software to spot this error and normalize the probabilities so they add up to 1, but I forgot if I ever released it.

This is the issue that described the situation: https://github.com/digitalheir/java-probabilistic-earley-parser/issues/12