cdiggins / myna-parser

Myna Parsing Library
https://cdiggins.github.io/myna-parser
MIT License
78 stars 15 forks source link

Question on requirements for the programing language you are working on. #4

Closed bd82 closed 7 years ago

bd82 commented 7 years ago

From the readme:

The JavaScript library that comes closest to what I was looking for in terms of API is probably Parsimmon. It did not satisfy my particular needs for the project I am working on

As another Parsing Library author (Chevrotain) I'm interested in what requirements Parsimmon did not satisfy?

cdiggins commented 7 years ago

Hi Shahar,

Good job with the Chevrotain library.

Here is what I felt was missing from the Parsimmon library:

bd82 commented 7 years ago

Thanks for your feedback @cdiggins

Rule definitions were more complex and not as easy to read for a JavaScript programmer than I think they could/should be: compare https://github.com/cdiggins/myna-parser/blob/master/grammars with https://github.com/jneen/parsimmon/tree/master/examples

Yes comparing the JSON myna and JSON parsimmon examples there is a big advantage in conciseness and clearness to myna.

Slower than I would have liked (https://sap.github.io/chevrotain/performance/)

Relatively or absolutely? Parsimmon is the 2nd faster on latest Chrome (V8). which browser did you run the benchmark on? If you are using Safari, it has severe regExp performance bugs that affect that benchmark.

Lacking some debugging tool: e.g. pretty printing of grammar, printing of the AST structure.

That is one of my main grips with parser combinators, I can't just place a breakpoint and debug The code using standard means (step over/step into/...).

I did not like the reliance on regular expressions for primitive parsers

I see your point.

// easier to understand - Myna
    this.fraction       = m.seq(".", m.digit.zeroOrMore);    
    this.plusOrMinus    = m.char("+-");
    this.exponent       = m.seq(m.char("eE"), this.plusOrMinus.opt, m.digits); 
    this.number         = m.seq(this.plusOrMinus.opt, m.integer, this.fraction.opt, this.exponent.opt).ast;

// harder to read - Parsimmon (RegExps)
var numberLiteral =
  token(P.regexp(/-?(0|[1-9][0-9]*)([.][0-9]+)?([eE][+-]?[0-9]+)?/))
    .map(Number)
    .desc('number');    
//
bd82 commented 7 years ago

It would be interesting to see how fast you can get Myna. It looks like you may have two self imposed constraints in terms of performance: by avoiding RegExps and also avoiding code generation.

cdiggins commented 7 years ago

Thanks for your feedback @bd82! I am going to be working hard on performance once I get a couple more grammars and tools built using Myna completed. I have a hypothesis that I can get a big boost to performance by applying rule rewriting during the RegisterGrammar step which is akin to the performSelfAnalysis used by Chevrotain parsers. Basically I will flatten nested rules where possible and use the lookahead rules which are hard to write but have better performance.

bd82 commented 7 years ago

applying rule rewriting during the RegisterGrammar step which is akin to the performSelfAnalysis used by Chevrotain parsers. Basically I will flatten nested rules where possible and use the lookahead rules which are hard to write but have better performance.

Well you could possibly do more optimizations than Chevrotain because Myna seems to be a more interpreted solution, Chevrotain augments an hand built parser to be easier to write. So no rule rewriting can be done, only trying to find the fastest way to run the user's code.

You can try profiling a benchmark scenario using IRHydra2 http://mrale.ph/irhydra/2/

It can show you functions that have not been optimized by V8 and the reasons why. This has allowed me to find a little trick to double Chevrotain's performance... (Avoid creating new Parser instances).