dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Example Extended Pascal parser. #217

Closed veelo closed 7 years ago

veelo commented 7 years ago

Hi Philippe,

I finally got around to submit the Extended Pascal grammar as an example. After compilation of the parser it also demonstrates the html output.

Hope you like it.

Best, Bastiaan.

PhilippeSigaud commented 7 years ago

Impressive work Bastiaan! It brings together many improvements you made in Pegged through the years: left-recursion, case-insensitive parsing, kudos!

It was two years ago (May 2015) you first mailed me about getting Pegged to deal with Extended Pascal, you persevered all these years, that's great!

I spent some time reading the EP grammar. It's quite big (as expected), how long does it take to generate a parser and how fast is it to parse reasonable programs? (btw, the lone line 'auto parseTree = EP(readText(args[ 1]));' warms my heart: it's exacty as simple as I dreamt Pegged to be).

Why does the grammar has this '_' space symbol, different from the Spacing rule?

I'll merge the pull request.

Philippe

veelo commented 7 years ago

Thank you!

I added the _ symbol (which includes comments also) because my intent is to get a translating compiler eventually that is to produce maintainable code, so I need to translate the comments as well. Maybe layout will be somewhat preserved, although a pass through dformat may be necessary (if the target language gets to be D).

It has been a long time indeed and I thought I had come a long way, but I just threw a 5kloc Pascal file at it, and it failed quite early. It seems the ordered "or!()" evaluation, that continues with he first succeeding rule, causes branches to remain unexplored which causes a rule higher up in the tree to fail. I think I need a version of "or!()" that evaluates all alternatives and returns the longest match. I am not sure whether that would break with PEG principles.

My earlier experience is that the parsing is not fast, but it seems a lot faster now. Not sure if that is because of compiler improvements or because I am not testing as thorough. I used to think that string concatenations and other allocations are a bottleneck. It could be that there are optimisation opportunities, for example using std.experimental.allocator for parse tree nodes. Stefan Koch, the developer of the new CTFE engine, thinks it is the templates. He is probably right, especially at CT. One obvious optimisation is removing the TParseTree-based templates, which I am not sure pull their weight.

Turning on the tracer makes it dead slow, but that's OK. (I have a compile fix for recent dmd in the making.)

So how fast it is I can't really say before it succeeds at parsing bigger files :-( The generated parser module is 13kloc.

Bastiaan.

veelo commented 7 years ago

Note: personally I am not concerned with performance at all, translation would be much a one-time event. CTFE parsers are cool, but not practical (no memoization) and unnecessary in my application.