dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Left recursion #168

Closed veelo closed 8 years ago

veelo commented 8 years ago

Hi Philippe,

This properly supports hidden left-recursion, and also adds support for left-recursion in memoizing parsers. As far as code is concerned I think we are pretty much there -- I don't expect much changes after this. What could still be added is control over left/right associativity, but I plan to play with this first to see if I need it.

What I couldn't get to work is support for left-recursion at compile-time due to the error "cannot read static variable seed at compile time". I don't really understand why, but I noticed that you are not using memo at compile time either. So I figure I am in good company. Nevertheless, when left-recursion is detected to occur at compile time, compilation stops with an assertion error and explanation.

Parse speed for non-left-recursive rules should now be back at the level before I started contributing to Pegged, so the overhead at parse-time is minimal. The cost is payed at compile-time, with an introspection of the grammar. A template argument may be desirable to cut this overhead out for grammars that the developer knows for sure are not left-recursive, by which compilation of left-recursion support would be optional. Anyone could do this though, and I don't have direct plans to add it. Also, I don't use all the grammar properties that the introspection module gathers, so some speedup may be gained by configuring the unused parts out. No plans there either.

I plan to apply Pegged in an experimental project of mine in the new year. When I'm confident that left-recursion support works well in production, I can adjust the documentation of Pegged, comment on the odd issue regarding left-recursion and maybe add a wiki page on the implementation with references. When that's done maybe you feel like bumping the version of Pegged, and have a little release party on the announce-forum :-)

Season's greetings, Bastiaan.

PhilippeSigaud commented 8 years ago

On Wed, Dec 23, 2015 at 11:20 PM, veelo notifications@github.com wrote:

Hi Philippe,

Hi Bastiaan,

sorry for being so long without answering, I was away in holidays, without Internet connection.

This properly supports hidden left-recursion, and also adds support for left-recursion in memoizing parsers. As far as code is concerned I think we are pretty much there -- I don't expect much changes after this.

Wonderful!

What I couldn't get to work is support for left-recursion at compile-time due to the error "cannot read static variable seed at compile time". I don't really understand why, but I noticed that you are not using memo at compile time either. So I figure I am in good company.

Yes, some parts of D do not work at compile-time (even though the situation is far better than 5 yers ago!). Anyway, for 'real' grammars, I think parsing at runtime and generating an independent module is the way to go. Otherwise, to compilation time is far too long, for no added value.

Nevertheless, when left-recursion is detected to occur at compile time, compilation stops with an assertion error and explanation.

That's nice, good idea.

Parse speed for non-left-recursive rules should now be back at the level before I started contributing to Pegged, so the overhead at parse-time is minimal. The cost is payed at compile-time, with an introspection of the grammar.

That's a very good situation. Kudos for making this work!

A template argument may be desirable to cut this overhead out for grammars that the developer knows for sure are not left-recursive, by which compilation of left-recursion support would be optional. Anyone could do this though, and I don't have direct plans to add it. Also, I don't use all the grammar properties that the introspection module gathers, so some speedup may be gained by configuring the unused parts out. No plans there either.

I don't think it's necessary to add this right now.

I plan to apply Pegged in an experimental project of mine in the new year. When I'm confident that left-recursion support works well in production, I can adjust the documentation of Pegged, comment on the odd issue regarding left-recursion and maybe add a wiki page on the implementation with references. When that's done maybe you feel like bumping the version of Pegged, and have a little release party on the announce-forum :-)

I sure will! Adding left-recursion is quite enough to bump the version to 0.(x+1). (btw, I began reading the articles you sent me yesterday and will continue on my way back home.)

Yes documentation is important (and might very well be the reason many people used Pegged? I found having lots of wiki pages helped a lot to gently guide people.)

Thanks a lot for adding that much value to Pegged!

As for your project, I hope it will work well. I found Pegged to be a bit slow when compared to a usual parser, so don't hesitate to tell me how that goes.

Philippe

veelo commented 8 years ago

On 26 Dec 2015, at 17:40, Philippe Sigaud notifications@github.com wrote:

sorry for being so long without answering, I was away in holidays, without Internet connection.

No worries. Those holidays can be the best.

I plan to apply Pegged in an experimental project of mine in the new year. When I'm confident that left-recursion support works well in production, I can adjust the documentation of Pegged, comment on the odd issue regarding left-recursion and maybe add a wiki page on the implementation with references. When that's done maybe you feel like bumping the version of Pegged, and have a little release party on the announce-forum :-)

I sure will! Adding left-recursion is quite enough to bump the version to 0.(x+1). (btw, I began reading the articles you sent me yesterday and will continue on my way back home.)

Nice!

Yes documentation is important (and might very well be the reason many people used Pegged? I found having lots of wiki pages helped a lot to gently guide people.)

Quality documentation sure was the reason for me to start using Pegged and succeed in the first trials of my project.

Thanks a lot for adding that much value to Pegged!

Thank you for your appreciation! It means a lot :-)

As for your project, I hope it will work well. I found Pegged to be a bit slow when compared to a usual parser, so don't hesitate to tell me how that goes.

Speed is not my main concern. It may be beyond my capabilities, but I am going to try anyway: My evil plan is to write a translating compiler to translate a very large code base in an ageing language (Extended Pascal, ISO 10206) to D. Translation should be human-friendy, keeping indentation and inline comments. The language specification in the ISO document uses EBNF notation, and Pegged’s support for that notation is a huge asset: It will eliminate interpretation- and conversion-errors of the grammar on my part. The only issue that I fear could possibly pose problems is the ordered choice concept in PEGs, but I guess I won’t know unless I try. And of course I need to find or implement equivalent D constructs for every EP construct.

If I succeed and translation is slow, it will still be awesome. But if I succeed and it is not slow, it means blocks of Extended Pascal and D will be able to exist in the same file, which I think would be utter cool.

Extended Pascal is not a widely used language, but if I succeed then I expect it will be relatively easy to add other Pascal dialects, including Delphi / FreePascal, which may help more projects to switch to D.

Best, Bastiaan.

PhilippeSigaud commented 8 years ago

Speed is not my main concern. It may be beyond my capabilities, but I am going to try anyway: My evil plan is to write a translating compiler to translate a very large code base in an ageing language (Extended Pascal, ISO 10206) to D. Translation should be human-friendy, keeping indentation and inline comments. The language specification in the ISO document uses EBNF notation, and Pegged’s support for that notation is a huge asset: It will eliminate interpretation- and conversion-errors of the grammar on my part. The only issue that I fear could possibly pose problems is the ordered choice concept in PEGs, but I guess I won’t know unless I try. And of course I need to find or implement equivalent D constructs for every EP construct.

That's a nice project. Is EP grammar big (as in: lots of rules?) compared to C, to D? IIRC, I could get the C grammer to work, but failed with D.

Are all Pascal constructs found in D? I haven't used Pascal in quite a long time. I don't remember it having any 'exotic' ideas so indeed you should be able to translate blocks if it into D.

If I succeed and translation is slow, it will still be awesome. But if I succeed and it is not slow, it means blocks of Extended Pascal and D will be able to exist in the same file, which I think would be utter cool.

I think this is called Island parsing or island grammars (the idea being that you have a main language, the "sea" and "islands" of other languages embedded in it). How would you distinguish between EP blocks and D ones?

veelo commented 8 years ago

On 27 Dec 2015, at 22:12, Philippe Sigaud notifications@github.com wrote:

That's a nice project. Is EP grammar big (as in: lots of rules?) compared to C, to D? IIRC, I could get the C grammer to work, but failed with D.

I estimate the number of rules in EP to be about 300. Do you know what the problem is with large grammars? Are problems compile-time or parse-time? Do you run out of stack space? Is there a limit to the size of the parse (some of our files are huge)?

Are all Pascal constructs found in D? I haven't used Pascal in quite a long time. I don't remember it having any 'exotic' ideas so indeed you should be able to translate blocks if it into D.

Extended Pascal is more advanced than Pascal, but most constructs are available in D. Nested functions are a bit exotic, but present in D. There is a construct called Schema, a kind of parameterised types, which I recon can be implemented using D templates.

If I succeed and translation is slow, it will still be awesome. But if I succeed and it is not slow, it means blocks of Extended Pascal and D will be able to exist in the same file, which I think would be utter cool.

I think this is called Island parsing or island grammars (the idea being that you have a main language, the "sea" and "islands" of other languages embedded in it). How would you distinguish between EP blocks and D ones?

Nice to know there is a term for it. I expect EP blocks would be strings, much like a mixin. I don’t think it would be very useful though (no syntax highlighting and longer compile times) compared to translating once and continuing development in D. But if it would work it would mean that slow adopters could continue to program in EP if they insist. The EP code base is still in active development, mostly by engineers having many years experience but no formal CS education, for whom a language change would be quite disruptive. But even in this case it would probably be more productive to have complete EP files that are translated as part of the build process.

Bastiaan.

PhilippeSigaud commented 8 years ago

On Wed, Dec 23, 2015 at 11:20 PM, veelo notifications@github.com wrote:

When that's done maybe you feel like bumping the version of Pegged, and have a little release party on the announce-forum :-)

I pushed a v0.3 tag attached to the Pegged HEAD, explaining that it brings two new features: left-recursion and case-insensitive literals.

Can you tell me if you

PhilippeSigaud commented 8 years ago

On Sun, Jan 31, 2016 at 3:49 PM, Philippe Sigaud philippe.sigaud@gmail.com wrote:

On Wed, Dec 23, 2015 at 11:20 PM, veelo notifications@github.com wrote:

When that's done maybe you feel like bumping the version of Pegged, and have a little release party on the announce-forum :-)

I pushed a v0.3 tag attached to the Pegged HEAD, explaining that it brings two new features: left-recursion and case-insensitive literals.

Can you tell me if you

Drat, one of my children bumped into me...

So. Can you tell me if you see it also and if you have any trouble with the wiki? I'd like to have up to date documentation before posting on dlang's announce forum.

Philippe

veelo commented 8 years ago

I saw the tag, yes. No problems with the wiki. I think I adjusted all the places in the wiki that mentioned left-recursion, but I haven't dedicated a page to the subject, yet. That will have to wait until I am convinced everything has crystallised.

There is one problem at the moment for memoizing parsers of grammars that have nested left-recursive cycles: https://github.com/PhilippeSigaud/Pegged/blob/master/pegged/grammar.d#L476 unblocks memoization regardless how many times blocking has occurred (https://github.com/PhilippeSigaud/Pegged/blob/master/pegged/grammar.d#L466). Fixed in https://github.com/veelo/Pegged/commit/5d0862a44bc9188aa0b69cd35620049a541a110d.

Another deficiency is left-recursive rules that match the empty string. Fixed in https://github.com/veelo/Pegged/commit/3f0b7255bf1ae90c6aac3fea9ba17884c6ce76f5.

You might want to cherry pick these.

There are a few other things in my branch, but the reason I haven't made a PR yet is that the blocking of memoization is very conservative. A more efficient approach needs a much deeper analysis of the grammar though, which isn't ready yet.

Orthotoganally to this, I have a tracer in the works that uses std.experimental.logger to see what decisions are made during parsing. Only thing is it generates GBs of output on my grammar, that's how I discovered that the blocking needs to be more precise. It's not on github yet though, and it needn't be part of 0.3.0.

To make a long story short: left-recursion support is currently fine except for a few corner cases, and larger grammars may need the optimisations that I am planning. These could also go into 0.3.1.

Nice to have you back :-)