dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Dreaming: avoid unnecessary node allocations and parse lazily #97

Open chadjoan opened 11 years ago

chadjoan commented 11 years ago

Continued from issue #68:

One thing that's bugging me with this is that "dropping" as a semantic action probably involves cutting nodes from an already-allocated tree. I'm kind of hoping that eventually pegged will be able to prevent undesired nodes from ever existing in the first place, possibly to the point where an entire parse tree can be lazily evaluated.

And how would you know if parsing is successful? How would you know what other (further down the input) nodes are without exploring them in the exact order the grammar define them? Laziness is possible, but not without accepting a parse attempt might suddenly fail (and thus, you should not be used by other functions).

I expect that a recursive descent parser (or anything of equivalent power) will need some auxiliary stack storage proportional to the depth of the parse tree. I'm not convinced that they would need memory equivalent to the length of the input though: event-based parsers exist and do pretty well.

If things were written such that the parse tree wasn't required for parsing, then it would be pretty easy to avoid allocating unnecessary nodes.

I'm not too worried about parses suddenly failing. The code receiving parse events probably isn't going to be directing air traffic in real-time as it chugs along. Whatever work was going on during parse can probably be aborted just as easily as the parsing itself.

Taken to the extreme, the ideal is to allow something like a SAX (xml) parser to be written in Pegged and have no-memory-required parsing (maybe some memory for memoization, if used). Of course, I'm not sure if this is one of your goals, but it seems like something that would be important for faster parsing and a greater acceptance of Pegged in the D community as a blessed tool for doing code manipulation.

Code manipulation is done at compile-time. I don't think we are at a point where compile-time action can be a no-memory action.

I think that by faster parsing I was referring to the strategy of avoiding unnecessary node allocations.

I don't think the no-memory-required parsing would be important for D code manipulation at compile time. Parse trees (ASTs, really) would be much better for that.

I'm actually more interested in using Pegged to write a compiler than I am in using it at compile time. I want to use it in a D compiler! I think that D's compile-time evaluation abilities are just too weak right now to use Pegged in mixins, even on moderately simple grammars (doable but with bad build times), so I tend to just generate a parser's D code into a separate file and compile it in another build phase. I am glad that it has proven workable though: I think a D compiler with dedicated interpreting logic would be able to do well with Pegged's output and I look forward to that day. In the meantime, I optimize for metal, so I'm eying those extra nodes ;)

Right now, I wouldn't be able to justify doing work on this kind of enhancement. I may actually try to work on this if I find parsing to be a bottleneck in the future. Until then, I present it as a design consideration and possibly a task for anyone that might want to approach it for entertainment or their own needs.

PhilippeSigaud commented 11 years ago

If things were written such that the parse tree wasn't required for parsing, then it would be pretty easy to avoid allocating unnecessary nodes.

That's what many compilers do: they never generate the parse tree: instructions are output directly. The parse tree remains 'implicit'.

Right now, I wouldn't be able to justify doing work on this kind of enhancement. I may actually try to work on this if I find parsing to be a bottleneck in the future. Until then, I present it as a design consideration and possibly a task for anyone that might want to approach it for entertainment or their own needs.

OK, fair enough.