dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Passing and returning state from actions #256

Open Aphexus opened 5 years ago

Aphexus commented 5 years ago

It would be convenient to be able to pass around state information in a grammar.

It might be as simple as passing around a "global" state (an object that contains whatever information the actions will use), or as complex as being able to "save", modify, and pass the state around in the grammar.

    mixin(grammar(`
    Numbers:
        Scientific <~ Floating ( ('e' / 'E' ) Integer )?
        Floating   <~ Integer ('.' Unsigned )? {DoFloat}
        Unsigned   <~ [0-9]+
        Integer    <~ Sign? Unsigned
        Hexa       <~ [0-9a-fA-F]+
        Sign       <- '-' / '+'
    `, state));

Then, when DoFloat is called, state is passed to it. The usefulness of this should be obvious: it makes threading and the like easier to deal with. A truly global state can be used, but then it is not safe, because globals are not safe.
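Independent of Pegged, the core of the request can be sketched in plain D: the action receives the state as an explicit parameter instead of reaching for a global. Everything below (`Node`, `State`, `doFloat`, `floating`) is a hypothetical stand-in invented for illustration; none of it is Pegged API, and the "rule" simply treats the whole input as a match.

```d
import std.conv : to;

// Hypothetical stand-in for pegged's ParseTree; not the real type.
struct Node
{
    bool successful;
    string[] matches;
}

// Hypothetical caller-owned state that actions may read and update.
struct State
{
    double[] floats;
}

// An action that takes the state explicitly instead of using a global.
Node doFloat(Node n, ref State state)
{
    if (n.successful)
        state.floats ~= n.matches[0].to!double;
    return n;
}

// Stand-in for a generated rule: "matches" the whole input, then runs the action.
Node floating(string input, ref State state)
{
    return doFloat(Node(true, [input]), state);
}

void main()
{
    State a, b; // two independent states, e.g. one per parse
    floating("1.5", a);
    floating("2.5", b);
    assert(a.floats == [1.5]);
    assert(b.floats == [2.5]);
}
```

Because each parse owns its `State`, two parses never touch each other's data, which is the property a single shared global cannot give.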

    mixin(grammar(`
    Numbers:
        Scientific <~ Floating ( ('e' / 'E' ) Integer )?
        Floating   <~ Integer ('.' Unsigned )? {y = DoFloat(state)}
        Unsigned   <~ [0-9]+
        Integer    <~ Sign? Unsigned { state = DoInteger(y); }
        Hexa       <~ [0-9a-fA-F]+
        Sign       <- '-' / '+'
    `, state));

(not a good example, just showing syntax)

If Integer occurs first then y would be null.

It would probably be relatively simple to implement since it translates directly into D code.

Could allow for multiple states too.

As of now I'm having to use a global state. Since lambdas can't be used, which would capture the context and provide a way out, I don't see any other way to do it with Pegged.

(Although since the grammar is mixed in I suppose the inline lambdas would capture context of the scope of mixin?)
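For reference, this is the D mechanism being alluded to: a delegate literal captures the variables of the scope it is created in. The sketch below is plain D with hypothetical stand-ins (`Node`, `runRule`, `collectFloats`); it does not show whether a mixed-in Pegged grammar can actually reach such a lambda, which remains the open question here.

```d
import std.conv : to;

// Hypothetical stand-in for pegged's ParseTree; not the real type.
struct Node
{
    string[] matches;
}

// Hypothetical stand-in for a rule with an attached action. The action is an
// ordinary delegate, so it carries the context it was created in.
Node runRule(string input, Node delegate(Node) action)
{
    return action(Node([input]));
}

// Collects every float seen by the action into a local array.
double[] collectFloats(string[] inputs)
{
    double[] seen; // local state, captured by the lambda below
    Node delegate(Node) doFloat = (Node n) {
        seen ~= n.matches[0].to!double;
        return n;
    };
    foreach (input; inputs)
        runRule(input, doFloat);
    return seen;
}

void main()
{
    assert(collectFloats(["0.5", "2.25"]) == [0.5, 2.25]);
}
```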

Aphexus commented 5 years ago

Also, the ability to call actions that are not global would be nice.

MyActions.DoFloat

or

state.DoFloat
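In plain D the two forms behave differently, which may matter for any implementation. A static member such as `MyActions.doFloat` can be passed as an alias template parameter, whereas an instance method such as `state.doFloat` needs a delegate so that the object travels with the call. The sketch below uses hypothetical names throughout (`Node`, `MyActions`, `ActionState`, `applyAction`, `applyDelegate`) and is not how Pegged dispatches actions.

```d
import std.conv : to;

// Hypothetical stand-in for pegged's ParseTree; not the real type.
struct Node
{
    string[] matches;
    double value;
}

// Actions grouped in a struct and reached through a qualified name.
struct MyActions
{
    static Node doFloat(Node n)
    {
        n.value = n.matches[0].to!double;
        return n;
    }
}

// Actions that update instance data.
struct ActionState
{
    int calls;

    Node doFloat(Node n)
    {
        ++calls;
        return n;
    }
}

// The action arrives as an alias, so any callable symbol can be named.
Node applyAction(alias action)(Node n)
{
    return action(n);
}

// The action arrives as a delegate, so it is bound to one object.
Node applyDelegate(Node n, Node delegate(Node) action)
{
    return action(n);
}

void main()
{
    auto n = applyAction!(MyActions.doFloat)(Node(["4.5"]));
    assert(n.value == 4.5);

    ActionState state;
    applyDelegate(Node(["1.0"]), &state.doFloat);
    assert(state.calls == 1);
}
```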

veelo commented 5 years ago

> It would be convenient to be able to pass around state information in a grammar.

I'd investigate whether the following could be of help. When you generate the parser, the last line reads like

alias GenericMyGrammar!(ParseTree).MyGrammar MyGrammar;

assuming the first rule in your grammar is MyGrammar. ParseTree is defined on https://github.com/PhilippeSigaud/Pegged/blob/master/pegged/peg.d#L237. From the comment on that line, it seems that you should be able to define MyParseTree with similar API but adding state information that you then can maintain and access in your actions, if you use

alias GenericMyGrammar!(MyParseTree).MyGrammar MyGrammarWithState;

I'm not sure if you'll be able to have state carry over through all rules though, give it a try. Neither do I know whether the support for custom ParseTrees is well maintained, maybe you'll have to produce some patches to get this to work again. For example, in the generated parser I see several occurrences of ParseTree where they probably should be TParseTree.

veelo commented 5 years ago

> Makes it easier to deal with threading, etc.

Multithreading? How, and where? PEG alternation is greedy, so there is not much opportunity to parse an input in multiple threads. Unless, that is, you use my longest match alternator extension, which you should avoid if possible since it adds significantly to the time complexity. Pegged applies memoization intensively (which is of much higher benefit than any degree of multithreading), so multithreading would need a lock on the memoization table, which could significantly reduce the benefit of multithreading.

There are other opportunities to speed up Pegged, such as replacing string comparisons with integer comparisons, much like dparser does.

Aphexus commented 5 years ago

What I mean is that a grammar using semantic actions seems to have to refer to a global state, which is not thread safe. Static functions can be used and referred to in the action, but that does not help.

If I wanted to use the same grammar in a class, say in the constructor, to parse multiple inputs in different threads (to speed up the process, of course), then any data modifications in semantic actions must be on the instance.

    class ParseText
    {
        float f;

        ParseTree DoFloat(ParseTree p) { f = ...; return p; }

        this(string input)
        {
            // Tell the grammar to use this for semantic actions (simply prefixes This. to the
            // calls, which is then replaced later with the appropriate object).
            mixin(grammar(..., "This"));

            // The idea is that any calls to semantic action will need to call on this
            // so they can modify instance data.

            auto parseTree = Grammar(input, this);
            // this object is passed
        }
    }
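To make the intent concrete without relying on the proposed syntax, here is a runnable approximation in plain D. The action is an instance method bound as a delegate, so each `ParseText` object updates only its own field, and several inputs can be processed in parallel. `Node` and `runRule` are hypothetical stand-ins for the parse tree and the generated parser, and `double` replaces `float` only to keep the comparison in `main` simple.

```d
import std.conv : to;
import std.parallelism : parallel;

// Hypothetical stand-in for pegged's ParseTree; not the real type.
struct Node
{
    string[] matches;
}

// Hypothetical stand-in for the generated parser: treats the whole input as
// one match and hands it to the action.
Node runRule(string input, Node delegate(Node) action)
{
    return action(Node([input]));
}

final class ParseText
{
    double f;

    Node doFloat(Node n)
    {
        f = n.matches[0].to!double; // modifies instance data only
        return n;
    }

    this(string input)
    {
        // &doFloat is a delegate bound to this object.
        runRule(input, &doFloat);
    }
}

// Parses each input on the default task pool, one ParseText per input.
double[] parseAll(string[] inputs)
{
    auto results = new double[inputs.length];
    foreach (i, input; parallel(inputs))
    {
        auto p = new ParseText(input);
        results[i] = p.f; // each iteration writes its own slot
    }
    return results;
}

void main()
{
    assert(parseAll(["1.5", "2.25", "-4.0"]) == [1.5, 2.25, -4.0]);
}
```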

--

As far as extending ParseTree, man, it would be lovely if ParseTree was a class! ;) Then one could just inherit. Luckily ParseTree isn't too complex. I suppose one could auto generate the ParseTree interface and simply mix it in to avoid having to dup/copypaste the code(essentially just wrap/decorate it).
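The mix-in idea could look roughly like the following. The field list is an assumption about what `ParseTree` contains and should be checked against peg.d before relying on it; `ParseTreeFields`, `PlainTree`, `State`, `MyParseTree` and `leaf` are hypothetical names. A template mixin injects the common fields into each struct, and the custom tree adds a typed pointer to shared state.

```d
// Fields assumed to mirror pegged's ParseTree; verify against peg.d.
mixin template ParseTreeFields()
{
    string name;
    bool successful;
    string[] matches;
    string input;
    size_t begin, end;
    typeof(this)[] children; // children have the type of the enclosing struct
}

// A tree with only the common fields.
struct PlainTree
{
    mixin ParseTreeFields;
}

// Hypothetical user state shared by all nodes of one parse.
struct State
{
    int floatsSeen;
}

// The same fields plus a typed pointer to the shared state.
struct MyParseTree
{
    mixin ParseTreeFields;
    State* state;
}

// Builds a successful leaf node that carries the state pointer.
MyParseTree leaf(string name, string text, State* state)
{
    MyParseTree t;
    t.name = name;
    t.successful = true;
    t.matches = [text];
    t.input = text;
    t.end = text.length;
    t.state = state;
    return t;
}

void main()
{
    auto state = new State;
    auto t = leaf("Numbers.Floating", "1.5", state);
    t.state.floatsSeen++;
    assert(state.floatsSeen == 1);
    assert(t.matches == ["1.5"]);
}
```

This avoids copy-pasting the field list, but it does nothing about code elsewhere that names `ParseTree` explicitly.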

I've tried modifying the code but ParseTree is hard coded very heavily. I modified about 500 lines in parser.d along with other stuff but everything seems to be hard coded.

I'm thinking it might be better to alias ParseTree somehow but my attempts failed ;/

Aphexus commented 5 years ago

I simply added a void* to ParseTree and that allows one to assign data to it with minimal change.
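A minimal sketch of that approach, using a hypothetical `Node` in place of the patched ParseTree (the field name `userData` and the helpers `stateOf` and `countNode` are made up for illustration): any pointer converts implicitly to `void*`, and the action casts it back to the type it expects.

```d
// Hypothetical stand-in for a ParseTree patched with an extra void* field.
struct Node
{
    string name;
    string[] matches;
    void* userData; // untyped payload; the reader must know the real type
}

// Hypothetical user state attached to nodes.
struct State
{
    int count;
}

// Typed accessor. The cast is unchecked, so it is only valid if userData
// really points at a State (or is null).
State* stateOf(Node n)
{
    return cast(State*) n.userData;
}

// An action that updates the attached state, if there is any.
Node countNode(Node n)
{
    if (auto s = stateOf(n))
        s.count++;
    return n;
}

void main()
{
    auto state = new State;
    auto n = Node("Floating", ["1.5"]);
    n.userData = state; // State* converts implicitly to void*
    countNode(n);
    countNode(n);
    assert(state.count == 2);
}
```

The price of the small patch is type safety: nothing stops an action from casting the pointer to the wrong type.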

Would be nice to be able to parameterize ParseTree but, again, it seems to be a huge mess due to the hard coding.

This doesn't really help too much with dispatching semantic actions though...

Aphexus commented 5 years ago

BTW, here are the types of errors one gets when simply using the alias and a new struct

    Error: none of the overloads of defined are callable using argument types (MyParseTree), candidates are:
    ........\D\Libraries\pegged\peg.d(2828): pegged.peg.defined!(zeroOrMore, "PGNGrammar.TagName").defined(ParseTree p)
    ........\D\Libraries\pegged\peg.d(2836): pegged.peg.defined!(zeroOrMore, "PGNGrammar.TagName").defined(string input)
    ........\D\Libraries\pegged\peg.d(2841): pegged.peg.defined!(zeroOrMore, "PGNGrammar.TagName").defined(GetName g)

I did parameterize defined to take a TParseTree but then a bunch more errors about other parts of pegged had issues too... So I just reverted.
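The general shape of such a parameterization, stripped of everything Pegged-specific, might be a function template constrained on "looks like a parse tree" rather than on the concrete `ParseTree` type. The sketch below is not Pegged's `defined`; `isParseTreeLike` and `definedAs` are hypothetical, and the two structs carry only the fields needed to show the idea.

```d
// Hypothetical check: does T have the fields this sketch relies on?
enum isParseTreeLike(T) = is(typeof(T.init.name) : string)
    && is(typeof(T.init.successful) : bool)
    && is(typeof(T.init.children) : T[]);

// Reduced stand-ins for the built-in tree and a user-defined one.
struct ParseTree
{
    string name;
    bool successful;
    ParseTree[] children;
}

struct MyParseTree
{
    string name;
    bool successful;
    MyParseTree[] children;
    int state; // extra user data
}

// Simplified stand-in for a wrapper that names a node. It accepts any
// tree-like type instead of only ParseTree.
T definedAs(string ruleName, T)(T p)
    if (isParseTreeLike!T)
{
    p.name = ruleName;
    return p;
}

void main()
{
    static assert(isParseTreeLike!ParseTree);
    static assert(isParseTreeLike!MyParseTree);
    static assert(!isParseTreeLike!int);

    auto a = definedAs!"PGNGrammar.TagName"(ParseTree("", true));
    auto b = definedAs!"PGNGrammar.TagName"(MyParseTree("", true));
    assert(a.name == "PGNGrammar.TagName");
    assert(b.name == "PGNGrammar.TagName");
}
```

As the errors above suggest, changing one function this way only moves the mismatch to the next place that names `ParseTree` directly, so the change would have to be applied consistently.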

veelo commented 5 years ago

> I did parameterize defined to take a TParseTree but then a bunch more errors about other parts of pegged had issues too... So I just reverted.

Just use search and replace, I think it would make a wonderful pull request (if it turns out to be useful!)

Using a mixin sounds like the right approach.

veelo commented 5 years ago

Note that parser.d is generated, IIRC.