badicsalex / peginator

PEG parser generator for creating ASTs in Rust
MIT License

Support inline rule #1

Closed oovm closed 2 years ago

oovm commented 2 years ago

What is an inline rule

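An inline rule is a phantom node: its body is substituted wherever it is referenced, and it does not produce a node of its own.

If part of a rule is too long or appears multiple times, it can be extracted into an inline rule.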

For example:

Suppose the syntax ~rule is used to mark an inlined rule reference.

assign_statement ::= 
    | LET "(" ~assign_name ")" type_hint? SET assign_rhs eos?
    | LET "(" ~assign_name ")" type_hint? eos?
    | LET ~assign_name SET assign_rhs eos?
    | LET ~assign_name eos

assign_name ::= pair:assign_pair (Comma pair:assign_pair)* Comma?
assign_pair ::= MODIFIERS* Symbol type_hint?
assign_rhs ::= if_statement|match_statement|expr
LET ::=  "let"
SET ::=  "="

Equivalent to

assign_statement ::= 
    | LET "(" pair:assign_pair (Comma pair:assign_pair)* Comma? ")" type_hint? SET assign_rhs eos?
    | LET "(" pair:assign_pair (Comma pair:assign_pair)* Comma? ")" type_hint? eos?
    | LET pair:assign_pair (Comma pair:assign_pair)* Comma? SET assign_rhs eos?
    | LET pair:assign_pair (Comma pair:assign_pair)* Comma? eos
// Ideally, assign_name should be cleaned up by dead code elimination.

badicsalex commented 2 years ago

Yeah, this looks like the same thing as Tatsu's "include operator": https://tatsu.readthedocs.io/en/stable/syntax.html#id29

The main goal would be to flatten the resulting structure, right?

I was thinking about implementing this feature, because I used it extensively back in the day, but the current architecture converts rules independently of each other, so a fair amount of refactoring would be needed to do this. Or maybe a preprocessing step.

Out of curiosity, which parsing lib uses ~ for this?

badicsalex commented 2 years ago

By the way, for this specific case in peginator you could use an override rule to flatten the structure the same way by writing something like this:

assign_statement = 
     LET "(" pairs:assign_pair_list ")" [type_hint] SET assign_rhs [eos] |
     LET "(" pairs:assign_pair_list ")" [type_hint] [eos] |
     LET pairs:assign_pair_list SET assign_rhs [eos] |
     LET pairs:assign_pair_list eos ;

assign_pair_list = @:assign_pair {Comma @:assign_pair} [Comma] ;
assign_pair = {MODIFIERS} Symbol [type_hint] ;
assign_rhs = if_statement|match_statement|expr ;
LET =  "let" ;
SET =  "=" ;

But I understand this does not solve the case where there are multiple fields in the inlined rule.

oovm commented 2 years ago

Yes, the main purpose is to get a flat structure and reduce redundant dot calls (chained field accesses) when using the AST.
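To make the "redundant dot calls" point concrete, here is a minimal sketch of the two AST shapes. The struct names are hypothetical and hand-written for illustration; they are not actual peginator output, and they assume that each named rule maps to one struct and each named field to one struct member.

```rust
// Hypothetical AST shapes, written by hand for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct AssignPair {
    pub symbol: String,
}

// Without inlining: the referenced rule becomes a node of its own.
#[derive(Debug, Clone, PartialEq)]
pub struct AssignName {
    pub pair: Vec<AssignPair>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct NestedAssignStatement {
    pub assign_name: AssignName,
}

// With inlining: the fields of the inlined rule land directly in the parent.
#[derive(Debug, Clone, PartialEq)]
pub struct FlatAssignStatement {
    pub pair: Vec<AssignPair>,
}

/// Nested access needs one extra hop through `assign_name`.
pub fn first_symbol_nested(s: &NestedAssignStatement) -> Option<&str> {
    s.assign_name.pair.first().map(|p| p.symbol.as_str())
}

/// Flat access reads the field straight from the statement.
pub fn first_symbol_flat(s: &FlatAssignStatement) -> Option<&str> {
    s.pair.first().map(|p| p.symbol.as_str())
}
```

The difference is a single hop here, but it compounds when several levels of helper rules exist only to avoid repetition in the grammar.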

oovm commented 2 years ago

I use a preprocessing step that generates the peginator AST to achieve inlining.

Can the grammar module be set to pub?

https://github.com/badicsalex/peginator/blob/a8368bbdb8dcb7c8b2d3e779d8bc3bb4cac3fbc0/src/lib.rs#L153

The current implementation uses interpolated strings, which are neither sound nor type-safe.

badicsalex commented 2 years ago

Sure, I can, but I'm not sure you want to go that low-level (and, I guess, generate code directly from the AST); it will be pretty unstable.

Note that I'm working on this feature. The syntax will be:

ComplexRule = >OtherRule { >OtherRule2 };

OtherRule = x:Number "," y:Number;
OtherRule2 = "(" y:Number "," z:Number ")";

It will be preprocessed to be equivalent to the following:

ComplexRule = x:Number "," y:Number { "(" y:Number "," z:Number ")" };

OtherRule = x:Number "," y:Number;
OtherRule2 = "(" y:Number "," z:Number ")";

Note how the same field names are used. I think "Dead" rules will not be automatically eliminated from the generated code, but instead I'll just disable the dead code warning for the generated part.
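The preprocessing idea can be sketched as a substitution pass over a grammar tree. The `Expr` type and the `inline` function below are hypothetical stand-ins invented for this illustration; they are not peginator's internal AST or its actual implementation. The sketch assumes that an include simply splices the referenced rule's body into place, and that unknown or self-including rules are reported as errors.

```rust
use std::collections::HashMap;

/// Toy grammar expression (hypothetical, not peginator's real AST).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A string literal such as `","`.
    Literal(String),
    /// A named field such as `x:Number`.
    Field { name: String, rule: String },
    /// An include such as `>OtherRule`.
    Include(String),
    /// A sequence of expressions.
    Seq(Vec<Expr>),
    /// Zero-or-more repetition, written `{ ... }`.
    Repeat(Box<Expr>),
}

/// Replace every `Include` with the (recursively inlined) body of the rule
/// it names. `stack` tracks the includes currently being expanded so that
/// a rule including itself is rejected instead of looping forever.
pub fn inline(
    expr: &Expr,
    rules: &HashMap<String, Expr>,
    stack: &mut Vec<String>,
) -> Result<Expr, String> {
    match expr {
        Expr::Include(name) => {
            if stack.contains(name) {
                return Err(format!("recursive include of `{name}`"));
            }
            let body = rules
                .get(name)
                .ok_or_else(|| format!("unknown rule `{name}`"))?;
            stack.push(name.clone());
            let result = inline(body, rules, stack);
            stack.pop();
            result
        }
        Expr::Seq(items) => {
            let mut out = Vec::new();
            for item in items {
                match inline(item, rules, stack)? {
                    // Splice nested sequences so the parent stays flat.
                    Expr::Seq(inner) => out.extend(inner),
                    other => out.push(other),
                }
            }
            Ok(Expr::Seq(out))
        }
        Expr::Repeat(inner) => Ok(Expr::Repeat(Box::new(inline(inner, rules, stack)?))),
        // Literals and fields contain nothing to expand.
        other => Ok(other.clone()),
    }
}
```

Because field names are copied verbatim, two included rules that both declare a field with the same name end up contributing to the same field of the parent. The included rules themselves stay in the rule table, which is why generated code for them may end up unused; in Rust, an `#[allow(dead_code)]` attribute on the generated items is the usual way to silence that warning.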