Open oovm opened 2 years ago
I would rather implement left-recursion support (with a directive at the recursion point). It's not that hard anyway, since memoization is already implemented.
(EDIT: I realized that a left-recursive grammar and a precedence climber would produce different parse trees. Also most things that would need left-recursion is already solved by the repetition/closure operator, as seen in the operator example)
Also I'm not really sure how the output of a precedence climber would look like in the AST.
Something like this?
enum Expression {
Number(String),
InfixExpression(InfixExpression),
// could be also prefix, postfix, bracket, etc.
}
struct InfixExpression {
left_side: Box<Expression>,
operator: Operator,
left_side: Box<Expression>,
}
enum Operator {
Add,
Sub,
Mul,
Div,
}
I'm also not really sure how this could be represented in the grammar in a readable way, so that it remains really intuitive what kind of AST will be generated.
I have a question, how should Sequence be handled?
Is Sequence an atomic operation or an infix operation?
I think at least self can be introduced for judging direct left recursion.
@precedence
arithmetic =
= x:self op:"+" y:self
| x:self op:"-" y:self
| x:self op:"*" y:self
| x:self op:"/" y:self
| x:self op:"^" y:self
| x:number
| "(" x:self ")"
;
This syntax does not changed, but cannot represent priority groups
(like rust-peg).
However, the meaning of priority groups
needs to be discussed, and it may be better to form a strict partial order of priorities.
EDIT: I thought about it on the way home, priority groups really don't make sense.
Because you can always define AddLike = '+' | '-'
, to achieve the same effect
Another question is what is the name of the variant in this case? There is no way to tag.
ANTLR uses #
to mark branches, the usage might be
@precedence
arithmetic =
@Add| x:self op:"+" y:self
@Sub| x:self op:"-" y:self
@Mul| x:self op:"*" y:self
@Div| x:self op:"/" y:self
@Pow| x:self op:"^" y:self
@Num| x:number
@Gro| "(" x:self ")"
;
Here I introduced an operator like @VAR_NAME|
, It looks weird, but the name of the enum can be determined by analogy with the @:Rule
and @NAME:Rule
operator.
I think we can think of @
as a macro that acts on |
Another problem is the ternary operator and the sequence operator.
@precedence
arithmetic =
| x:self op1:"?" y:self op2:":" z:self
| x:self y:self
;
The writing is consistent but logically unreasonable, and special treatment is required for this writing.
Is Sequence an atomic operation or an infix operation?
I'd consider Sequence
a special case of infix where there is no actual operator token.
but cannot represent priority groups (like rust-peg).
Since it's such a special syntax, it might as well copy most of the rust-peg syntax with some modification so it looks more like the rest of the syntax. Something like this maybe (with a ton of restrictions, e.g. only specific forms and string literals are allowed):
@precedence
Expression = (
Add = (@) "+" @;
Sub = (@) "-" @;
--
Mul = (@) "*" @;
Div = (@) "/" @;
--
Power = @ "^" (@);
--
Number = @:NUMBER;
Group = "(" @ ")";
);
ternary operator
Apparently that's a solvable problem.
Then again, at that point it might be better to create a separate library (or use an existing one) that does precedence climbing simply and reliably, and then just pulling that in with @extern
Again, with left recursion you can have the following grammar:
Expression =
left:*Expression (op:Add | op:Sub) right:*Expression |
left:*Expression (op:Mul | op:Div) right:*Expression |
left:*Expression op:Power right:*Expression |
"(" atom:*Expression ")" |
atom:Number ;
Add = "+";
Sub = "-";
Mul= "*";
Div = "/";
Power = "^";
Resulting types:
struct Expression {
left: Box<Expression>,
right: Box<Expression>,
atom: Box<Expression_atom>,
op: Expression_op,
}
enum Expression_atom {
Expression(Box<Expression>),
Number(Number),
}
enum Expression_op {
Add(Add), Sub(Sub), ...
}
I think it would be just as easily readable as the precedence climbing proposals above. And it's way more general too, you can fit the ?: and array indexing in nicely.
P.S.: CPython's own parser is also written this way: https://medium.com/@gvanrossum_83706/peg-at-the-core-developer-sprint-8b23677b91e6#4a74 https://github.com/we-like-parsers/pegen/blob/main/data/python.gram#L125
I've been playing with left recursion this afternoon, and it's very difficult to get right. See a left-recursive calculator here: https://github.com/badicsalex/peginator/blob/eade5ae60f777e101df173ac95b5509ec235e55f/peginator_test/src/calculator_example/grammar.ebnf
The main problem is that you cannot use Expression
as the left side of multiplications, it must be a Term
, so you need to have a Term
type that only allows other multiplications and things above it in precedence (e.g. power, or groups).
To make it a tiny bit prettier, I tried using >Term
instead of @:Term
in Expression
, but that completely broke the left recursion cache handling (going through Expression
and Term
have different cache entries). This is why I put a warning in the readme.
But, on the bright side, even though types look horrible when used as literals, the code itself can be very compact, non-redundant and all around pretty.
Operation precedence, and operation associativity, are very special grammar rules.
Widely used in programming languages, it is worth using a special syntax.
The operation priority of the programming language has more than ten levels, if such an expression tree is used.
It is not very convenient to nested match the required level, which is inconsistent with the ADT model of rust.
Yeah, you convinced me. I'll definitely implement this when I have the time. Right now a bunch of other projects have more priority.
PRs are welcome too, of course :)
I wonder if there are plans to support precedence climbing like https://docs.rs/peg/latest/peg/#precedence-climbing
I know that peg means never left recursive, but this way of writing does have higher readability.
For traditional peg, this is equivalent to a two-phase parsing
stage-1 attempt to capture
stage-2 to reorganization
Vec<atom|prefix|infix|postfix>
using extern progress, e.g. the pratt parser.The second stage can even be dynamically adjusted by the user(like swift/haskell), and static parsing (parser-embedded) is sufficient for most languages.