Open cdcarter opened 1 year ago
Yeah, this is a common problem when writing parsers, called left recursion. You basically have the rules:
expr = binOp
binOp = expr + expr
So what happens when you try parsing 1 + 1? Well:
expr -> binOp -> expr + expr
                 |
                 +> expr -> binOp -> expr + expr
                                     |
                                     +> expr -> binOp -> expr + expr
                                                         |
                                                         +> ...
To solve this, you could change the rules to be something like this:
expr = binOp
binOp = ( number + )* number // A list of `number +` followed by a `number`
This can be written in mecha with the mecha.many function, but the problem comes when you need multiple operators. I can think of a few inelegant solutions off the top of my head, but I would have to play around to see what the best way to handle that would be.
On a side note, I'm pretty sure other parser combinator libraries have functions for this specific pattern and mecha probably should have that too.
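To make the multiple-operator case concrete, here is a minimal hand-written sketch of the `operand (operator operand)*` shape, folded left to right with no precedence. It is plain Zig and deliberately does not use mecha; `Op`, `Parser`, and `evalLeftToRight` are hypothetical names, and it evaluates the expression directly rather than building an AST.

```zig
const std = @import("std");

// Hypothetical names throughout; this is plain Zig, not mecha's API.
const Op = enum { add, sub, mul, div };

const Parser = struct {
    src: []const u8,
    pos: usize = 0,

    fn skipSpaces(self: *Parser) void {
        while (self.pos < self.src.len and self.src[self.pos] == ' ') self.pos += 1;
    }

    // number = [0-9]+
    fn number(self: *Parser) !i64 {
        self.skipSpaces();
        const start = self.pos;
        while (self.pos < self.src.len and std.ascii.isDigit(self.src[self.pos])) self.pos += 1;
        if (self.pos == start) return error.ExpectedNumber;
        return std.fmt.parseInt(i64, self.src[start..self.pos], 10);
    }

    // operator = '+' / '-' / '*' / '/'
    // Returns null (after skipping spaces) when the next byte is not an operator.
    fn operator(self: *Parser) ?Op {
        self.skipSpaces();
        if (self.pos >= self.src.len) return null;
        const op: Op = switch (self.src[self.pos]) {
            '+' => .add,
            '-' => .sub,
            '*' => .mul,
            '/' => .div,
            else => return null,
        };
        self.pos += 1;
        return op;
    }
};

// expr = number (operator number)*
// A loop instead of a recursive call on the left, so there is no left recursion.
fn evalLeftToRight(src: []const u8) !i64 {
    var p = Parser{ .src = src };
    var acc = try p.number();
    while (p.operator()) |op| {
        const rhs = try p.number();
        acc = switch (op) {
            .add => acc + rhs,
            .sub => acc - rhs,
            .mul => acc * rhs,
            .div => try std.math.divTrunc(i64, acc, rhs),
        };
    }
    p.skipSpaces();
    if (p.pos != src.len) return error.TrailingInput;
    return acc;
}
```

With this shape, `1 + 2 * 3` is read as `(1 + 2) * 3`. Replacing the arithmetic in the loop with node construction would give a left-leaning AST instead of a value.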
Well, when you point it out, it seems obvious!
I've got something that's pretty much working with:
const integer = mecha.combine(.{ mecha.int(u16, .{ .parse_sign = false }), ws });
const plus = mecha.string("+");
const minus = mecha.string("-");
const star = mecha.string("*");
const slash = mecha.string("/");
const operator = mecha.oneOf(.{ plus, minus, star, slash }).convert(mecha.toEnum(Expression.Op));
const base_expression = mecha.oneOf(.{integer.convert(toExpression)}); // will eventually have identifier too
const binOp = mecha.combine(.{
    base_expression,
    operator,
    mecha.ref(expressionRef),
}).map(mecha.toStruct(Expression.BinOp)).convert(toExpression);
fn expressionRef() mecha.Parser(*Expression) {
    return expression;
}
const expression = mecha.oneOf(.{ binOp, base_expression });
> On a side note, I'm pretty sure other parser combinator libraries have functions for this specific pattern and mecha probably should have that too.
Indeed. I am pretty new to combinator/PEG style parsing, but I notice that e.g. pyparsing has a helper called infix_notation that directly builds up an operator precedence table. That's the sort of thing I'd like to evolve this into, over time.
I also got a version of this grammar working, stolen from Wikipedia:
// Expr ← Sum
// Sum ← Product (('+' / '-') Product)*
// Product ← Power (('*' / '/') Power)*
// Power ← Value ('^' Power)?
// Value ← [0-9]+ / '(' Expr ')'
I haven't figured out how to make this one build an AST instead of just discarding the results (... yet! I'll get there!), but it works quite well.
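For comparison, here is one way the same grammar could produce an AST when written by hand. This is a sketch in plain Zig, not mecha; `Node`, `Parser`, `ParseError`, and `parse` are hypothetical names. It assumes the caller passes an arena-style allocator (nodes are never freed individually) and, like the grammar above, it does not skip whitespace.

```zig
const std = @import("std");

// Hypothetical AST and parser names; plain Zig rather than mecha combinators.
const Node = union(enum) {
    num: i64,
    bin: struct { op: u8, lhs: *const Node, rhs: *const Node },
};

// The rules call each other recursively, so the error set is spelled out.
const ParseError = error{ ExpectedValue, ExpectedCloseParen, TrailingInput } ||
    std.fmt.ParseIntError || std.mem.Allocator.Error;

const Parser = struct {
    src: []const u8,
    pos: usize = 0,
    alloc: std.mem.Allocator,

    // Consumes and returns the next byte if it is one of `set`, else null.
    fn accept(self: *Parser, set: []const u8) ?u8 {
        if (self.pos >= self.src.len) return null;
        const c = self.src[self.pos];
        for (set) |s| {
            if (s == c) {
                self.pos += 1;
                return c;
            }
        }
        return null;
    }

    fn mk(self: *Parser, op: u8, lhs: *const Node, rhs: *const Node) ParseError!*const Node {
        const n = try self.alloc.create(Node);
        n.* = .{ .bin = .{ .op = op, .lhs = lhs, .rhs = rhs } };
        return n;
    }

    // Sum <- Product (('+' / '-') Product)*
    fn sum(self: *Parser) ParseError!*const Node {
        var lhs = try self.product();
        while (self.accept("+-")) |op| lhs = try self.mk(op, lhs, try self.product());
        return lhs;
    }

    // Product <- Power (('*' / '/') Power)*
    fn product(self: *Parser) ParseError!*const Node {
        var lhs = try self.power();
        while (self.accept("*/")) |op| lhs = try self.mk(op, lhs, try self.power());
        return lhs;
    }

    // Power <- Value ('^' Power)?
    fn power(self: *Parser) ParseError!*const Node {
        const base = try self.value();
        if (self.accept("^")) |op| return self.mk(op, base, try self.power());
        return base;
    }

    // Value <- [0-9]+ / '(' Expr ')'
    fn value(self: *Parser) ParseError!*const Node {
        if (self.accept("(") != null) {
            const inner = try self.sum();
            if (self.accept(")") == null) return error.ExpectedCloseParen;
            return inner;
        }
        const start = self.pos;
        while (self.accept("0123456789") != null) {}
        if (self.pos == start) return error.ExpectedValue;
        const n = try self.alloc.create(Node);
        n.* = .{ .num = try std.fmt.parseInt(i64, self.src[start..self.pos], 10) };
        return n;
    }
};

// Expr <- Sum, and the whole input must be consumed.
fn parse(alloc: std.mem.Allocator, src: []const u8) ParseError!*const Node {
    var p = Parser{ .src = src, .alloc = alloc };
    const root = try p.sum();
    if (p.pos != src.len) return error.TrailingInput;
    return root;
}
```

Because `Sum` and `Product` loop while `Power` recurses on its right-hand side, `1+2*3` parses with the `*` node nested under the `+` node, and `^` associates to the right.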
I'm trying to write a very simple arithmetic expression parser. I'm not even trying to have any specific operator precedence yet, just going left to right. I'm ending up in an infinite loop, and I suspect it's because I'm misunderstanding ref.
Here's the data structure we're trying to parse into.
and a conversion function ...
Here's the parser definition(s)
A simple usage ends in infinite recursion
and so on, for many tens of thousands of frames...
I'm simply unsure what to do next to troubleshoot. Any advice appreciated.