EoinDavey / tsPEG

PEG Parser Generator for TypeScript
Mozilla Public License 2.0
195 stars 7 forks source link

Backtracking fails to find valid parse #53

Closed dylhunn closed 7 months ago

dylhunn commented 7 months ago

Consider the following grammar:

file := statement*$

statement := declaration | expression

declaration := identifier _ ':=' _ expression

expression := literal | binary | identifier

identifier := '[_a-zA-Z][_a-zA-Z0-9]*'

binary := literal _ { '\+' | '-' | '/' | '\*' } _ literal

literal := '[0-9]+' | '\'.*\''

// optional whitespace
_  := '[ \t\r\n]*'

// mandatory whitespace
__ := '[ \t\r\n]+'

Trying to parse the expression 24 - 444 fails with Syntax Error at line 1:2. Expected one of '[_a-zA-Z][_a-zA-Z0-9]*', '[0-9]+', '\'.*\'', EOF in array.. However, rewriting the expression rule to expression := binary | identifier | literal causes the parse to succeed.

Shouldn't the parser backtrack to find the valid parse?

dylhunn commented 7 months ago

It's of course possible that I'm "holding it wrong," and every rule eagerly matches left-to-right. But this seemed wrong; I thought rules were supposed to be order-insensitive.

EoinDavey commented 7 months ago

Hi Dylan,

Unfortunately rules are order sensitive as you suspect. The backtracking will not try to match a rule which has already succeeded. i.e. when it attempts to match expression it will try literal first, which does match. It then considers the expression rule "matched" at this position in the string and won't try again here with binary or identifier.

This should be made more clear in the documentation, apologies for that.

In general if you have rules A and B where A can match a prefix of a string that B would match, I recommend putting B before A in a disjunction (B | A).