datalust / superpower

A C# parser construction toolkit with high-quality error reporting
Apache License 2.0
1.05k stars 98 forks source link

[Question] Backtrack/IsPartial how and why? #118

Closed AndrewSav closed 3 years ago

AndrewSav commented 4 years ago

I would like to get a bit more understanding in how Backtrack and IsPartial works. I'll explain what I understand so far, and then I ask question.

Backtrack

IsPartial

Or, Many and IgnoreMany combinators All of them check condition where HasValue is false, and Backtrack is false and IsPartial is true, which means that the previous parser failed, does not have the backtrack, but has moved the current position.

Usage Practically, you will use the above in code in form of A.Try().Or(B), A.Try().Many(B) and A.Try().IgnoreMany(B)

Example: Many If we only have parser that does not consume any input on failure we do not need backtrack:

var parser = Character.Digit.Many();
//Parse "1" => '1'
//Parse "12" => '1','2'
//Parse "12a" => '1','2'
//Parse "." => unexpected `.`, expected digit or letter.

But once we started combining parsers, we inevitable end up with ones that fail after consuming partial input:

var parser = 
    from first in Character.Digit
    from second in Character.Letter
    select new string(new[] { first, second });
//Parse "1a" => "1a"
//Parse "12" => unexpected `2`, expected letter.

At this point 1 will be consumed and 2 will be not. So:

var parser2 = parser.Many()
//Parse "1a2b3c" => "1a","2b","3c"
//Parse "2a!y" => "2a"
//Parse "2a45" => unexpected `4`, expected letter.

In the last example, we have a partial parse of "45" and without the backtrack it fails. Let's try the same with backtrack:

var parser3 = parser.Try().Many()
//Parse "2a45" => "2a"

In this case, Many has parsed as much as it could and then stopped just before 45.

Example: Or Similarly, parser that do not consume any input on failure work work the same with or without backtrack, so I won't repeat it here. Here are a couple of parsers that do consume input on failure:

var parser1 = Span.EqualTo("keyword").Select(s => s.ToStringValue());
var parser2 =
    from first in Character.Letter
    from second in Character.Digit
    select new string(new[] { first, second });

var parser3 = parser1.Or(parser2);
//Parse "k3" => unexpected `3`, expected `e`.

Here we are not using Backtrack and k3 cannot be matched because the first parser started consuming characters, and "k" is consumed.

Question From the examples above it looks to me like Backtracking is a sensible default. I cannot imagine an example where I would like Many or Or to fail, if the previous parser consumed some input. In most cases, I would simply want it to stop and return whatever is consumed. Note, that even without Backtracking, Many will not fail if it cannot match at all, it will just return empty output.

Why does it makes sense to insist on using Try, and not making backtracking the default mode? Why does it makes sense for a failed parser to leave the position mid-parse? In which use cases the behaviour of consuming some input for a parser when failing could be desirable?

nblumhardt commented 4 years ago

Hi @AndrewSav - apologies for the slightly rushed answer, but hopefully some clues:

Backtracking in the presence of alternation means that if the parser takes the first branch, regardless of how well that branch matched, it will discard error information and report a failure in the second branch. This makes producing good error messages hard, especially if alternatives have similar possible prefixes.

This article describes try() and the trade-offs around backtracking, IIRC: https://lorgonblog.wordpress.com/2007/12/06/monadic-parser-combinators-part-four/

HTH!