In Parsita 1, the pred parser discards all errors from a successful run of the inner parser if the predicate turns out to be false. (The predicate failure is the only failure.) This is not how Parsita promises to work. It promises to always report the furthest error, which is usually what the user wants. This matters when one parser parses a prefix of another parser (basically all parsers of mathematical expressions). The short parser succeeds, the long parser fails. This is normally fine because, when the overall parser fails to consume the full input, the long parser's message is displayed because it made more progress before the ultimate failure.
Below is a parser (clearly a subset of a full expression parser) that demostrates this unexpected behavior.
from parsita import *
def is_static(expression):
if isinstance(expression, str):
return False
if isinstance(expression, int):
return True
elif isinstance(expression, list):
return all(is_static(e) for e in expression)
class PredParsers(ParserContext):
name = reg(r"[a-zA-Z]+")
integer = reg(r"[0-9]+") > int
function = name >> "(" >> repsep(expression, ",") << ")"
expression = function | name | integer
static_expression = pred(expression, is_static, "static expression")
print(PredParsers.static_expression.parse("1"))
# Success(1)
print(PredParsers.static_expression.parse("sin(0)"))
# Success([0])
print(PredParsers.static_expression.parse("a"))
# Expected static expression but found 'a'
# Line 1, character 1
# a
# ^
print(PredParsers.static_expression.parse("f(a,1)"))
# Expected static expression but found 'f'
# Line 1, character 1
# f(a,1)
# ^
print(PredParsers.static_expression.parse("f(2,1)"))
# Success([2, 1])
print(PredParsers.static_expression.parse("f(2,1"))
# Expected static expression but found 'f'
# Line 1, character 1
# f(2,1
# ^
This last error message shows the problem. The farther error is clearly the issue, but the early success takes precedence. This is easy to fix in Parsita 1; it's just a missing .merge(status):
class PredicateParser(Generic[Input, Output], Parser[Input, Input]):
def __init__(self, parser: Parser[Input, Output], predicate: Callable[[Output], bool], description: str):
super().__init__()
self.parser = parser
self.predicate = predicate
self.description = description
def consume(self, reader: Reader[Input]):
remainder = reader
status = self.parser.consume(remainder)
if isinstance(status, Continue):
if self.predicate(status.value):
return status
else:
return Backtrack(remainder, lambda: self.description)#.merge(status)
else:
return status
Sure enough, this makes pred report the farthest error.
print(PredParsers.static_expression.parse("f(2,1"))
# Expected ',' or ')' but found end of source
However, this now breaks the pred parser error reporting. Because the pred parser fails at the start of the input, the inner parser always makes more progress than pred so any way that the parser could be extended becomes the farther error. The pred parser error message is actually hard to trigger and, in fact, cannot be triggered by static_expressions in the example at all.
print(PredParsers.static_expression.parse("a"))
# Expected '(' but found end of source
This was discovered because Parsita 2 fixes the original bug (because merges no longer need to be manually applied) and I was relying on pred reporting an error at least sometimes. The expected behavior of pred could be this:
If the inner parser fails, it backtracks like normal.
If the predicate passes, it continues like normal.
If the predicate fails, the failure is reported at the end of the consumed input, not the beginning. Note that this completely messes up the concept of "actual" for the predicate parser unless we add the ability to customize the actual message on a per-parser basis.
There is a question here on whether or not failures that tie the predicate parser should be superseded by the predicate parser. For example,
print(PredParsers.static_expression.parse("a"))
# Expected '(' or static expression but found end of source # allowing ties
# Expected static expression but found end of source # superseding ties
For this particular parser, superseding ties is clearly better, but it is difficult to convince myself that superseding ties is always the correct behavior for pred. This would be the only parser that supersedes other failures. While unusual, it kind of makes sense given that pred applies a post-processing step and therefore acts as a kind of checkpoint on the parser.
In the same vein, there is also a question of whether or not the predicate parser should clear any errors generated by the inner parser at the end of the input it consumed.
Ultimately, the pred parser may not be completely sane because it is not really a parser, but a post-processor of parsed input. I have only used the pred parser where I should probably be using a type checker on top of the AST, but that was too much work.
In Parsita 1, the
pred
parser discards all errors from a successful run of the inner parser if the predicate turns out to be false. (The predicate failure is the only failure.) This is not how Parsita promises to work. It promises to always report the furthest error, which is usually what the user wants. This matters when one parser parses a prefix of another parser (basically all parsers of mathematical expressions). The short parser succeeds, the long parser fails. This is normally fine because, when the overall parser fails to consume the full input, the long parser's message is displayed because it made more progress before the ultimate failure.Below is a parser (clearly a subset of a full expression parser) that demostrates this unexpected behavior.
This last error message shows the problem. The farther error is clearly the issue, but the early success takes precedence. This is easy to fix in Parsita 1; it's just a missing
.merge(status)
:Sure enough, this makes
pred
report the farthest error.However, this now breaks the
pred
parser error reporting. Because the pred parser fails at the start of the input, the inner parser always makes more progress thanpred
so any way that the parser could be extended becomes the farther error. The pred parser error message is actually hard to trigger and, in fact, cannot be triggered bystatic_expressions
in the example at all.This was discovered because Parsita 2 fixes the original bug (because
merge
s no longer need to be manually applied) and I was relying onpred
reporting an error at least sometimes. The expected behavior ofpred
could be this:There is a question here on whether or not failures that tie the predicate parser should be superseded by the predicate parser. For example,
For this particular parser, superseding ties is clearly better, but it is difficult to convince myself that superseding ties is always the correct behavior for
pred
. This would be the only parser that supersedes other failures. While unusual, it kind of makes sense given thatpred
applies a post-processing step and therefore acts as a kind of checkpoint on the parser.In the same vein, there is also a question of whether or not the predicate parser should clear any errors generated by the inner parser at the end of the input it consumed.
Ultimately, the
pred
parser may not be completely sane because it is not really a parser, but a post-processor of parsed input. I have only used thepred
parser where I should probably be using a type checker on top of the AST, but that was too much work.