dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
533 stars 66 forks source link

keep track of the longest failed match to improve error messages #269

Closed skoppe closed 4 years ago

skoppe commented 4 years ago

Here I am introducing the failEnd and failedChild members of the ParseTree.

Currently pegged discards the longest failed match in the option?, the zeroOrMore* or the oneOrMore+ operators.

Let me illustrate:

Thing <- (A B)* C
A <- "abc"
B <- "def"
C <- "ghi"

Let us run this grammer on the following input: "abc". It will fail of course and the error message will be expected "ghi" but found end of line. This is silly because A did actually correctly match (except it is discarded) and the error ought to be expected "def" but found end of line.

This also occurs with the (A B)? or the (A B)+ (after the first match).

Essentially the option, zeroOrMore and the oneOrMore rules eat errors, and that can be problematic if during those rules it actually parsed the longest failing match.

There are also more complex causes where some nested rule is correctly parsed, but it could contain a longer failed match, so it is not just about rules that are not successful. E.g.

Thing <- AB C
AB <- A B?
A <- "abc"
B <- "def"
C <- "ghi"

With input "abc de" it will complain about expected "ghi" but got "de". Of course rule B? is matched longer, and it should complain about that.

This PR introduces additional logic to keep track of the longest failed match.

It is not in a perfect stage (it still needs tests), but I wanted to submit it already to get some feedback.

skoppe commented 4 years ago

@veelo Thanks for taking the time to review. I have addressed your points. I will add some unittests (hopefully later today).

skoppe commented 4 years ago

I just found out that the fixup interferes with the memoization. No solution as of yet.

veelo commented 4 years ago

Right, that can be tricky. Good you discovered that.

Philippe just made me a repository collaborator, so I’ll be able to merge once the PR is ready.

skoppe commented 4 years ago

Great. I think I have found the culprit but will first test on our codebase. If this gets merged I suggest to make a beta tag and after some testing at least a minor version bump. The ParseTree will be different under invalid input, and people's current error handling might not expect that.

veelo commented 4 years ago

Are you ready for me to tag a beta release with this?

skoppe commented 4 years ago

Yes, please do, last couple of weeks everything was stable here.

veelo commented 4 years ago

@skoppe Any idea what could be the cause of the spike in compiler memory consumption? See #272.