dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Utility function to get error location #59

Closed alexrp closed 12 years ago

alexrp commented 12 years ago

It would be nice to have a utility function to get the location of a fatal failure during parsing, at least as a temporary measure until we work out a proper error handling mechanism.

(This might exist already and I just can't find it...)

PhilippeSigaud commented 12 years ago

Yeah, something that calculate the line and column, but only for failures. I cut away code for calculating this for all parse results, but doing it for failures is a good idea. OK, I can add that.

PhilippeSigaud commented 12 years ago

I pushed a commit with a position function. Its documented in the code, not in the .md files. I also changed the way parse trees are printed to give the failure location.

It's still primitive and inefficient, in that if when a leaves fails, all its ancestors in the tree will recalculate their position from the beginning. It should build upon its children failure.

alexrp commented 12 years ago

The error reporting is actually looking pretty good. It's slightly off, though. Given a source like

use foo::bar

record foo {
}

if the grammar specifies rec and not record, Pegged will say expected bar, but got record foo, but should say expected rec but got record ....

Should I try to make a reduced test case?

PhilippeSigaud commented 12 years ago

On Mon, Oct 8, 2012 at 1:22 PM, Alex Rønne Petersen < notifications@github.com> wrote:

The error reporting is actually looking pretty good. It's slightly off, though. Given a source like

use foo::bar

record foo { }

if the grammar specifies rec and not record, Pegged will say expected bar, but got record foo, but should say expected rec but got record ....

Should I try to make a reduced test case?

Yes please.

alexrp commented 12 years ago
$ cat test2.d
module test;

import std.stdio;
import pegged.grammar;

mixin(grammar(`Test:

               program <- decl* :eoi
               useDecl < "use" qualifiedIdentifier
               recDecl < "record" identifier "{" "}"
               decl < useDecl/recDecl`));

void main()
{
    writeln(Test(`use foo.bar

                  rec foo {
                  }`));
}
$ rdmd test2.d
Test (failure)
 +-Test.program failure at line 2, col 18, after "...          " expected foo.bar, but got "rec foo {
..."
    +-Test.decl [0, 31]["use", "foo.bar"]
       +-Test.useDecl [0, 31]["use", "foo.bar"]

It should say "expected record ..., but got rec" or something like that.

PhilippeSigaud commented 12 years ago

OK, I think I got it.

Btw, the error message is "expected end of input, but got rec foo" (because in this case, you have one valid decl and then a failure, so decl* succeeds and it's :eoi which fails.

To see the error message you wanted, use this (where I changed decl* to decl decl to force 2 decls in a row):

import std.stdio;
import pegged.grammar;

mixin(grammar(`Test:

               program <- decl decl :eoi  # Here, artificial change
               useDecl < "use" qualifiedIdentifier
               recDecl < "record" identifier "{" "}"
               decl < useDecl/recDecl`));

void main()
{
    writeln(Test(`use foo.bar

                  rec foo {
                  }`));
}
alexrp commented 12 years ago

Looks good. One tiny suggestion for improvement (though I'm not sure if it's generic enough to be in Pegged): When printing the "but got ..." text, it may be worth stopping once a newline is hit but only if some text has already been printed; if not, just skip past all white space and rinse, repeat. Does that make sense?

PhilippeSigaud commented 12 years ago

Yes, that makes sense.

PhilippeSigaud commented 12 years ago

std.string.strip seems to do the trick. I just added it and push it to Github.

Note that the error reporting is not perfect yet (it should say expected "record" or "use"). A nasty bug by Val (issue #60) forced me to change the way rules are named and gave me the occasion to build a real naming infrastructure. I think I can build on that to get a real expected(rule) function. That will take a few days.

Then, I can go back to unit-testing the hell out of pegged.peg and pegged.grammar. And then, I can finally code that left-recursion code I want to add.

alexrp commented 12 years ago

Sounds good. Thanks!

(Closing this; I'll open a new issue for other possible error handling improvements.)