parser: enable custom error handling

mewmew commented 8 years ago

While implementing a simple compiler for a subset of C, we wanted to add user-friendly error messages, and therefore started looking for ways of integrating them into the Gocc generated parser.

As Gocc has support for error production rules, this was the obvious mechanism to hook into. However, the errors reported were verbose and while helpful to a compiler developer, they were not user-friendly for end-users of the compiler.

To mitigate this issue, we implemented a way of creating custom error messages, using the builtin error interface of Go, to keep the current error messages as is, while enabling extensions (through type assertions).

An example using the updated code is provided below, using uparse to parse the incorrect input file testdata/incorrect/parser/pe01.c .

Before

uparse from commit 5f744176565bb4a797828b4ce981fbd16903af40

u@x1 ~/D/g/s/g/m/uc> uparse testdata/incorrect/parser/pe01.c 
Parsing "testdata/incorrect/parser/pe01.c"
2016/04/13 00:38:44 main.parseFile (uparse.go:85): error: Error in S157: )(6,)), Pos(offset=102, line=0, column=0)102: expected ["ident" "(" "int_lit" "-" "!"], got ")"

After

uparse from commit mewmew/uc@db976f881abc76f9806d71fca1eb3b2083627c12

u@x1 ~/D/g/s/g/m/uc> uparse testdata/incorrect/parser/pe01.c 
Parsing "testdata/incorrect/parser/pe01.c"
2016/04/13 00:39:37 main.parseFile (uparse.go:88): error: 102: expected ["ident" "(" "int_lit" "-" "!"], got ")"

awalterschulze commented 8 years ago

The offset seems be the only accurate number. So we could probably use that to calculate the line and column by running through the input again. This is if newError has access to the original input buffer?

mewmew commented 8 years ago

Hi Walter,

Thanks for the quick reply!

Responding to your comment here.

Seems like the stacktop is still printed out here? How does it get eliminated in the final output?

The default error message is exactly the same as before, and personally I think it should be to facilitate compiler development. The main contribution of this pull request is that the data used to produce the error message is now exposed to the user through the *errors.Error type, on which a user may do a type assertion to provide a custom error message.

An example of this is provided by the uparse tool.

From uc/cmd/uparse/parse.go:

    file, err := p.Parse(s)
    if err != nil {
        if err, ok := err.(*errors.Error); ok {
            // Unwrap Gocc error.
            return parser.NewError(err)
        }
        return errutil.Err(err)
    }

Where parser.NewError is defined in uc/gocc/parser/error.go

// NewError returns a user-friendly parse error.
func NewError(err *errors.Error) error {
    // TODO: Add line:col positional tracking.
    var expected []string
    for _, tok := range err.ExpectedTokens {
        if tok == "error" {
            // Remove "error" production rule from the set of expected tokens.
            continue
        }
        expected = append(expected, tok)
    }
    sort.Strings(expected)
    return fmt.Errorf("%d: unexpected %q, expected %q", err.ErrorToken.Pos.Offset, string(err.ErrorToken.Lit), expected)
}

The main point is that, using the error interface of Go to implement Gocc errors, enables users to gain access to the data from the *errors.Error type, instead of only having access to the error message as a string (which was the case before this pull request). Then users of Gocc may implement custom error handling, as fit for their purposes, while the default Gocc error messages stays useful to compiler developers.

I think this is the sweet-spot.

What are your thoughts?

The offset seems be the only accurate number. So we could probably use that to calculate the line and column by running through the input again. This is if newError has access to the original input buffer?

This should definitely be possible, but not strictly related to the pull request. I think we should continue the line:col discussion in issue #14.

Cheers

Robin and Alex

awalterschulze commented 8 years ago

Perfect :)

awalterschulze commented 8 years ago

I think you should merge it :)

mewmew commented 8 years ago

I think you should merge it :)

Sure :)

awalterschulze commented 8 years ago

Sorry it took my so long to understand what you were trying to achieve here, thats my bad. This is really cool :)

mewmew commented 8 years ago

Sorry it took my so long to understand what you were trying to achieve here, thats my bad. This is really cool :)

No worries at all! I feel happy the solution turned out to be very light-weight in terms of restructuring of the Gocc code base, and fitted in naturally with how errors were already handed in Gocc.

Have a great day!

Cheerful regards from a sunny Sweden :)

goccmack / gocc

parser: enable custom error handling #18

Before

After