Hejsil / mecha

A parser combinator library for Zig
MIT License
413 stars 18 forks source link

Returning progress when parsing fails #41

Open zenMaya opened 1 year ago

zenMaya commented 1 year ago

There is a use case for this library, where the parser should report, where it failed. Since that would allow for some "graceful" failing. Instead of crashing, there is some merit in reporting where. It would suffice to report the position/rest, as everything else can be calculated if that happens, but now I think there is no way to report progress.

This would essentially mean moving the Error inside the ParserResult, which isn't ideal and is an architecture change. I understand if such modification is out of scope, and I need to implement it myself.

Hejsil commented 1 year ago

If we can come up with something that is decently ergonomic, then I wouldn't mind the change. All my libs will not have a 1.0 release before zig 1.0 so let's just break things while we can.

zenMaya commented 1 year ago

I think the change would require moving out of Error!Result(T) into something like:

pub fn Result(comptime T: type) type {
    return struct {
        pub const Value = union(enum) {
            Ok: T,
            Err: Error, // using the pub const Error
        };

        value: Value,
        rest: []const u8 = "",
    };
}

But that would mean rewriting all combinators and their respective functions

zenMaya commented 1 year ago

maybe it would be even better to support in what state the parser left, as there are some recovery strategies that can be implemented, to report as many errors as possible

for example: c strings are always one line:

char* str = "aaaaa
";

is invalid but most parsers would assume that the string was just improperly terminated, although the input is invalid, the parser would continue assuming that the string is valid:

char* str = "aaaaa"
";";

erroring with unterminated string x2 and missing semicolon x2

or even some smarter error recovery strategy. Like having a special rule for multiline strings that fails as it is not valid in the C grammar, but continuing with parsing as the parser can still find more errors. Failing at the first error and not knowing what the error is: "ParserError", is not sufficient for recovery.

If there could be either a "panic mode" (discard characters until a "safe character" is found, like }, end of a block)

or fail with continuation, so the implementor could modify the state to get it into a safe state. For example if string literal fails, continue by inserting a ghost ", then continue with parsing.

zenMaya commented 1 year ago

I have starting to test the "continuation" tactic.

The Result structure would look like this:

pub fn Result(comptime T: type) type {
    return struct {
        pub const Value = union(enum) {
            Ok: T,
            InternalError: Error, // Parser failed irrecoverably
            ParserError: Parser(T), // Parser failed at this state
        };
        value: Value,
        rest: []const u8 = "",
    };
}

and for example the string combinator would look like this (I had to remove the comptime since the parsing must be able to dynamically recreate the parser):

pub fn string(str: []const u8) Parser(void) {
    return struct {
        fn func(_: mem.Allocator, s: []const u8) Void {
            const diff = mem.indexOfDiff(u8, s, str); // find the first characters where input differs
            if (diff) |diff_index| { // if there is one
                if (diff_index == str.len) // if the first difference is the ending of str, the parse was successful
                    return Void{ .value = .{ .Ok = {} }, .rest = s[str.len..] };
                // else create a new string, that expects the rest of the string to be parsed
                // for example: if s = "strang" and str = "string"
                // => string("ing") and rest = "ang"
                const resume_string = string(str[diff_index..]);
                return Void{ .value = .{ .ParserError = resume_string }, .rest = s[diff_index..] };
            } else return Void{ .value = .{ .Ok = {} }, .rest = s[str.len..] };
        }
    }.func;
}

@Hejsil please let me know what you think! Does it seem interesting to you? Should I continue with the implementation?

PS. I noticed that this may not be possible, I am not that versed in every zig implementation detail, and string not beeing a comptime function might make it not function.

Hejsil commented 1 year ago

PS. I noticed that this may not be possible, I am not that versed in every zig implementation detail, and string not beeing a comptime function might make it not function.

const resume_string = string(str[diff_index..]);

Yes correct, this will not work. I'm also not sure I understand why we would want ParserError to have Parser(T) as the result. How would one use this? I think you need to show me how you imagine error recovery looking.

zenMaya commented 1 year ago

Yes correct, this will not work. I'm also not sure I understand why we would want ParserError to have Parser(T) as the result. How would one use this? I think you need to show me how you imagine error recovery looking.

Okay maybe I haven't thought this through properly, sorry. I have been thinking about this for a few hours, and parser combinators aren't really suitable for what I had in mind. It would work for LL(1) parsers, but you cannot really inspect combinators like you can inspect Finite State Machines with explicit transition tables. The opt function essentially does almost everything that you need for error recovery I think, maybe some more ergonomic wrapper for collecting errors, but that doesn't require drastic API changes.

I'm really sorry that I have bothered you with this, I should've thought about it much longer before posting this.

Maybe the original Ok/Err proposal would be still useful? At least if you don't want to modify your combinators to recover from errors using opt, you can at least report the syntax error position.

Hejsil commented 1 year ago

Maybe the original Ok/Err proposal would be still useful? At least if you don't want to modify your combinators to recover from errors using opt, you can at least report the syntax error position.

Yea, I think there is a usecase for the Ok/Err thing. Being able to see how far the parser got furthest can be quite useful for simple error reporting, but for things more complicated, I think that is outside this libraries scope.

zenMaya commented 1 year ago

Yea, I think there is a usecase for the Ok/Err thing. Being able to see how far the parser got furthest can be quite useful for simple error reporting,

Okay, I'll try to get it done soon'ish and make a PR!

but for things more complicated, I think that is outside this libraries scope.

You are right, I think there could be some combinators for opt and error message, or combine opt with map, but otherwise the library is already featurefull enough. Once again I'm sorry for all the other stuff I suggested, it was dumb.