badicsalex / peginator

PEG parser generator for creating ASTs in Rust
MIT License
34 stars 3 forks source link

Confusing error message in case of a syntax error #19

Open ethindp opened 1 year ago

ethindp commented 1 year ago

I'm very confused. The peginator-cli binary is giving me this error:

Error: expected end of input
--> ada.peg:6801:1
 |  
 |  
 |  ^

But my grammar certainly isn't done on this line (the number of lines are primarily because I generated the unicode character classes that I needed to match.... Probably not the "best" way I could do it, but it (should) work.) Anyway, it's getting stuck on this:

@string
@check(crate::checks::check_identifier)
Identifier =
IdentifierStart {IdentifierStart | IdentifierExtend} !(Abort | Abs | Abstract | Accept | Access | Aliased | All | And | Array | At | Begin | Body | Case | Constant | Declare | Delay | Delta | Digits | Do | Else | Elsif | End | Entry | Exception | Exit | For | Function | Generic | Goto | If | In | Interface | Is | Limited | Loop | Mod | New | Not | Null | Of | Or | Others | Out | Overriding | Package | Parallel | Pragma | Private | Procedure | Protected | Raise | Range | Record | Rem | Renames | Requeue | Return | Reverse | Select | Separate | Some | Subtype | Synchronized | Tagged | Task | Terminate | Then | Type | Until | Use | When | While | With | Xor);

@char
IdentifierStart =UppercaseLetter
| LowercaseLetter
| TitlecaseLetter
| ModifierLetter
| OtherLetter
| LetterNumber;

@char
IdentifierExtend =NonspacingMark
| SpacingMark
| DecimalNumber
| ConnectorPunctuation;

NumericLiteral = @:DecimalLiteral | @:BasedLiteral;
# Dies where this comment is
@skip_no_ws
@check(crate::checks::check_decimal_literal)
DecimalLiteral = integer_part:Numeral ["." fractional_part:Numeral] [exponent:Exponent];

@string
@skip_no_ws
Numeral = Digit {["_"] Digit};

@string
@skip_no_ws
Exponent = i'E' [Add] Numeral | i'E' Subtract Numeral;

@char
Digit = '0' .. '9';

@check(crate::checks::check_based_literal)
BasedLiteral =
base:Base "#" integer_part:BasedNumeral ["." fractional_part:BasedNumeral] "#" [exponent:Exponent];

@string
@skip_no_ws
Base = Numeral;

@string
@skip_no_ws
BasedNumeral =
ExtendedDigit {["_"] ExtendedDigit};

@char
ExtendedDigit = Digit | i'A' .. i'F';

At a glance I can't see any kind of syntax error, so I'm quite confused. Does order of the rules matter? (I hope not...)

badicsalex commented 1 year ago

The order of the rules don't matter. The error is very misleading though, the problem is that the directive is called @no_skip_ws, and not @skip_no_ws

ethindp commented 1 year ago

@badicsalex Post-processing on directives and rejecting unrecognized directives and saying "Unknown directive x" would be better.

ethindp commented 1 year ago

Also, the error message when misusing @ overrides doesn't give you any line information. E.g.: I get Error: Enum '@:' fields have to be used exactly once in all choice branches, and must not be used in closures or optional parts, but this tells me nothing about where the error is, just that it exists. Would it be possible to include error information with these?

badicsalex commented 1 year ago

Notes to myself (or anyone who's willing to fix this):

This error is caused by the structure of the peginator grammar, namely the first rule:

Grammar = {(rules:Rule | rules:CharRule | rules:ExternRule) ";"} $ ;

Parsing the rule list stops at the first weird directive, but is successful otherwise. Then comes the EOI check, and that throws an error. Unfortunately the EOI check happens at the same place as the first parsing failure (the position of the "@", since the directive check strings contain the @ already).

If there are multiple furthest errors, all of them should be stored. But this should be opt-in, as this will mean a lot of allocations, and keep in mind that ParseState is cloned a lot, so this would severely affect parsing speed. Maybe a Cow<Vec<>> (or an optimized small vec crate with cow semantics) would be appropriate.