EoinDavey / tsPEG

PEG Parser Generator for TypeScript
Mozilla Public License 2.0

$EOF Error #18

Closed: SparkFountain closed this issue 3 years ago

SparkFountain commented 3 years ago

I expanded my grammar, which now comprises about 220 lines of rules. When I try to parse an expression that matched before I expanded the grammar, I get the following error:

expmatches: []
exprules: ["$EOF"]
pos: {overallPos: 0, line: 1, offset: 0}

How can I debug this $EOF error (what is the "end of file" in this context)?

Help is very much appreciated :) Thanks in advance.

EoinDavey commented 3 years ago

Hi!

This has brought to light that I have not documented the $EOF error. I thought I had done that, so apologies there.

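The $EOF error is returned when the parser parses some of the input correctly but does not make it to the end of the input. In other words, the start rule matched a prefix of the input, and the remaining characters were left unconsumed.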

Here's a quick example. I define a grammar that accepts only the string "Hello World" and nothing else. If I pass the string "Hello World", it matches correctly. However, if I pass "Hello World and Mars", the parser matches the "Hello World" at the start but cannot make it to the end of the input, so it returns the special "$EOF" error.

<21:37>eoin@eoin-pc:/tmp/tspegtst$ cat tst.peg 
hello_world := 'Hello World'
<21:37>eoin@eoin-pc:/tmp/tspegtst$ tspeg tst.peg out.ts
<21:37>eoin@eoin-pc:/tmp/tspegtst$ tsc -t es2016 -m commonjs out.ts 
<21:37>eoin@eoin-pc:/tmp/tspegtst$ node
Welcome to Node.js v13.14.0.
Type ".help" for more information.
> const parse = require("./out.js").parse;
undefined
> parse("Hello World");
ParseResult { ast: 'Hello World', err: null }
> parse("Hello World and Mars");
ParseResult {
  ast: 'Hello World',
  err: SyntaxErr {
    pos: { overallPos: 11, line: 1, offset: 11 },
    exprules: [ '$EOF' ],
    expmatches: []
  }
}
>
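To make the mechanism concrete, here is a minimal hand-written TypeScript sketch that mimics the session above for the one-rule grammar `hello_world := 'Hello World'`. It is not the code tsPEG generates: `parseHelloWorld`, `posInfo` and the interface names are hypothetical, and the interfaces only mirror the fields printed in the REPL output (`pos`, `exprules`, `expmatches`). The key step is the final check that the matched rule consumed the whole input; if it did not, the sketch returns the partial AST together with an error whose `exprules` is `["$EOF"]`.

```typescript
// Hypothetical sketch, NOT tsPEG-generated code. The interfaces only mirror
// the fields printed in the REPL session (pos, exprules, expmatches).
interface PosInfo {
  overallPos: number;
  line: number;   // 1-based, as in the session output
  offset: number; // 0-based column within the line
}

interface SyntaxErr {
  pos: PosInfo;
  exprules: string[];
  expmatches: string[];
}

interface ParseResult {
  ast: string | null;
  err: SyntaxErr | null;
}

// Convert an absolute position into line/offset information.
function posInfo(input: string, overallPos: number): PosInfo {
  const lines = input.slice(0, overallPos).split("\n");
  return { overallPos, line: lines.length, offset: lines[lines.length - 1].length };
}

function parseHelloWorld(input: string): ParseResult {
  const literal = "Hello World";
  // Step 1: try to match the start rule at position 0.
  if (input.slice(0, literal.length) !== literal) {
    // Illustrative only: the thread does not show what tsPEG reports here.
    return { ast: null, err: { pos: posInfo(input, 0), exprules: [], expmatches: [literal] } };
  }
  const end = literal.length;
  // Step 2: the rule matched, but did it consume the whole input?
  if (end !== input.length) {
    return { ast: literal, err: { pos: posInfo(input, end), exprules: ["$EOF"], expmatches: [] } };
  }
  return { ast: literal, err: null };
}

console.log(parseHelloWorld("Hello World"));
console.log(parseHelloWorld("Hello World and Mars"));
```

For "Hello World and Mars" the sketch reports the error at `overallPos` 11, line 1, offset 11, the same position shown in the session above.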

If this info isn't enough to solve the problem, provide the input string and the grammar that caused this error, and I can have a look and see whether I can diagnose why it is happening.

SparkFountain commented 3 years ago

Thank you for the explanation. Most probably, the error is in my grammar. I have a suspicion; maybe you can give me a hint on how to solve this.

In my grammar, I have rules for command parsing, which are "nested" over several rules. At the end of each command rule, after its last terminal symbol, I append an optional reference to the "root rule" again (ROOT?). My intention is to allow parsing several lines of code: after one valid line of code is parsed, parsing continues from the root entry point.

// Root
ROOT                := COMMAND

// Command
COMMAND             := GRAPHICS_2D_COMMAND

GRAPHICS_2D_COMMAND := GRAPHICS_2D_GRAPHICS_COMMAND

GRAPHICS_2D_GRAPHICS_COMMAND := Cls
                              | ClsColor
                              | Color
                              | Line
                              | Oval
                              | Rect

Cls                 := '[cC][lL][sS]\s*' ROOT?
ClsColor            := '[cC][lL][sS][cC][oO][lL][oO][rR]\s+' red=NUMBER ',\s*' green=NUMBER ',\s*' blue=NUMBER ',\s*' ROOT?
Color               := '[cC][oO][lL][oO][rR]\s+' red=NUMBER ',\s*' green=NUMBER ',\s*' blue=NUMBER ',\s*' ROOT?
Line                := '[lL][iI][nN][eE]\s+' beginX=NUMBER ',\s*' beginY=NUMBER ',\s*' endX=NUMBER ',\s*' endY=NUMBER ',\s*' ROOT?
Oval                := '[oO][vV][aA][lL]\s+' x=NUMBER ',\s*' y=NUMBER ',\s*' width=NUMBER ',\s*' height=NUMBER ',\s*' ROOT?
Rect                := '[rR][eE][cC][tT]\s+' x=NUMBER ',\s*' y=NUMBER ',\s*' width=NUMBER ',\s*' height=NUMBER ',\s*' ROOT?
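The tail-recursive `ROOT?` structure can be illustrated with a small hand-written recursive-descent parser. This sketch is hypothetical: it covers only the `Cls` and `Color` rules, assumes `NUMBER` is `[0-9]+` (its definition is not shown in the excerpt), and its function names are invented. Each command function matches its own terminals and then optionally calls `root` again, which is how the grammar chains several commands; the top-level function then checks whether the whole input was consumed, which is the condition behind the $EOF error.

```typescript
// Hypothetical sketch, not tsPEG output: recursive descent over a subset of
// the grammar above (Cls and Color only).
// ASSUMPTION: NUMBER is '[0-9]+', since its definition is not shown.
type Command =
  | { kind: "Cls"; next: Command | null }
  | { kind: "Color"; red: number; green: number; blue: number; next: Command | null };

type Result = [Command, number] | null; // [node, position after the match]

// Match the regex source `src` anchored at `pos`; return the new position or -1.
function lit(input: string, pos: number, src: string): number {
  const m = new RegExp("^(?:" + src + ")").exec(input.slice(pos));
  return m === null ? -1 : pos + m[0].length;
}

// ROOT := COMMAND, where COMMAND is an ordered choice: the first match wins.
function root(input: string, pos: number): Result {
  const c = cls(input, pos);
  return c !== null ? c : color(input, pos);
}

// Cls := '[cC][lL][sS]\s*' ROOT?
function cls(input: string, pos: number): Result {
  const p = lit(input, pos, "[cC][lL][sS]\\s*");
  if (p < 0) return null;
  const rest = root(input, p); // ROOT?: optionally parse another command
  const node: Command = { kind: "Cls", next: rest === null ? null : rest[0] };
  return [node, rest === null ? p : rest[1]];
}

// Color := '[cC][oO][lL][oO][rR]\s+' red=NUMBER ',\s*' green=NUMBER ',\s*' blue=NUMBER ',\s*' ROOT?
function color(input: string, pos: number): Result {
  let p = lit(input, pos, "[cC][oO][lL][oO][rR]\\s+");
  if (p < 0) return null;
  const values: number[] = [];
  for (let i = 0; i < 3; i++) {
    const end = lit(input, p, "[0-9]+");
    if (end < 0) return null;
    values.push(parseInt(input.slice(p, end), 10));
    p = lit(input, end, ",\\s*"); // every number, including blue, needs a comma
    if (p < 0) return null;
  }
  const rest = root(input, p); // ROOT?
  const node: Command = {
    kind: "Color", red: values[0], green: values[1], blue: values[2],
    next: rest === null ? null : rest[0],
  };
  return [node, rest === null ? p : rest[1]];
}

// Top level: parse ROOT, then check that the whole input was consumed.
function parseProgram(input: string): { matched: boolean; end: number; complete: boolean } {
  const r = root(input, 0);
  const end = r === null ? 0 : r[1];
  return { matched: r !== null, end, complete: r !== null && end === input.length };
}

console.log(parseProgram("Cls Color 255, 0, 0, Cls")); // complete parse
console.log(parseProgram("Cls Cls Foo"));              // stops early at position 8
```

With this sketch, `"Cls Cls Foo"` is matched only up to position 8, which is the situation in which tsPEG reports $EOF. Like the grammar it mirrors, the sketch requires a comma after the last argument of `Color` (`blue=NUMBER ',\s*' ROOT?`), so `"Color 255, 0, 0"` without a trailing comma is not matched at all.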

Is it possible that the ROOT? rule at the end of each command rule triggers the "$EOF" error?

(Oh, and by the way, is there a more elegant way to parse case-insensitive strings than explicitly denoting each letter in upper and lower case? :))

EoinDavey commented 3 years ago

I don't think the ROOT? rule should be (directly) causing this issue. It would help if you could provide the string you tried to parse that raised the $EOF error. I am concerned that the $EOF error marked its position as line 1, offset 0, which implies that it didn't match any characters.

(No, I don't know of any better way, sorry!)
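For the case-insensitivity question, one workaround outside the grammar language is to generate the per-letter character classes with a small script instead of typing them by hand. This sketch is hypothetical: `caseInsensitive` and `escapeRegex` are invented names, not part of tsPEG, and it assumes you are willing to produce the .peg file from a script before running `tspeg`.

```typescript
// Hypothetical helper, not a tsPEG feature: build the '[cC][lL][sS]'-style
// pattern text for a keyword, e.g. in a script that writes the .peg file
// before running tspeg. Intended for ASCII keywords.

// Escape characters that have a special meaning in a regular expression.
function escapeRegex(ch: string): string {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function caseInsensitive(keyword: string): string {
  let out = "";
  for (let i = 0; i < keyword.length; i++) {
    const ch = keyword.charAt(i);
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    // Letters become a two-character class; other characters are kept (escaped).
    out += lower === upper ? escapeRegex(ch) : "[" + lower + upper + "]";
  }
  return out;
}

// Prints: Cls := '[cC][lL][sS]\s*' ROOT?
console.log("Cls := '" + caseInsensitive("Cls") + "\\s*' ROOT?");
```

The generated text is the same pattern as the hand-written rules above, so the grammar itself does not change; only the typing is automated.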