PhilippeSigaud / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

v0.4.9: Invalid grammar errors are not reported #335

Open ethindp opened 1 year ago

ethindp commented 1 year ago

Since resolving #333, everything works fine, but no parse tree is reported. (I don't know if my grammar is valid; no errors are reported at all.) I tried the grammar debugging article (building with the tracer version of the pegged dependency), but that didn't print anything. (The code in the grammar debugging article also seems to be wrong, since setTraceConditionFunction takes a delegate with two arguments, not one.)

veelo commented 1 year ago

Is there a way I can have a look at your grammar?

ethindp commented 1 year ago

@veelo Sure, I've uploaded it. (It contains a few rules for Unicode because I needed them and didn't know if they were predefined anywhere, and the docs aren't clear on whether I could match Unicode general categories, so the grammar doesn't start until around line 3237.) Sorry about that -- if there's a better way of handling that, I'd happily do it. :) app.txt

veelo commented 1 year ago

I finally found some time to reproduce this. Given this Ada source

with Text_IO; use Text_IO;
procedure hello is
begin
   Put_Line("Hello world!");
end hello;

the parse tree generated by Pegged with your grammar is

Ada[0, 0][]
 +-Ada.Root[0, 0][]

So the grammar succeeds while consuming none of the input. Looking at your grammar, this is because of the following reduction:

Root <  Compilation
Compilation <  (CompilationUnit)*

The latter means zero or more, so it happily accepts zero and calls it a day.

You can force Root to consume all input by changing that rule to

Root <  Compilation eoi

With this modification, the parser now produces the failing parse tree

Ada (failure)
 +-Ada.Root (failure)
    +-Ada.Compilation (failure)
       +-zeroOrMore!(wrapAround!(Ada.Spacing, wrapAround!(Ada.Spacing, Ada.CompilationUnit, Ada.Spacing), Ada.Spacing))[0, 0][]
       +-Ada.CompilationUnit (failure)
          +-Ada.ContextClause[0, 27]["With", "Text_IO", ";", "Use", "Text_IO", ";"]
          |  +-Ada.ContextItem[0, 14]["With", "Text_IO", ";"]
          |  |  +-Ada.WithClause[0, 14]["With", "Text_IO", ";"]
          |  |     +-Ada.NonlimitedWithClause[0, 14]["With", "Text_IO", ";"]
          |  |        +-Ada.KwWith[0, 5]["With"]
          |  |        +-Ada.LibraryUnitName[5, 12]["Text_IO"]
          |  |        |  +-Ada.Name[5, 12]["Text_IO"]
          |  |        |     +-Ada.DirectName[5, 12]["Text_IO"]
          |  |        |        +-Ada.Identifier[5, 12]["Text_IO"]
          |  |        +-Ada.Semicolon[12, 14][";"]
          |  +-Ada.ContextItem[14, 27]["Use", "Text_IO", ";"]
          |     +-Ada.UseClause[14, 27]["Use", "Text_IO", ";"]
          |        +-Ada.UsePackageClause[14, 27]["Use", "Text_IO", ";"]
          |           +-Ada.KwUse[14, 18]["Use"]
          |           +-Ada.PackageName[18, 25]["Text_IO"]
          |           |  +-Ada.Name[18, 25]["Text_IO"]
          |           |     +-Ada.DirectName[18, 25]["Text_IO"]
          |           |        +-Ada.Identifier[18, 25]["Text_IO"]
          |           +-Ada.Semicolon[25, 27][";"]
          +-Ada.LibraryItem (failure)
             +-option!(wrapAround!(Ada.Spacing, wrapAround!(Ada.Spacing, Ada.KwPrivate, Ada.Spacing), Ada.Spacing))[27, 27][]
             +-Ada.LibraryUnitDeclaration (failure)
                +-option!(wrapAround!(Ada.Spacing, wrapAround!(Ada.Spacing, Ada.OverridingIndicator, Ada.Spacing), Ada.Spacing))[27, 27][]
                +-Ada.SubprogramSpecification (failure)
                   +-Ada.ProcedureSpecification (failure)
                      +-Ada.KwProcedure[27, 37]["Procedure"]
                      +-Ada.DefiningProgramUnitName (failure)
                      |  +-Ada.ParentUnitName[37, 79]["helloisbeginPut_Line", "(", "\"", "Helloworld!", "\"", ")"]
                      |  |  +-Ada.Name[37, 79]["helloisbeginPut_Line", "(", "\"", "Helloworld!", "\"", ")"]
                      |  |     +-Ada.IndexedComponent[37, 79]["helloisbeginPut_Line", "(", "\"", "Helloworld!", "\"", ")"]
                      |  |        +-Ada.Prefix[37, 63]["helloisbeginPut_Line"]
                      |  |        |  +-Ada.Name[37, 63]["helloisbeginPut_Line"]
                      |  |        |     +-Ada.DirectName[37, 63]["helloisbeginPut_Line"]
                      |  |        |        +-Ada.Identifier[37, 63]["helloisbeginPut_Line"]
                      |  |        +-Ada.LeftParenthesis[63, 64]["("]
                      |  |        +-Ada.Expression[64, 78]["\"", "Helloworld!", "\""]
                      |  |        |  +-Ada.Relation[64, 78]["\"", "Helloworld!", "\""]
                      |  |        |     +-Ada.SimpleExpression[64, 78]["\"", "Helloworld!", "\""]
                      |  |        |        +-Ada.Term[64, 78]["\"", "Helloworld!", "\""]
                      |  |        |           +-Ada.Factor[64, 78]["\"", "Helloworld!", "\""]
                      |  |        |              +-Ada.Primary[64, 78]["\"", "Helloworld!", "\""]
                      |  |        |                 +-Ada.StringLiteral[64, 78]["\"", "Helloworld!", "\""]
                      |  |        +-Ada.RightParenthesis[78, 79][")"]
                      |  +-Ada.Dot (failure)
                      |  |  +-literal!(".") Failure at line 3, col 27, after "o world!\")" expected "\".\"", but got ";\nend hell"
                      |  +-Ada.DefiningIdentifier[37, 63]["helloisbeginPut_Line"]
                      |     +-Ada.Identifier[37, 63]["helloisbeginPut_Line"]
                      +-Ada.ParameterProfile[63, 63][]

I haven't studied what the problem is from this point on, but it looks like whitespace is ignored.

I hope this will help you further. See also the Extended Pascal example, which uses asModule as well.
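The effect described above, a zero-or-more rule succeeding without consuming anything and eoi forcing the whole input to be matched, can be reproduced with a toy grammar. This is a minimal sketch and not the Ada grammar from this issue: the grammar names Loose and Strict and the Item rule are made up for illustration, and it assumes Pegged is available as a dub dependency.

```d
import pegged.grammar;
import std.stdio;

// Hypothetical grammar: Root accepts zero or more items, so it can
// succeed on any input without consuming a single character.
mixin(grammar(`
Loose:
    Root <  Item*
    Item <  "a"
`));

// Same grammar, but Root must also reach the end of the input.
mixin(grammar(`
Strict:
    Root <  Item* eoi
    Item <  "a"
`));

void main()
{
    // "b" contains no Item at all.
    auto loose = Loose("b");
    writeln("Loose:  successful=", loose.successful, " end=", loose.end);

    auto strict = Strict("b");
    writeln("Strict: successful=", strict.successful);
}
```

Following the explanation above, the first parse should report success with nothing consumed, while the second should fail because eoi cannot match while input remains.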

ethindp commented 1 year ago

@veelo Thank you, I didn't know eoi was an actual rule! I'm unsure why it's failing; I still can't figure out why tracing isn't working (maybe the documentation needs to be improved on that.... It says to use the tracer subconfiguration, and I do that, but std.experimental.logger/std.logger is broken).

ethindp commented 1 year ago

Also, one last question: which extended token (colon, semicolon, ...) do I need to use to effectively say "don't create an AST node for this entire node, but instead replace this node with its children"? I.e., in the tree you posted above, you'll notice a lot of unnecessary stuff I want to get rid of, but there's also stuff I want to keep. Take the context item, for instance: that's a limited with clause or a nonlimited with clause. I don't need to know that a context item exists in the tree; I just need access to the with clauses themselves, and I can ignore the fact that there are context items, because that's obvious. Same for a basic declaration: I couldn't care less that there's a basic declaration; I want to know about the children of that node only. The fact that it's a "basic declaration" is a bit immaterial and just creates clutter. (I might end up keeping a lot of this anyway, though; I'm unsure what I'll need to get rid of and what I need to keep.)

ethindp commented 1 year ago

@veelo Okay, I think I know exactly why it's failing to match. My identifier rule is defined as fusing everything it matches. For some reason, pegged is taking that to mean "eat everything". Or something like that is going on. If you look at the parse tree:

                      +-Ada.DefiningProgramUnitName (failure)
                      |  +-Ada.ParentUnitName[48, 88]["mainisbeginPut_Line", "(", "\"", "Helloworld!", "\"", ")"]

You'll notice that it's stripped out every bit of whitespace and just gobbled up the quotes, the strings, everything, and just merged it into one element. I don't know if my delimiters are messed up or what. I really need to get tracing working...

veelo commented 1 year ago

I am looking at the tracing issue. It appears std.logger has suffered a bit of a regression: https://forum.dlang.org/post/khdhqryatrqjtxegovgf@forum.dlang.org

veelo commented 1 year ago

I just updated the wiki with updated tracing instructions.

https://github.com/PhilippeSigaud/Pegged/wiki/Grammar-Debugging/_compare/0eeb5286b4f3946b4c33df1790cbb97fbcd75b84

ethindp commented 1 year ago

Thanks, will try again and get back to you.

ethindp commented 1 year ago

So tracing is working but it really isn't all that helpful. But I think I know the problem: I'm misusing the | operator where I should be using /. It's difficult to figure out for this particular grammar when | is preferable over /, though.

veelo commented 1 year ago

Use of | instead of / will increase time complexity, but it shouldn't lead to incorrect results.

veelo commented 1 year ago

Do you see any matches of Separator in your trace?

ethindp commented 1 year ago

I'm not sure what's causing this then. Here's what it's doing:

             +-Ada.LibraryUnitDeclaration (failure)
                +-Ada.SubprogramDeclaration (failure)
                   +-option!(wrapAround!(Ada.Spacing, wrapAround!(Ada.Spacing, Ada.OverridingIndicator, Ada.Spacing), Ada.Spacing))[38, 38][]
                   +-Ada.SubprogramSpecification (failure)
                      +-Ada.ProcedureSpecification (failure)
                         +-Ada.KwProcedure[38, 48]["Procedure"]
                         +-Ada.DefiningProgramUnitName (failure)
                         |  +-Ada.ParentUnitName[48, 88]["mainisbeginPut_Line", "(", "\"", "Helloworld!", "\"", ")"]
                         |  |  +-Ada.Name[48, 88]["mainisbeginPut_Line", "(", "\"", "Helloworld!", "\"", ")"]
                         |  |     +-Ada.IndexedComponent[48, 88]["mainisbeginPut_Line", "(", "\"", "Helloworld!", "\"", ")"]
                         |  |        +-Ada.Prefix[48, 72]["mainisbeginPut_Line"]
                         |  |        |  +-Ada.Name[48, 72]["mainisbeginPut_Line"]
                         |  |        |     +-Ada.DirectName[48, 72]["mainisbeginPut_Line"]
                         |  |        |        +-Ada.Identifier[48, 72]["mainisbeginPut_Line"]
                         |  |        +-Ada.LeftParenthesis[72, 73]["("]
                         |  |        +-Ada.Expression[73, 87]["\"", "Helloworld!", "\""]
                         |  |        |  +-Ada.Relation[73, 87]["\"", "Helloworld!", "\""]
                         |  |        |     +-Ada.SimpleExpression[73, 87]["\"", "Helloworld!", "\""]
                         |  |        |        +-Ada.Term[73, 87]["\"", "Helloworld!", "\""]
                         |  |        |           +-Ada.Factor[73, 87]["\"", "Helloworld!", "\""]
                         |  |        |              +-Ada.Primary[73, 87]["\"", "Helloworld!", "\""]
                         |  |        |                 +-Ada.StringLiteral[73, 87]["\"", "Helloworld!", "\""]
                         |  |        +-Ada.RightParenthesis[87, 88][")"]
                         |  +-Ada.Dot (failure)
                         |  |  +-literal!(".") Failure at line 4, col 24, after "o world!\")" expected "\".\"", but got ";\r\nend mai"
                         |  +-Ada.DefiningIdentifier[48, 72]["mainisbeginPut_Line"]
                         |     +-Ada.Identifier[48, 72]["mainisbeginPut_Line"]
                         +-Ada.ParameterProfile[72, 72][]

You'll notice that, for whatever reason, it's devouring everything. First WS is stripped, and then the entire string is just consumed, leaving nothing left to parse, and an invalid input. I've defined keywords as:

KwAbort <  "Abort"i
KwAbs <  "Abs"i
KwAbstract <  "Abstract"i
KwAccept <  "Accept"i
KwAccess <  "Access"i
KwAliased <  "Aliased"i
KwAll <  "All"i
KwAnd <  "And"i
KwArray <  "Array"i
KwAt <  "At"i
# ...

Same for delimiters. Pretty much all the rules, minus a couple, use the space-consuming < arrow rather than <- because whitespace is insignificant the majority of the time, other than as, well, a separator. But that doesn't mean it should be behaving like this. I define my identifier rule as:

Identifier <~
           IdentifierStart (IdentifierStart | IdentifierExtend)*
           !(
               KwAbort
                / KwAbs
                / KwAbstract
                / KwAccept
                / KwAccess
                / KwAliased
                / KwAll
                / KwAnd
                / KwArray
                / KwAt
                / KwBegin
                / KwBody
                / KwCase
                / KwConstant
                / KwDeclare
                / KwDelay
                / KwDelta
                / KwDigits
                / KwDo
                / KwElse
                / KwElsif
                / KwEnd
                / KwEntry
                / KwException
                / KwExit
                / KwFor
                / KwFunction
                / KwGeneric
                / KwGoto
                / KwIf
                / KwIn
                / KwInterface
                / KwIs
                / KwLimited
                / KwLoop
                / KwMod
                / KwNew
                / KwNot
                / KwNull
                / KwOf
                / KwOr
                / KwOthers
                / KwOut
                / KwOverriding
                / KwPackage
                / KwParallel
                / KwPragma
                / KwPrivate
                / KwProcedure
                / KwProtected
                / KwRaise
                / KwRange
                / KwRecord
                / KwRem
                / KwRenames
                / KwRequeue
                / KwReturn
                / KwReverse
                / KwSelect
                / KwSeparate
                / KwSome
                / KwSubtype
                / KwSynchronized
                / KwTagged
                / KwTask
                / KwTerminate
                / KwThen
                / KwType
                / KwUntil
                / KwUse
                / KwWhen
                / KwWhile
                / KwWith
                / KwXor
            )
IdentifierStart < 
             UppercaseLetter
             / LowercaseLetter
             / TitlecaseLetter
             / ModifierLetter
             / OtherLetter
             / LetterNumber
IdentifierExtend < 
             NonspacingMark
             / SpacingMark
             / DecimalNumber
             / ConnectorPunctuation

But still, I don't see how that should be causing this issue. The / and | distinction is important for the name rule; if I use / for that rule, it parses with Ada in with Ada.Text_IO;, but then stops, expecting a ;, and not allowing for a . to be present. I'm most definitely confused.

ethindp commented 1 year ago

I see lines like this, if you're talking about this:

2023-09-05T13:31:56.416 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3| (l:1, c:1, i:0)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "with Ada.Text_I"...

I'm not sure what an actual "match" would look like.

ethindp commented 1 year ago

Update: yes, separator matches:

2023-09-05T13:31:56.419 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1698:or 1|2|3|4|5|6|7|8|9|10| (l:1, c:6, i:5) "Ada.Separator" SUCCEEDED on " "

veelo commented 1 year ago

Good, so it recognises the space. Then look above and below that line in the trace to see why it continues to consume input for the identifier. It should not.

ethindp commented 1 year ago

That was for the with clause.... For the procedure part, it's doing something a bit strange:

2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:1, c:35, i:34)   "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\r\n\r\nprocedure m"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:1, c:35, i:34)   "Ada.Separator" FAILED on "\r\n\r\nprocedure m"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:35)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\n\r\nprocedure ma"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:35)    "Ada.Separator" FAILED on "\n\r\nprocedure ma"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:36)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\r\nprocedure mai"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:36)    "Ada.Separator" FAILED on "\r\nprocedure mai"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:37)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\nprocedure main"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:37)    "Ada.Separator" FAILED on "\nprocedure main"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "procedure main "...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "Ada.Separator" FAILED on "procedure main "...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "procedure main "...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "Ada.Separator" FAILED on "procedure main "...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:10, i:47) "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on " main is\r\nbegin"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1698:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "Ada.Separator" SUCCEEDED on " "
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "Ada.Separator" FAILED on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "Ada.Separator" FAILED on "main is\r\nbegin\r"...

It looks like it's just scanning further and further ahead, consuming more and more input, because it's failing to match. I did define the separator rule as:

Separator <- '\U00000020'
   / '\U000000a0'
   / '\U00001680'
   / [\U00002000-\U0000200a]
   / [\U00002028-\U00002029]
   / '\U0000202f'
   / '\U0000205f'
   / '\U00003000'

But I used <- because I didn't want to create an infinite loop between Separator and Spacing. I've no idea how the parser would work if I changed it to use the space-consuming arrow.

ethindp commented 1 year ago

I just tried debugging it in gdb and... Well, GDB didn't like all the template code. Got stuck trying to step into the main rule. Just sat there wasting CPU cycles until I killed it.

veelo commented 1 year ago

The difference between < and <- rules is just whether you'll see the whitespace in the parse tree, right? Or do I misremember?

The trace looks weird indeed. That first line should succeed and consume the '\r'. You are the first person I have seen use UTF-32 code points in a grammar, though, so there may be an issue there, and I don't recognise all those code points.

veelo commented 1 year ago

Spacing <- (Separator / [\r\n\f\t\v])*

should be

Spacing <- (Separator / [\r\n\f\t\v])+

no?

ethindp commented 1 year ago

I don't really know why it'd cause problems. I can try to read the file as UTF-32, but still....

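Those code points are:

U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+2000 to U+200A: EN QUAD through HAIR SPACE
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

The full lists are the Unicode general categories Zs (space separators), Zl (line separators), and Zp (paragraph separators).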

ethindp commented 1 year ago

And I'm unsure.... I think the tutorial uses *, but I can try + instead.

Update: just tried and it completely broke.... Now it expects spaces. So * is correct.

ethindp commented 1 year ago

Okay, I know the problem. This rule:

Spacing <- (Separator / [\r\n\f\t\v])*

Gets translated into:

    static TParseTree Spacing(string s)
    {
        if(__ctfe)
        {
            return         pegged.peg.defined!(pegged.peg.zeroOrMore!(pegged.peg.or!(Separator, pegged.peg.or!(pegged.peg.literal!("\r"), pegged.peg.literal!("\n"), pegged.peg.literal!(`\`), pegged.peg.literal!("f"), pegged.peg.literal!("\t"), pegged.peg.literal!(`\`), pegged.peg.literal!("v")))), "Ada.Spacing")(TParseTree("", false,[], s));
        }
        else
        {
            forgetMemo();
            return hooked!(pegged.peg.defined!(pegged.peg.zeroOrMore!(pegged.peg.or!(Separator, pegged.peg.or!(pegged.peg.literal!("\r"), pegged.peg.literal!("\n"), pegged.peg.literal!(`\`), pegged.peg.literal!("f"), pegged.peg.literal!("\t"), pegged.peg.literal!(`\`), pegged.peg.literal!("v")))), "Ada.Spacing"), "Spacing")(TParseTree("", false,[], s));
        }
    }
    static string Spacing(GetName g)
    {
        return "Ada.Spacing";
    }

This can be worked around pretty easily, but in general I think that the parser should pass-through any and all escape sequences. Otherwise you get weird things like this. It makes me wonder what else is getting screwed up in this manner.
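One possible workaround, sketched here as an assumption and not as a confirmed fix: since the Separator rule above already uses '\U…' code point literals, form feed (U+000C) and vertical tab (U+000B) could be written the same way, leaving in the character class only the escapes that the generated code shows being translated correctly (\r, \n, \t). The grammar name Ws, its Root rule, and the one-alternative Separator are hypothetical stand-ins to keep the example small.

```d
import pegged.grammar;
import std.stdio;

// Hypothetical mini-grammar: \f and \v are spelled as code points instead
// of escapes inside the character class. The backtick string is raw, so
// the escapes reach Pegged untouched.
mixin(grammar(`
Ws:
    Root      <- Spacing eoi
    Spacing   <- (Separator / [\r\n\t] / '\U0000000c' / '\U0000000b')*
    Separator <- '\U00000020'
`));

void main()
{
    // Input made of a space, tab, form feed, vertical tab and CRLF only.
    auto result = Ws(" \t\f\v\r\n");
    writeln("successful=", result.successful, " end=", result.end);
}
```

Checking the generated parser (for example via asModule) for literal!(`\`) would show whether the escapes still get split. This only addresses the escape handling, not the identifier problem discussed elsewhere in this thread.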

veelo commented 1 year ago

Are you talking about \f and \v becoming "\\f" and "\\v"? That seems like a bug.

ethindp commented 1 year ago

No, it took \f and \v and thought I meant the character sequences \ or f or \ or v. (Changing it to the code points didn't solve it....) The new trace is this.... This just gets weirder and weirder:

2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:1, c:35, i:34)   "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\r\n\r\nprocedure m"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:1, c:35, i:34)   "Ada.Separator" FAILED on "\r\n\r\nprocedure m"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:35)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\n\r\nprocedure ma"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:35)    "Ada.Separator" FAILED on "\n\r\nprocedure ma"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:36)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\r\nprocedure mai"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:2, c:1, i:36)    "Ada.Separator" FAILED on "\r\nprocedure mai"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:37)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "\nprocedure main"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:37)    "Ada.Separator" FAILED on "\nprocedure main"...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "procedure main "...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "Ada.Separator" FAILED on "procedure main "...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "procedure main "...
2023-09-05T13:31:56.484 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10| (l:3, c:1, i:38)    "Ada.Separator" FAILED on "procedure main "...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:10, i:47) "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on " main is\r\nbegin"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1698:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "Ada.Separator" SUCCEEDED on " "
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "Ada.Separator" FAILED on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10|11|12| (l:3, c:11, i:48) "Ada.Separator" FAILED on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1324:and 1|2|3|4|5|6|7|8|9|10|11|12|13|14|15| (l:3, c:11, i:48)   "and!(IdentifierStart, zeroOrMore, negLookahead)" considering rule "Ada.IdentifierStart" on "main is\r\nbegin\r"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1683:or 1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18| (l:3, c:12, i:49)   "or!(Ada.Separator, or!(literal!(\"\r\"), literal!(\"\n\"), literal!(\"\\\"), literal!(\"f\"), literal!(\"\t\"), literal!(\"\\\"), literal!(\"v\")))" considering rule "Ada.Separator" on "ain is\r\nbegin\r\n"...
2023-09-05T13:31:56.486 [trace] C:\Users\ethin\source\pegged\pegged\peg.d:1708:or 1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18| (l:3, c:12, i:49)   "Ada.Separator" FAILED on "ain is\r\nbegin\r\n"...

It's like it doesn't even try matching the others from what I can tell. It just tries separator and moves on if that dies.

ethindp commented 1 year ago

Here's the latest trace data.... Frankly I'm stumped at this point. And gdb can't debug the generated parser for some strange reason.... It gets stuck when trying to step into the function. (Search for procedure main and the errors start to happen around that line.) TraceLog.txt

veelo commented 1 year ago

One last suggestion before I log off for the day. If you think some input should be parsed by particular rules in the grammar but it is not, isolate those rules in a new grammar and feed it only that input. Then it may be easier to see why it doesn't. And if it does, then other rules must be the problem, and you can gradually extend the grammar and see when it breaks.
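As a rough illustration of that approach, a probe grammar might contain only the rules under suspicion and be fed only the fragment that misbehaves. Everything here is a made-up sketch: the grammar name Probe, the ASCII-only Identifier (standing in for the Unicode category rules), and the choice of input are assumptions, not taken from the attached grammar.

```d
import pegged.grammar;
import std.stdio;

// Hypothetical probe grammar: just two keywords and an identifier.
mixin(grammar(`
Probe:
    Root        <  KwProcedure Identifier KwIs eoi
    KwProcedure <  "procedure"i
    KwIs        <  "is"i
    Identifier  <~ [a-zA-Z] [a-zA-Z0-9_]*
`));

void main()
{
    // Feed only the fragment that fails in the full grammar and
    // print the resulting parse tree for inspection.
    writeln(Probe("procedure main is"));
}
```

If the probe parses as expected, rules from the full grammar can be swapped in one at a time (for instance the real IdentifierStart and IdentifierExtend) until the tree changes.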

Good luck!

veelo commented 1 year ago

> No, it took \f and \v and thought I meant the character sequences \ or f or \ or v. (Changing it to the code points didn't solve it....) The new trace is this.... This just gets weirder and weirder:
>
> [trace snipped]
>
> It's like it doesn't even try matching the others from what I can tell. It just tries separator and moves on if that dies.

On second look I think this is OK. The tracer seems to skip built-in parsers, such as literal. The tracer reports that Separator fails, but then literal!(\"\r\") succeeds (hidden) and \r is consumed (shown). I should try to improve the tracer.

ethindp commented 1 year ago

Yeah it's difficult to understand what it's doing.