dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Can't fix this failure. #304

Closed enjoysmath closed 3 years ago

enjoysmath commented 3 years ago

I've attached the entire project as abstract_spacecraft.zip. However, there are only two files. The other file, term.d, simply holds the D data structures that I'm "parsing into". Ideally these structures correspond 1-1 with the constructs of the grammar.

Here's app.d. It really is as simple as can be:

import pegged.grammar;
import std.stdio;
import std.conv;

import term;

mixin(grammar(
`
_:
Typed          < List ":" Term
Term           < Template
List           < Term ("," Term)*
Template       < TemplatePart+
TemplatePart   < Variable / Number / Text
Number         < ~([0-9]+)
Text           < (!(Variable / Number / ":" / ","))*
Variable       <- identifier
`));

/// Example interpreter
// TODO: later return "Maths"

Term parseTreeToMaths(ParseTree p)
{
   switch (p.name)
   {
      case "_":
         return parseTreeToMaths(p.children[0]);
      case "_.Variable":
         return new Variable(to!string(p.children[0]));
      case "_.Typed":
         auto term = parseTreeToMaths(p.children[0]);
         auto type = parseTreeToMaths(p.children[1]);
         return new Typed(term, type);
      case "_.List":
         Term[] list;
         foreach (term; p.children)
            list ~= parseTreeToMaths(term);
         return new List(list);
      default:
         return null;
   }
}

void main()
{
   while (true) {
      writeln("s=");
      auto s = readln();          
      auto p = _(s);      
      writeln(p);
      auto result = parseTreeToMaths(p);      
      writeln(result);
   }
   //// Parsing at compile-time:
   //enum parseTree1 = AbstractSpacecraft("1 + 2 - (3*x-5)*6");
   //
   //pragma(msg, parseTree1.matches);
   //assert(parseTree1.matches == ["1", "+", "2", "-",
   //                              "(", "3", "*", "x", "-", "5", ")", "*", "6"]);
   //writeln(parseTree1);
   //
   //// And at runtime too:
   //auto parseTree2 = AbstractSpacecraft(" 0 + 123 - 456 ");
   //assert(parseTree2.matches == ["0", "+", "123", "-", "456"]);

   //readln();
}
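term.d is only available in the attached zip and is not reproduced in this thread. Purely as an illustration, a hypothetical stand-in that is consistent with the constructor calls in app.d above (`new Variable(string)`, `new Typed(Term, Term)`, `new List(Term[])`) might look like the sketch below. The field names and the `toString` formatting are assumptions, not the author's actual code.

```d
// Hypothetical stand-in for term.d; the real file is not shown in the thread.
import std.algorithm : map;
import std.array : join;

/// Common base class for everything the interpreter returns.
abstract class Term
{
}

/// A single named variable such as `a`.
class Variable : Term
{
    string name; // assumed field name

    this(string name) { this.name = name; }

    override string toString() { return name; }
}

/// A term annotated with a type, written `term : type`.
class Typed : Term
{
    Term term; // assumed field names
    Term type;

    this(Term term, Term type)
    {
        this.term = term;
        this.type = type;
    }

    override string toString()
    {
        return term.toString() ~ " : " ~ type.toString();
    }
}

/// A comma-separated list of terms.
class List : Term
{
    Term[] items; // assumed field name

    this(Term[] items) { this.items = items; }

    override string toString()
    {
        return items.map!(t => t.toString()).join(", ");
    }
}

unittest
{
    Term[] items = [new Variable("a"), new Variable("b")];
    auto typed = new Typed(new List(items), new Variable("A"));
    assert(typed.toString() == "a, b : A");
}
```

The sketch has no `main` because app.d provides one; on its own it can be exercised with `dmd -unittest -main -run term.d`.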

The errors I'm getting are:

s=
a+b:A
_ (failure)
 +-_.Typed (failure)
    +-_.List[0, 1]["a"]
    |  +-_.Term[0, 1]["a"]
    |     +-_.Template[0, 1]["a"]
    |        +-_.TemplatePart[0, 1]["a"]
    |           +-_.Variable[0, 1]["a"]
    +-literal!(":") Failure at line 0, col 1, after "a" expected "\":\"", but got "+b:A\n"

This happens whenever you enter something more complex than a:A, say a+b:A. It fails on that and I can't figure out why. I've tried playing around with the negative lookahead (the !... part of the grammar), to no avail.

Since my grammar shouldn't be more than about 5-10x that size, I will probably end up making a hand-rolled parser with simple rules such as only one colon per line, etc.

Let me know if we can fix this so that I can continue using pegged.

abstract_spacecraft.zip

veelo commented 3 years ago

Hi, for situations like this I wrote the tracing capability of Pegged, see https://github.com/PhilippeSigaud/Pegged/wiki/Grammar-Debugging.

But let me try to parse your input with your grammar manually:

  • Typed tries to parse a+b:A, invokes List.
    • List tries to parse a+b:A, invokes Term.
    • Term tries to parse a+b:A, invokes Template.
    • Template tries to parse a+b:A, invokes TemplatePart.
      • TemplatePart tries to parse a+b:A, invokes Variable.
      • Variable tries to parse a+b:A, invokes identifier.
        • identifier tries to parse a+b:A, matches a. Remaining input: +b:A
      • TemplatePart matches a. Remaining input: +b:A
    • Template tries to parse +b:A, invokes another TemplatePart.
      • TemplatePart tries to parse +b:A, invokes Variable.
      • Variable tries to parse +b:A, invokes identifier.
        • identifier tries to parse +b:A, fails.
      • TemplatePart tries to parse +b:A, invokes Number.
      • Number tries to parse +b:A, fails.
      • TemplatePart tries to parse +b:A, invokes Text.
      • Text tries to parse +b:A. Text cannot match anything but the empty string because it only contains a negative lookahead, repeated 0 or more times. It succeeds on the empty string. Remaining input: +b:A.
    • Template has not advanced the input, gives up on trying another TemplatePart.
    • List succeeds with a. Remaining input: +b:A.
  • Typed expects literal ":" after List, but the remaining input is +b:A, and fails.

I assume you expect "+" to be matched by Text, and it looks like you just forgot the "match anything" parser (.) in

Text           < (!(Variable / Number / ":" / ",") . )*

Hope this helps, Bastiaan.

enjoysmath commented 3 years ago

@veelo Yes, that fixed it! :)
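The difference between the original and the corrected Text rule can be reproduced without Pegged. The following is a minimal sketch in plain D, not Pegged's implementation: `identifierLength`, `lookaheadPasses`, `zeroOrMore`, `lookaheadOnly` and `lookaheadThenAny` are hypothetical helpers written for this illustration, and stopping the repetition on a zero-width iteration is a simplification chosen here to avoid an endless loop. It shows that repeating a bare negative lookahead never consumes input, while pairing it with `.` consumes one character per iteration.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit;
import std.stdio : writeln;

/// Marker for "this step failed to match".
enum size_t noMatch = size_t.max;

/// A parsing step: returns the number of characters consumed, or noMatch.
alias Parser = size_t function(string s, size_t pos);

/// Length of an identifier ([a-zA-Z_][a-zA-Z_0-9]*) at s[pos], or 0 if none.
size_t identifierLength(string s, size_t pos)
{
    if (pos >= s.length || !(isAlpha(s[pos]) || s[pos] == '_'))
        return 0;
    size_t n = 1;
    while (pos + n < s.length && (isAlphaNum(s[pos + n]) || s[pos + n] == '_'))
        ++n;
    return n;
}

/// Mimics !(Variable / Number / ":" / ","): true when none of the
/// alternatives matches at pos. A lookahead never consumes input.
bool lookaheadPasses(string s, size_t pos)
{
    if (pos >= s.length)
        return true; // nothing left, so no alternative can match
    immutable c = s[pos];
    return identifierLength(s, pos) == 0 && !isDigit(c) && c != ':' && c != ',';
}

/// Body of the original rule: the lookahead alone, always zero-width.
size_t lookaheadOnly(string s, size_t pos)
{
    return lookaheadPasses(s, pos) ? 0 : noMatch;
}

/// Body of the corrected rule: lookahead followed by '.', one character wide.
size_t lookaheadThenAny(string s, size_t pos)
{
    return (lookaheadPasses(s, pos) && pos < s.length) ? 1 : noMatch;
}

/// Repeats step zero or more times and returns the total consumed.
/// Stops on failure or on an iteration that made no progress.
size_t zeroOrMore(Parser step, string s, size_t pos)
{
    size_t consumed = 0;
    for (;;)
    {
        immutable n = step(s, pos + consumed);
        if (n == noMatch || n == 0)
            break;
        consumed += n;
    }
    return consumed;
}

void main()
{
    string rest = "+b:A"; // what is left after the first Variable matched "a"
    writeln(zeroOrMore(&lookaheadOnly, rest, 0));    // prints 0
    writeln(zeroOrMore(&lookaheadThenAny, rest, 0)); // prints 1
}
```

With the lookahead alone, Text succeeds on the empty string and leaves `+b:A` untouched, which is why Typed then fails to find ":". With the `.` added, Text consumes `+` and stops in front of `b`, so the next TemplatePart can match it as a Variable.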