dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

No ParseTree nodes for 'external' rules when grammar composition is used #149

Closed (dmakarov closed this issue 9 years ago)

dmakarov commented 9 years ago

Perhaps this is by design, but the result of grammar composition is confusing. For example:

module expeg;

import std.stdio;

import pegged.grammar;

mixin(grammar(q{
 Term:

 Identifier       <~ [a-zA-Z_] [a-zA-Z0-9_]*
 FloatLiteral     <~ Sign? Integer "." Integer? ([eE] Sign? Integer)?
 IntegerLiteral   <~ Sign? Integer
 Integer          <~ digit+
 Sign             <- "-" / "+"
}));

mixin(grammar(q{
 Expr:

 Expression       <  Factor AddExpr*
 Factor           <  Primary MulExpr*
 AddExpr          <  [-+] Factor
 MulExpr          <  [*/] Primary
 Primary          <  Term.Identifier
                   / Term.FloatLiteral
                   / Term.IntegerLiteral
                   / '(' Expression ')'
}));

int
main(string[] args)
{
  auto t = Expr("asdf + 3.4 * qwer");
  writeln(t);

  return 0;
}

produces the output

Target pegged 0.1.0 is up to date. Use --force to rebuild.
Building expeg ~master configuration "application", build type debug.
Running ldc2...
Running ./bin/expeg
Expr [0, 17]["asdf", "+", "3.4", "*", "qwer"]
 +-Expr.Expression [0, 17]["asdf", "+", "3.4", "*", "qwer"]
    +-Expr.Factor [0, 5]["asdf"]
    |  +-Expr.Primary [0, 5]["asdf"]
    +-Expr.AddExpr [5, 17]["+", "3.4", "*", "qwer"]
       +-Expr.Factor [7, 17]["3.4", "*", "qwer"]
          +-Expr.Primary [7, 11]["3.4"]
          +-Expr.MulExpr [11, 17]["*", "qwer"]
             +-Expr.Primary [13, 17]["qwer"]

I would expect every Expr.Primary node to have a child such as Term.Identifier or Term.FloatLiteral, but the Expr.Primary nodes have no children. Is this how grammar composition is designed to work, i.e. no subtrees for any rules included from other grammars? It seems like a serious limitation.

PhilippeSigaud commented 9 years ago

Wait, why did you close this issue?

On Tue, Feb 24, 2015 at 5:21 PM, Dmitri Makarov notifications@github.com wrote:

Closed #149 https://github.com/PhilippeSigaud/Pegged/issues/149.


dmakarov commented 9 years ago

I closed the issue because I realized that the truncated tree for grammars composed of included subgrammars is due to Tree Decimation, which cuts off the nodes coming from external rules. It would be nicer if decimation kept the nodes from external rules and dropped only the internal Pegged nodes, but I guess this is not a bug, so I closed the issue.

The reason I wanted to use grammar composition is to split a larger grammar into submodules and then have the option of parsing fragments of source code that are not parsable by the parser generated from the entire composed grammar. However, the latter can also be done by directly calling the parsers for specific rules of the entire grammar, and that approach does not truncate the tree at the boundary of nodes coming from external rules.
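
For readers who land here with the same problem, the workaround described above might look roughly like the sketch below. It is an illustration only: merging Term into Expr this way is an assumption about how the two grammars from the report could be combined, and the direct call `Expr.FloatLiteral(...)` relies on Pegged generating one callable parser per rule, as I understand its generated code. Expected output is deliberately left out, since the exact tree depends on the Pegged version.

```d
import std.stdio;
import pegged.grammar;

// Hypothetical merge of the Term and Expr grammars from the report into a
// single grammar, so that every rule belongs to Expr and none is external.
mixin(grammar(q{
 Expr:

 Expression       <  Factor AddExpr*
 Factor           <  Primary MulExpr*
 AddExpr          <  [-+] Factor
 MulExpr          <  [*/] Primary
 Primary          <  Identifier
                   / FloatLiteral
                   / IntegerLiteral
                   / '(' Expression ')'

 Identifier       <~ [a-zA-Z_] [a-zA-Z0-9_]*
 FloatLiteral     <~ Sign? Integer "." Integer? ([eE] Sign? Integer)?
 IntegerLiteral   <~ Sign? Integer
 Integer          <~ digit+
 Sign             <- "-" / "+"
}));

void main()
{
    // Whole-grammar parse, starting at the first rule (Expression).
    auto whole = Expr("asdf + 3.4 * qwer");
    writeln(whole);

    // Fragment parse: call the parser of one specific rule directly.
    auto fragment = Expr.FloatLiteral("3.4");
    writeln(fragment);
}
```

With everything in one grammar, Identifier, FloatLiteral and IntegerLiteral are rules of Expr itself, so they should no longer be treated as external nodes. The trade-off is that the Term rules can no longer be reused as a separate grammar.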