dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Why does my grammar compile but crash at run time? #154

Open Computermatronic opened 9 years ago

Computermatronic commented 9 years ago

I wrote the following grammar and generated it with asModule:

Metacode:
Script < Statement*
Operator < "+" / "-" / "_" / "\"
Unary < "++" / "--" / "+" / "-"
UnaryPostFix < "++" / "--"
Identifier < ~([a-z_A-Z]) ~([a-zA-Z0-9])
Variable < Identifier / "(" Expression ")"
Tinary < Expression "?" Expression ":" Expression
ArrayInitializer < "[" (Expression ",")* "]"
Expression < Unary Variable
           / Variable UnaryPostFix
           / Variable
           / Expression Operator Expression
           / Tinary
           / Lambda
           / Expression "." Identifier
           / Expression "[" Expression "]"
           / ArrayInitializer
Def < "def" Type? Identifier ("=" Expression)? ";"
Statement < (Expression ";" / If / For / While / Do / Foreach / Class / Function / Def ";" / Import ";" / Module ";")
Block < "{" Statement* "}" / Statement
If < "if" "(" Expression ")" Block
For < "for" "(" Def ";" Expression ";" Expression ")" Block
While < "while" "(" Expression ")" Block
Do < "do" Block "until" "(" Expression ")"
Foreach < "foreach" "(" (Variable ",")* Variable ";" Expression ")" Block
Lambda < "function" Type? "(" (Identifier ",")* Identifier? ")" Block
Function < "function" Type? Identifier "(" (Identifier ",")* Identifier? ")" Block
Import < "import" Identifier ("." Identifier)*
Module < "module" Identifier ("." Identifier)*
Type < "<" Identifier ">"

OOP stuff (currently unused):

ProtectionModifiers < "public"
                    / "package"
                    / "protected"
                    / "private" 
Class <  ProtectionModifiers? "class" Identifier ("using" (Identifier ",")* Identifier)? ClassBody 
ClassBody < "{" ClassMember* "}" 
ClassMember < ProtectionModifiers? ClassModifiers? (Def / Function) 
ClassModifiers < "static" 
AbstractClass <  ProtectionModifiers? "abstract" "class" Identifier ("using" Identifier)* AbstractClassBody 
AbstractClassBody < "{" (AbstractClassMember / ClassMember)* "}" 
AbstractClassMember < ProtectionModifiers? ClassModifiers? "abstract" "function" Type? Identifier "(" (Identifier ",")* Identifier? ")" ";" 
Interface < ProtectionModifiers "interface" Identifier InterfaceBody 
InterfaceBody < "{" InterfaceMember* "}" 
InterfaceMember < ProtectionModifiers? InterfaceModifiers? "function" Type? Identifier "(" (Identifier ",")* Identifier? ")" ";" 
InterfaceModifiers < "static" 

It compiles, but when I try to use it with

module main;
import metacode.parser;
import std.stdio;

string test = `

class test { function f() { } } `;

int main(string[] args)
{
    // Parse the test source with the generated Metacode parser.
    ParseTree p = Metacode(test);
    // Print each top-level node, its children, and their names.
    foreach (c; p.children)
    {
        writeln(c.name, " ", c.children);
        foreach (d; c.children)
        {
            writeln(d.name);
        }
    }
    return 0;
}

it crashes. Why?

PhilippeSigaud commented 9 years ago

Hi,

The crash is mainly due to left-recursive rules, for example Expression < Expression Operator Expression. Parsing Expression Grammars do not handle left recursion, as explained in the docs: to try such an alternative, the Expression parser immediately calls itself again at the same input position without consuming anything, so it recurses forever until the stack overflows, which is the segmentation fault you see. You should rewrite your rules for Expression and Tinary so that none of their alternatives begins with Expression. As a quick first fix, commenting out the left-recursive alternatives of Expression (including Tinary) with # makes the example work.
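A minimal sketch of one way to do that rewrite, in the same Pegged notation. Each left-recursive alternative of the form A < A x / B is replaced by the equivalent repetition A < B x*, so every rule consumes input before it recurses. The helper rules Binary, Prefix, Postfix and Primary are hypothetical names that are not in the original grammar; for simplicity all binary operators share one precedence level and associate to the left, and the result is a little more permissive than the original rules:

```
# Tinary is now the entry point of Expression instead of one of its alternatives.
Expression < Tinary
Tinary     < Binary ("?" Expression ":" Expression)?
# Replaces: Expression Operator Expression
Binary     < Prefix (Operator Prefix)*
# Replaces: Unary Variable
Prefix     < Unary? Postfix
# Replaces: Expression "." Identifier / Expression "[" Expression "]" / Variable UnaryPostFix
Postfix    < Primary ("." Identifier / "[" Expression "]" / UnaryPostFix)*
# Lambda comes before Variable so that "function" is not consumed as an Identifier.
Primary    < Lambda / ArrayInitializer / Variable
```

Every alternative now starts with a literal or with a rule that itself starts by consuming input, so the parser can no longer re-enter Expression at the same position.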

There are two other small mistakes:

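Also, be aware that defining Script as Statement* lets a parse succeed without consuming the whole input: Statement* stops at the first Statement that fails to match and still reports success, in the worst case after matching zero Statements (the empty string ""). Hence badly written modules are not rejected; they are parsed as "" or as a truncated prefix. You should maybe define Script as Statement* endOfInput to catch such mistakes.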
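For illustration, the suggested change is a one-rule edit. With it, Script only matches when the Statements reach the end of the text, so a syntax error shows up as a failed parse (the successful field of the returned ParseTree is false) instead of a silently truncated one:

```
# Before: Script < Statement*   (succeeds on any prefix, even an empty one)
Script < Statement* endOfInput
```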

Cheers,

Philippe

PhilippeSigaud commented 8 years ago

Hi machinistprogrammer,

Bastiaan Veelo recently added support for left recursion to Pegged. If you're still interested, you could try your grammar again with the current HEAD.