dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

How do you make Pegged parse *everything*? Right now it's parsing only the first "chunk". #263

Closed by enjoysmath 4 years ago

enjoysmath commented 5 years ago
module MyGraphDatabase;

import std.stdio;
import pegged.grammar;

// Example usage:
/*
search for
} U {
.X f↪ .Y  
g↶    h↶   
.Z x↪ .W,
}
where:
   i is mono
   i matches labels {'\text{Hom}(x,y)', 'ξ \in'}
   f is {'\..}
*/

// We need this grammar in order to put things in a standard form before searching,
// for instance by taking spaces out.
mixin(grammar(`
   BananaCatsQueryLanguage:
   Query                   < BlockDiagram (:"\n\n" BlockDiagram)*
   BlockDiagram            <- SelfMap / LineOfObjects ("\n" LineOfArrows "\n" LineOfObjects)*
   LineOfObjects           <- Obj (:" " ArrowLR :" " Obj)*
   LineOfArrows            <- ArrowUD (:" " ArrowDiag :" " ArrowUD)*
   ArrowLR                 <- EpiArrLR / MonoArrLR / GenArrLR / MapsToArrLR / IsoArrLR / EqArrLR /
                              ExistsArrLR / InclArrLR / "  "
   ArrowUD                 <- EpiArrUD / MonoArrUD / GenArrUD / MapsToArrUD / EqArrUD /
                              ExistsArrUD / "  "
   ArrowDiag               <- MonoArrDiag / GenArrDiag / "  "
   EpiArrLR                <- Arr("↞") / Arr("↠") 
   EpiArrUD                <- Arr("↟") / Arr("↡")
   MonoArrLR               <- Arr("↩") / Arr("↪")
   MonoArrUD               <- Arr("⮍") / Arr("⮏")
   MonoArrDiag             <- Arr("⤣") / Arr("⤤") / Arr("⤥") / Arr("⤦")
   GenArrLR                <- Arr("←") / Arr("→")
   GenArrUD                <- Arr("↑") / Arr("↓")
   GenArrDiag              <- Arr("↖") / Arr("↗") / Arr("↘") / Arr("↙")
   MapsToArrLR             <- Arr("↤") / Arr("↦")
   MapsToArrUD             <- Arr("↥") / Arr("↧")
   IsoArrLR                <- Arr("⭁") / Arr("⭇")
   EqArrLR                 <- Arr("⭀") / Arr("⥱")
   EqArrUD                 <- Arr("⇟") / Arr("⇞") 
   ExistsArrLR             <- Arr("⬸") / Arr("⤑")
   ExistsArrUD             <- Arr("⇡") / Arr("⇣")
   InclArrLR               <- Arr("⊃") / Arr("⊂")
   InclArrUD               <- Arr("⋃") / Arr("⋂")
   SelfMap                 <- Arr("⟳") Obj
   Arr(arr)                <- UnicodeSub(LowerLatin) arr
   Obj                     <- "." UnicodeSub(UpperAlpha) / "  "
   Scripted                <- (TextName / Alpha / Operator) ("\\limits" ((Sub? Sup?) / (Sup? Sub?)))?
   Sub                     <- SimpleSub / ComplexSub
   Sup                     <- SimpleSup / ComplexSup
   SimpleSub               <- :"_" (Alpha / [0-9])
   SimpleSup               <- :"^" (Alpha / [0-9])
   ComplexSub              <- :"_{" Expr :"}"
   ComplexSup              <- :"^{" Expr :"}"
   Expr                    <- EnclosedExpr / list(Expr, ",") / Scripted
   EnclosedExpr            <- "(" Expr ")" / "{" Expr "}" / "[" Expr "]"
   UnicodeSub(var)         <- var IntUniSubs?
   IntUniSubs              <- ("₋" / "₊")? + SubOneToNine ("₀" / SubOneToNine)*
   SubOneToNine            <- "₁" / "₂" / "₃" / "₄" / "₅" / "₆" / "₇" / "₈" / "₉"
   Alpha                   <- UpperAlpha / LowerAlpha
   UpperAlpha              <- UpperGreek / UpperLatin
   LowerAlpha              <- LowerGreek / LowerLatin
   UpperGreek              <- "Γ" / "Δ" / "Θ" / "Ξ" / "Π" / "Σ" / "Φ" / "Ψ" / "Ω"
   LowerGreek              <- "α" / "β" / "γ" / "δ" / "ε" / "ζ" / "η" / "θ" / "ι" /
                              "κ" / "ξ" / "π" / "ρ" / "σ" / "ς" / "τ" / "υ" / "φ" / 
                              "ψ" / "χ" / "λ" / "μ" / "ω"
   LowerLatin              <- [a-z]
   UpperLatin              <- [A-Z]
   TextName                < "\text{" (Alpha / Int / "-") "}"
   Int                     <~ ("-" / "+")? [1-9]+ [0-9]*
   Operator                <- "+"
`));
int main()
{
   enum parseTree1 = BananaCatsQueryLanguage(
`
i⟳.X

.X f↪ .Y
g↶    h↶
.Z x↪ .W

`);
   writeln(parseTree1);
   writeln("Hello D World!\n");
   readln();
   return 0;
}

The output shows that it parses a SelfMap (the first \n\n-separated chunk in the input string) but does not parse the BlockDiagram that follows it. If you swap the chunks in the input string so that the BlockDiagram comes first, it parses the BlockDiagram and leaves out the SelfMap.

veelo commented 5 years ago

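At first sight, that is because the input already matches the grammar after the first BlockDiagram in Query: a rule succeeds as soon as it matches a prefix of the input, so leftover text is not treated as an error. Change Query into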

Query                   < BlockDiagram (:"\n\n" BlockDiagram)* eof

and it will only match if the complete input is consumed.

Disclaimer: untested.
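
To illustrate the idea on something smaller, here is a minimal sketch with two hypothetical toy grammars (`Loose` and `Strict`, not part of the grammar above) that differ only in the trailing end-of-input check. It uses `eoi`, which as far as I know is Pegged's predefined end-of-input rule; treat the exact name as an assumption and check it against your Pegged version. The plain PEG spelling `!.` should express the same thing.

```d
import std.stdio;
import pegged.grammar;

// Hypothetical toy grammar: words separated by commas.
// Nothing forces the match to reach the end of the input.
mixin(grammar(`
Loose:
    List <- Word ("," Word)*
    Word <~ [a-z]+
`));

// Same grammar, but the top rule must also reach the end of the input.
// `eoi` is assumed to be Pegged's predefined end-of-input rule.
mixin(grammar(`
Strict:
    List <- Word ("," Word)* eoi
    Word <~ [a-z]+
`));

void main()
{
    string input = "ab,cd;ef"; // the ';' is not covered by either grammar

    auto loose = Loose(input);
    auto strict = Strict(input);

    // Loose should report success although it stopped before the ';'.
    // Comparing `end` with the input length reveals the partial match.
    writeln("Loose:  successful=", loose.successful,
            ", consumed ", loose.end, " of ", input.length);

    // Strict should fail, because the end-of-input check cannot match at ';'.
    writeln("Strict: successful=", strict.successful);
}
```

Comparing `parseTree.end` with the input length is also a quick way to check whether a successful parse stopped early, without changing the grammar.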

veelo commented 5 years ago

If you got it to work, could you close this issue?