dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

PEGGED

Pegged is a parsing expression grammar (PEG) generator implemented in the D programming language.

The idea is to give the generator a PEG, written in the syntax presented in the reference article. From this grammar definition, a set of related parsers is created, to be used at runtime or compile time.

Usage

To use Pegged, just call the grammar function with a PEG and mix it in. For example:

import pegged.grammar;

mixin(grammar(`
Arithmetic:
    Term     < Factor (Add / Sub)*
    Add      < "+" Factor
    Sub      < "-" Factor
    Factor   < Primary (Mul / Div)*
    Mul      < "*" Primary
    Div      < "/" Primary
    Primary  < Parens / Neg / Pos / Number / Variable
    Parens   < "(" Term ")"
    Neg      < "-" Primary
    Pos      < "+" Primary
    Number   < ~([0-9]+)

    Variable <- identifier
`));

This creates the Arithmetic grammar, with the Term, Add, Factor (and so on) rules for basic arithmetic expressions with operator precedence ('*' and '/' bind more tightly than '+' or '-'). identifier is a predefined parser that recognizes a basic C-style identifier (first a letter or underscore, then digits, letters or underscores). In the rest of this document, I'll call a 'rule' an expression of the form Parser <- Parsing Expression, and I'll use 'grammar' to designate the entire group of rules given to grammar.

To use a grammar, call it with a string. It will return a parse tree containing the calls to the different rules:

import std.stdio : writeln;

void main()
{
    // Parsing at compile-time:
    enum parseTree1 = Arithmetic("1 + 2 - (3*x-5)*6");

    // Print the matched tokens while compiling, then check them.
    pragma(msg, parseTree1.matches);
    assert(parseTree1.matches == ["1", "+", "2", "-", "(", "3", "*", "x", "-", "5", ")", "*", "6"]);
    writeln(parseTree1);

    // And at runtime too:
    auto parseTree2 = Arithmetic(" 0 + 123 - 456 ");
    assert(parseTree2.matches == ["0", "+", "123", "-", "456"]);
}

Even for such a simple grammar and such a simple expression, the resulting parse tree is a bit long to be shown here. See the result here
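
Since the full tree is long, it can help to walk it yourself and print only what you need. The following is a minimal sketch, not part of Pegged: dump is a hypothetical helper name, and it assumes the ParseTree fields name, matches, children and successful are accessible as shown.

```d
import std.array : replicate;
import std.stdio : writeln;
import pegged.grammar;

mixin(grammar(`
Arithmetic:
    Term     < Factor (Add / Sub)*
    Add      < "+" Factor
    Sub      < "-" Factor
    Factor   < Primary (Mul / Div)*
    Mul      < "*" Primary
    Div      < "/" Primary
    Primary  < Parens / Neg / Pos / Number / Variable
    Parens   < "(" Term ")"
    Neg      < "-" Primary
    Pos      < "+" Primary
    Number   < ~([0-9]+)
    Variable <- identifier
`));

// Hypothetical helper: prints one line per node, indented by depth,
// with the rule name and the tokens matched below that node.
void dump(ParseTree node, size_t depth = 0)
{
    writeln("  ".replicate(depth), node.name, " ", node.matches);
    foreach (child; node.children)
        dump(child, depth + 1);
}

void main()
{
    auto tree = Arithmetic("1 + 2");
    assert(tree.successful);
    dump(tree);
}
```

Each node carries the name of the rule that produced it, so the same recursion can be turned into an interpreter by switching on node.name instead of printing.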

By default, the grammars do not silently consume spaces, as this is the standard behaviour for PEGs. You can opt out of this default, though: a rule written with the simple < arrow instead of <- silently consumes spaces (you can see it in the previous example).
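
To make the difference between the two arrows concrete, here is a small sketch. The grammars Strict and Relaxed and their rules are made up for illustration; they are not part of Pegged. The only difference between them is the arrow used for the Pair rule.

```d
import pegged.grammar;

// Hypothetical grammar: Pair uses <-, so spaces are not skipped.
mixin(grammar(`
Strict:
    Pair   <- Number Word
    Number <- ~([0-9]+)
    Word   <- ~([a-z]+)
`));

// Same rules, but Pair uses <, so spaces are silently consumed.
mixin(grammar(`
Relaxed:
    Pair   <  Number Word
    Number <- ~([0-9]+)
    Word   <- ~([a-z]+)
`));

void main()
{
    // Without space consumption, the space after "12" makes Word fail.
    assert(Strict("12ab").successful);
    assert(!Strict("12 ab").successful);

    // With the < arrow, the same input parses.
    assert(Relaxed("12 ab").successful);
}
```

Using < only on the rules that combine tokens, and <- on the token rules themselves, keeps spaces from being accepted inside a number or a word.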

How to get Pegged

Pegged is a GitHub project, hosted at https://github.com/PhilippeSigaud/Pegged

To get it:

$ git clone https://github.com/PhilippeSigaud/Pegged

The /docs directory contains an empty /wiki directory, linked to the GitHub wiki as a git submodule. Here is how to get it:

$ cd <pegged directory>
$ git submodule init
$ git submodule update

This should give you a /docs/wiki directory full of markdown files, right from the online wiki.

Tutorial and docs

The Pegged wiki is here. It contains a growing tutorial. All the wiki pages are also present (as Markdown files) in the docs directory.

Features

More advanced features, outside the standard PEG perimeter, are there to bring more power to the mix.

License

Pegged is released with the Boost license (like most D projects). See here for more details.

Contributing

Pegged itself is used in its own development. In particular, the file pegged/parser.d is generated from examples/peggedgrammar/src/pegged/examples/peggedgrammar.d. Therefore pegged/parser.d should not be edited by hand. However, if anything changes in any of the other files in pegged/, or in examples/peggedgrammar/src/pegged/examples/peggedgrammar.d, the parser must be regenerated. How to do that is described in pegged/dev.