arr-ai / wbnf

ωBNF implementation
Apache License 2.0

ωBNF — super awesome parser engine

ωBNF is pronounced "omega BNF".

Grammar Syntax Guide

An ωBNF grammar file consists of an unordered list of rules (called productions) and comments.

Comments

A comment can be either C++-style (// This is a comment to the end of the line) or C-style (/* This is a comment which may span multiple lines */).

Rules/Productions

A rule is defined in terms of other terms, or terminals, and takes the form NAME -> TERM+ ; For example, prod -> IDENT "->" term+ ";"; defines the rule prod.

Terminals

Expressions

Terms can be grouped in various ways to build up rules.

Further Details

Delimited Repeater

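The quant rule defines the delimited repeater as op=/{<:|:>?} opt_leading=","? named opt_trailing=","? That is: an operator (:, <: or :>), an optional leading comma, the delimiter term, and an optional trailing comma.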
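To make the capture names concrete, here is a minimal Python sketch of the inputs a delimited repeater might accept. It assumes that a:b means a (b a)*, as the macro example in this guide implies, that opt_leading permits one delimiter before the first item, and that opt_trailing permits one delimiter after the last item. The associativity operators <: and :> are not modelled. The function name and its parameters are hypothetical and are not part of ωBNF.

```python
def match_delimited(tokens, is_item, sep, opt_leading=False, opt_trailing=False):
    """Return the items if tokens form `item (sep item)*`, otherwise None.

    Hypothetical helper: tokens is a list of strings, is_item is a predicate
    for the repeated term, and sep is the literal delimiter token.
    """
    i = 0
    # opt_leading: tolerate one delimiter before the first item (assumption).
    if opt_leading and i < len(tokens) and tokens[i] == sep:
        i += 1
    # At least one item is required.
    if i >= len(tokens) or not is_item(tokens[i]):
        return None
    items = [tokens[i]]
    i += 1
    # Every further item must be preceded by exactly one delimiter.
    while i + 1 < len(tokens) and tokens[i] == sep and is_item(tokens[i + 1]):
        items.append(tokens[i + 1])
        i += 2
    # opt_trailing: tolerate one delimiter after the last item (assumption).
    if opt_trailing and i < len(tokens) and tokens[i] == sep:
        i += 1
    # Any leftover token means the input does not match.
    return items if i == len(tokens) else None
```

With str.isalpha as the item predicate, the token list a , b matches with either flag setting, whereas a , only matches when opt_trailing is set.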

Parser Configuration Commands (pragmas)

Some special commands are defined in the grammar to control the way the parser executes.

.import relative_filename: Merges the grammar of the imported file into the current grammar (equivalent to #include in C).
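The #include analogy can be illustrated with a short Python sketch that splices imported text into the importing grammar. This is not how the ωBNF engine is implemented. It assumes each .import sits on its own line, that paths are resolved relative to the importing file, and that a file already merged is skipped on later imports. The function merge_imports and the in-memory files dictionary are hypothetical.

```python
import posixpath
import re

# Matches a pragma line such as `.import lib/common.wbnf` with an optional `;`.
IMPORT_LINE = re.compile(r"^\s*\.import\s+(\S+?)\s*;?\s*$")


def merge_imports(filename, files, seen=None):
    """Return the text of `filename` with each .import line replaced by the
    (recursively merged) text of the file it names.

    `files` maps file names to grammar text and stands in for the file system.
    """
    seen = set() if seen is None else seen
    if filename in seen:
        return ""  # already merged once; skipping also breaks import cycles
    seen.add(filename)
    base = posixpath.dirname(filename)
    out = []
    for line in files[filename].splitlines():
        m = IMPORT_LINE.match(line)
        if m is None:
            out.append(line)
            continue
        # Resolve the path relative to the importing file (assumption).
        target = posixpath.normpath(posixpath.join(base, m.group(1)))
        merged = merge_imports(target, files, seen)
        if merged:
            out.append(merged)
    return "\n".join(out)
```

Calling merge_imports("main.wbnf", files) on a dictionary holding main.wbnf and the files it imports returns a single grammar text with the imported rules in place of each .import line.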

.macro Name(args) { term }: Defines a macro, which can be used to minimise repetition in the grammar (see Macros below).

Macros

Macros can be used when a common pattern is required throughout the grammar but can't easily be converted to a rule.

Macros are conceptually the same as C-style #defines, except that rather than simply substituting text, a full expression can be used.

We will explain how to use macros by implementing the equivalent of the delimited repeater. First, a macro is defined with .macro Delim(term, sep) { term (sep term)* }, and then it is used with %!Delim(a, "<"? ":" ">"? ).

This expands to a (("<"? ":" ">"?) a)*, which is equivalent to a:("<"? ":" ">"?).
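The expansion of Delim can be reproduced with a small Python sketch. The real engine works with full expressions rather than text, so this sketch only approximates it: it substitutes text but wraps any argument that is more than a single word in parentheses, which keeps the argument together as one expression. The function expand_macro is hypothetical, and it does not handle parameter names that appear inside string literals.

```python
import re


def expand_macro(params, body, args):
    """Substitute each parameter name in `body` with its argument.

    Arguments that are more than a single word are parenthesised so that
    they stay grouped as one expression after substitution.
    """
    if len(params) != len(args):
        raise ValueError("expected %d arguments, got %d" % (len(params), len(args)))
    if not params:
        return body

    def group(arg):
        arg = arg.strip()
        return arg if re.fullmatch(r"\w+", arg) else "(" + arg + ")"

    bindings = {p: group(a) for p, a in zip(params, args)}
    # Replace whole-word occurrences only, in a single pass over the body.
    names = re.compile(r"\b(" + "|".join(re.escape(p) for p in params) + r")\b")
    return names.sub(lambda m: bindings[m.group(1)], body)
```

Calling expand_macro(["term", "sep"], "term (sep term)*", ["a", '"<"? ":" ">"?']) returns a (("<"? ":" ">"?) a)*, which is the expansion described above.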

Magic rules

Rules prefixed by a . are special rules governing the parser's overall behaviour. The following rules are recognised:

.wrapRE -> /{some () regex}

This rule instructs the parser to wrap every regular expression with this one. The actual regex is inserted into the ().

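Example: .wrapRE -> /{\s*()\s*}; wraps every regular expression so that it also consumes any whitespace on either side of it.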
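As a rough illustration of the wrapping step, the following Python sketch inserts a regex into the empty group of a wrapper pattern. The helper wrap_regex is hypothetical. The sketch keeps the group as a capturing group so that the inner match can be read back, which is an assumption rather than a description of the engine. It also uses Python's re module, whose syntax agrees with the engine's regex dialect for this simple pattern but not necessarily in general.

```python
import re


def wrap_regex(wrapper, regex):
    """Insert `regex` into the first empty group `()` of `wrapper`."""
    head, found, tail = wrapper.partition("()")
    if not found:
        raise ValueError("wrapper must contain an empty group ()")
    return head + "(" + regex + ")" + tail


# Wrap a digit pattern with the whitespace-skipping wrapper from this guide.
wrapped = wrap_regex(r"\s*()\s*", r"\d+")
match = re.fullmatch(wrapped, "  42 ")
```

Here wrapped is \s*(\d+)\s*, and match.group(1) holds the digits without the surrounding whitespace.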

Useful recipes

Below is a collection of helpful rules which can be dropped into your grammar.

The ultimate example: ωBNF is self-hosting!

The ωBNF syntax described above is itself implemented in ωBNF. The following grammar is auto-generated from the formal grammar used in the ωBNF parsing engine.

// Non-terminals
grammar -> stmt+;
stmt    -> COMMENT | prod | pragma;
prod    -> IDENT "->" term+ ";";
term    -> (@ ("{" grammar "}")? ):op=">"
         > @:op="|"
         > @+
         > named quant*;
named   -> (IDENT op="=")? atom;
quant   -> op=[?*+]
         | "{" min=INT? "," max=INT? "}"
         | op=/{<:|:>?} opt_leading=","? named opt_trailing=","?;
atom    -> IDENT
         | STR
         | RE
         | macrocall
         | ExtRef=("%%" IDENT)
         | REF
         | "(?=" lookahead=term ")"
         | "(" term ")"
         | "(" ")";

macrocall   -> "%!" name=IDENT "(" term:","? ")";
REF         -> "%" IDENT ("=" default=STR)?;

// Terminals
COMMENT -> /{ //.*$
            | (?s: /\* (?: [^*] | \*+[^*/] ) \*/ )
            };
IDENT   -> /{@|\.?[A-Za-z_]\w*};
INT     -> \d+;
STR     -> /{ " (?: \\. | [^\\"] )* "
            | ' (?: \\. | [^\\'] )* '
            | ` (?: ``  | [^`]   )* `
            };
RE      -> /{
             /{
               (?:
                 \\.
                 | { (?: (?: \d+(?:,\d*)? | ,\d+ ) \} )?
                 | \[ (?: \\. | \[:^?[a-z]+:\] | [^\]] )+ ]
                 | [^\\{\}]
               )*
             \}
           | (?:
               (?:
                 \[ (?: \\. | \[:^?[a-z]+:\] | [^\]] )+ ]
               | \\[pP](?:[a-z]|\{[a-zA-Z_]+\})
               | \\[a-zA-Z]
               | [.^$]
               )(?: (?:[+*?]|\{\d+,?\d?\}) \?? )?
             )+
           };

// Special
pragma  -> import | macrodef {
                import   -> ".import" path=((".."|"."|[a-zA-Z0-9.:]+):,"/") ";"?;
                macrodef -> ".macro" name=IDENT "(" args=IDENT:","? ")" "{" term "}" ";"?;
            };

.wrapRE -> /{\s*()\s*};
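To experiment with the simpler terminals outside the engine, their regexes can be transcribed into Python. This sketch assumes that whitespace inside /{ } is insignificant, which is why STR is compiled with re.VERBOSE, and it uses re.ASCII so that \w and \d stay limited to ASCII characters. The classify helper is hypothetical and checks whole tokens only; it is not a tokenizer.

```python
import re

# Transcriptions of the IDENT, INT and STR terminals from the grammar above.
IDENT = re.compile(r"@|\.?[A-Za-z_]\w*", re.ASCII)
INT = re.compile(r"\d+", re.ASCII)
STR = re.compile(
    r'''
      " (?: \\. | [^\\"] )* "
    | ' (?: \\. | [^\\'] )* '
    | ` (?: ``  | [^`]   )* `
    ''',
    re.VERBOSE | re.ASCII,
)


def classify(token):
    """Return the name of the first terminal that matches the whole token."""
    for name, pattern in (("INT", INT), ("STR", STR), ("IDENT", IDENT)):
        if pattern.fullmatch(token):
            return name
    return None
```

For example, classify(".wrapRE") and classify("@") both return "IDENT", while a token such as 1abc matches none of the three terminals.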