Logicalshift / TameParse

LALR parser with context-sensitive extensions
MIT License

Grammar railroad diagram #3

Open mingodad opened 1 year ago

mingodad commented 1 year ago

It would be nice if TameParse could also generate an EBNF grammar in the format understood by https://www.bottlecaps.de/rr/ui, so that railroad diagrams (https://en.wikipedia.org/wiki/Syntax_diagram) could be generated from it.
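As a rough illustration of what such an output step could look like, here is a minimal sketch. It assumes a hypothetical in-memory grammar: a dict mapping each nonterminal to a list of productions, where each production is a list of symbol strings and terminals are already quoted. It renders that grammar in the `Name ::= alt | alt` notation that the rr tool reads. This is not TameParse's internal representation. `to_rr_ebnf` is a made-up name, and every nonterminal is assumed to have at least one non-empty production.

```python
from typing import Dict, List

# Hypothetical grammar shape: nonterminal -> productions, each production a
# list of symbols; terminals are assumed to be quoted already (e.g. "':'").
Grammar = Dict[str, List[List[str]]]


def to_rr_ebnf(grammar: Grammar) -> str:
    """Render a grammar in the `Name ::= a b | c d` notation read by the rr tool."""
    out: List[str] = []
    for name, productions in grammar.items():
        alternatives = [" ".join(symbols) for symbols in productions]
        head = f"{name} ::= "
        # Pad continuation lines so each alternative starts in the same column
        # as the first one ('|' sits under the '=' of '::=').
        pad = " " * (len(head) - 2)
        out.append(head + alternatives[0])
        for alt in alternatives[1:]:
            out.append(f"{pad}| {alt}")
        out.append("")  # blank line between rules
    return "\n".join(out)
```

Continuation alternatives are padded so they line up with the first one, similar to the layout used in the grammar below.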

I have extended bison, byacc, lemon and btyacc to do this (see https://github.com/mingodad/lalr-parser-test), as well as CocoR (https://github.com/mingodad/CocoR-Java), unicc (https://github.com/mingodad/unicc) and peg/leg (https://github.com/mingodad/peg).

Ideally it would output a consolidated EBNF that gives a global view of the final grammar, because with inheritance the final grammar can be composed from several pieces.
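One way to produce such a consolidated view is to walk the inheritance chain from the root language down and merge each nonterminal's productions. The sketch below uses a hypothetical `Language` record and a hypothetical `consolidate` function. It assumes that `=` and `|=` append alternatives to whatever was inherited, while `replace` discards them first. These semantics are an assumption and have not been checked against TameParse.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (operator, nonterminal, productions); each production is a list of symbols.
Definition = Tuple[str, str, List[List[str]]]


@dataclass
class Language:
    # Hypothetical stand-in for a parsed `language name : parent { ... }` block.
    name: str
    parent: Optional["Language"] = None
    definitions: List[Definition] = field(default_factory=list)


def consolidate(lang: Language) -> Dict[str, List[List[str]]]:
    """Flatten a language and all of its ancestors into one grammar.

    Assumed semantics: '=' and '|=' append alternatives, 'replace' drops
    the inherited alternatives first.
    """
    # Start from the parent's fully consolidated grammar (a fresh dict each call).
    grammar = consolidate(lang.parent) if lang.parent is not None else {}
    for op, nonterminal, productions in lang.definitions:
        if op == "replace":
            grammar[nonterminal] = list(productions)
        elif op in ("=", "|="):
            grammar.setdefault(nonterminal, []).extend(productions)
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return grammar
```

For example, suppose a child language adds `Expr '+' Term` to an inherited `Expr` via `|=` and replaces an inherited `Term`. Under these assumptions, its consolidated grammar keeps both `Expr` alternatives and only the new `Term` production, and the parent's own grammar is left unchanged.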

Below is a partial manual conversion of TameParse/Language/definition.tp into an EBNF understood by https://www.bottlecaps.de/rr/ui.

Copy and paste the EBNF shown below into https://www.bottlecaps.de/rr/ui on the Edit Grammar tab, then click on the View Diagram tab to see or download a navigable railroad diagram.

//
// The top-level definitions
//

Parser-Language     ::= (TopLevel-Block)*

TopLevel-Block      ::= Language-Block
                        | Import-Block
                        | Parser-Block
                        | Test-Block

Language-Block      ::= language identifier/*[name]*/ (Language-Inherits)? '{' (Language-Definition)* '}'

Import-Block            ::= import string/*[filename]*/

Language-Inherits       ::= ':' identifier/*[inherit-from]*/

//
// The language block
//

Language-Definition ::= Lexer-Symbols-Definition
                        | Lexer-Definition
                        | Ignore-Definition
                        | Keywords-Definition
                        | Grammar-Definition
                        | Precedence-Definition

//
// Basic language items
//

Lexer-Symbols-Definition    ::= Lexer-Symbols-Modifier*/*[modifiers]*/ lexer-symbols '{' (Lexeme-Definition)*/*[definitions]*/ '}'

Lexer-Definition            ::= /*[=> Lexer-Modifier* lexer]*/ Lexer-Modifier*/*[modifiers]*/ lexer '{' (Lexeme-Definition)*/*[definitions]*/ '}'

Ignore-Definition           ::= ignore '{' (Keyword-Definition)*/*[definitions]*/ '}'

Keywords-Definition     ::= /*[=> Lexer-Modifier* keywords]*/ Lexer-Modifier*/*[modifiers]*/ keywords '{' (Keyword-Definition)*/*[definitions]*/ '}'

Lexer-Modifier          ::= weak
                            | case sensitive
                            | case insensitive

Lexer-Symbols-Modifier  ::= case sensitive
                            | case insensitive

Keyword-Definition      ::= identifier/*[literal]*/
                            | Lexeme-Definition/*[lexeme]*/

Lexeme-Definition       ::= identifier/*[name]*/ ('=' | "|=") (regex | string | character)
                            | /*[=> replace identifier '=']*/ replace identifier/*[name]*/ '=' (regex | string | character)
                            | identifier/*[name]*/ '=' identifier/*[source-language]*/ '.' identifier/*[source-name]*/

//
// Defining grammars
//

Grammar-Definition      ::= grammar '{' (Nonterminal-Definition)*/*[nonterminals]*/ '}'

Nonterminal-Definition  ::= /*[=> nonterminal ('=' | "|=")]*/ nonterminal ('=' | "|=") Production ('|' Production)*
                            | /*[=> replace nonterminal '=']*/ replace nonterminal '=' Production ('|' Production)*

// Top level is just a simple EBNF term, as the '|' operator creates a new production at this point
Production              ::= (Simple-Ebnf-Item)*/*[items]*/

Ebnf-Item                   ::= (Simple-Ebnf-Item)*/*[items]*/
                            | (Simple-Ebnf-Item)*/*[items]*/ '|' Ebnf-Item/*[or-item]*/

Simple-Ebnf-Item            ::= Nonterminal Semantic-Specification?
                            | Terminal Semantic-Specification?
                            | Guard Semantic-Specification?
                            | Simple-Ebnf-Item '*' Semantic-Specification?
                            | Simple-Ebnf-Item '+' Semantic-Specification?
                            | Simple-Ebnf-Item '?' Semantic-Specification?
                            | '(' Ebnf-Item ')' Semantic-Specification?

Guard                       ::= "[=>" Ebnf-Item ']'
                            | "[=>" '[' can-clash ']' Ebnf-Item ']'

Nonterminal             ::= nonterminal
                            | identifier/*[source-language]*/ '.' nonterminal

Terminal                    ::= Basic-Terminal
                            | identifier/*[source-language]*/ '.' Basic-Terminal

Basic-Terminal          ::= identifier/*[lexeme-name]*/
                            | string
                            | character

//
// Semantics
//

Semantic-Specification  ::= '[' Semantic-Item/*[first-item]*/ (',' Semantic-Item)*/*[more-items]*/ ']'

Semantic-Item               ::= identifier/*[name]*/
                            | conflict '=' shift
                            | conflict '=' reduce
                            | conflict '=' weak reduce

//
// Defining precedence
//

Precedence-Definition       ::= precedence '{' Precedence-Item*/*[items]*/ '}'

Precedence-Item         ::= left Equal-Precedence-Items
                            | right Equal-Precedence-Items
                            | non-associative Equal-Precedence-Items
                            | non-assoc Equal-Precedence-Items

Equal-Precedence-Items  ::= Simple-Ebnf-Item
                            | '{' Simple-Ebnf-Item*/*[terminals]*/ '}'

//
// The parser declaration block
//

Parser-Block                ::= parser identifier/*[name]*/ ':' identifier/*[language-name]*/ '{' (Parser-StartSymbol)+/*[start-symbols]*/ '}'

Parser-StartSymbol      ::= Nonterminal

//
// Test definition block
//

Test-Block              ::= test identifier/*[language-name]*/ '{' Test-Definition*/*[tests]*/ '}'

Test-Definition         ::= Nonterminal '=' Test-Specification+
                            | Nonterminal "!=" Test-Specification+
                            | Nonterminal from Test-Specification+

Test-Specification      ::= string
                            | /*[=> identifier '(']*/ identifier '(' string ')'

/// Weak keywords
/// Declared here to suppress warnings
//weak keywords {
    //\(\S+\) -> \1 ::= "\1"
language ::= "language"
import ::= "import"
lexer-symbols ::= "lexer-symbols"
lexer ::= "lexer"
ignore ::= "ignore"
weak ::= "weak"
keywords ::= "keywords"
grammar ::= "grammar"
replace ::= "replace"
parser ::= "parser"
test ::= "test"
from ::= "from"
case ::= "case"
sensitive ::= "sensitive"
insensitive ::= "insensitive"
precedence ::= "precedence"

left ::= "left"
right ::= "right"
non-associative ::= "non-associative"
non-assoc ::= "non-assoc"

conflict ::= "conflict"
shift ::= "shift"
reduce ::= "reduce"

can-clash ::= "can-clash"
//}
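The commented-out line `//\(\S+\) -> \1 ::= "\1"` above records the search-and-replace used to turn the keyword list into rules. Here is a minimal Python equivalent, assuming one keyword per line. The helper name `keywords_to_rules` is hypothetical.

```python
import re

# One keyword per line, optionally indented; blank lines are left alone.
_KEYWORD_LINE = re.compile(r'^[ \t]*(\S+)[ \t]*$', re.MULTILINE)


def keywords_to_rules(block: str) -> str:
    # Python version of the \(\S+\) -> \1 ::= "\1" substitution:
    # each keyword kw becomes the rule  kw ::= "kw"
    return _KEYWORD_LINE.sub(r'\1 ::= "\1"', block)
```

Lines with more than one token, such as `weak keywords {`, are left unchanged. A lone `}` would be turned into a rule, however, so block delimiters should be kept out of the input.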

mingodad commented 1 year ago

I'm working on an LALR(1)/LEX playground for trying grammars online. It is compiled to WebAssembly, is based on https://github.com/BenHanson/gram_grep, and now includes the TameParse grammar. To view it, open https://mingodad.github.io/parsertl-playground/playground/ and select "Tameparse parser (not working)" from the examples. You can edit the grammar or the input source and press Parse to see a parse tree.

I hope it can be a useful tool for experimenting with LALR(1)/LEX grammars with instant feedback!