Grammar railroad diagram

mingodad commented 1 year ago

It would be nice if this parser generator could generate an EBNF compatible with https://www.bottlecaps.de/rr/ui to produce railroad diagrams from its grammars.

There is also an online parser generator https://www.bottlecaps.de/rex/ and an online converter https://www.bottlecaps.de/convert/ for several grammar formats.

I've done some related work with other parser generators here: https://github.com/mingodad/lalr-parser-test and https://github.com/mingodad/CocoR-Typescript.

For example, here is a quick-and-dirty manual transformation of the Lua grammar:

Copy and paste the EBNF shown below into https://www.bottlecaps.de/rr/ui on the Edit Grammar tab, then click the View Diagram tab to see/download a navigable railroad diagram.


//INCLUDE "LuaLexer.ccc"

Root ::=
      SHEBANG?
   Block EOF

Block /*#(>=0)*/ ::= (Statement)* LastStatement?

Statement ::=
   SEMICOLON //#EmptyStatement(1)
   |
   Assignment
   |
   FunctionCall
   |
   Label
   |
   BREAK //#BreakStatement(1)
   |
   GotoStatement
   |
   DoBlock
   |
   WhileStatement
   |
   RepeatStatement
   |
   IfStatement
   |
   ForStatement
   |
   FunctionDeclaration
   |
   LocalFunctionDeclaration
   |
   LocalAttributeAssignment

Assignment ::= VarList ASSIGN /*=>||*/ ExpressionList

DoBlock ::= DO Block END

GotoStatement ::= GOTO NAME

ForStatement ::=
   FOR NAME ASSIGN /*=>||*/ Expression COMMA Expression (COMMA Expression)? DoBlock
   |
   FOR NameList IN /*=>||*/ ExpressionList DoBlock
   |
   FOR FAIL "invalid for statement"

FunctionDeclaration ::= FUNCTION FunctionName FunctionBody

LocalFunctionDeclaration ::= LOCAL FUNCTION /*=>||*/ NAME FunctionBody

LocalAttributeAssignment ::= LOCAL /*=>|+1*/ AttributeNameList (ASSIGN ExpressionList)?

IfStatement ::= IF Expression THEN Block (ELSEIF Expression THEN Block)* (ELSE Block)? END

RepeatStatement ::= REPEAT Block UNTIL Expression

WhileStatement ::= WHILE Expression DO Block END

AttributeNameList ::= NAME Attribute? (COMMA NAME Attribute?)*

Attribute ::= LT NAME GT

LastStatement ::= ( RETURN ExpressionList? | BREAK ) SEMICOLON?

Label ::= DOUBLE_COLON NAME DOUBLE_COLON

FunctionName ::= NAME (DOT NAME)* (COLON NAME)?

VarList ::= Var (COMMA Var)*

NameList ::= NAME (COMMA NAME /*=>||*/)*

ExpressionList ::= Expression (COMMA Expression)*

Literal ::= NIL | FALSE | TRUE | Number | StringLiteral | ELLIPSIS

PrimaryExpression ::= Literal | FunctionDef | PrefixExp | TableConstructor

PowerExpression ::= PrimaryExpression (HAT UnaryExpression)*

UnaryExpression ::= (UnaryOperator)* PowerExpression

MultiplicativeExpression ::= UnaryExpression (MultiplicativeOperator UnaryExpression)*

AdditiveExpression ::= MultiplicativeExpression ((PLUS|MINUS) MultiplicativeExpression)*

StringCatExpression ::= AdditiveExpression (STR_CAT StringCatExpression)*

ComparisonExpression ::= StringCatExpression (ComparisonOperator ComparisonExpression)*

AndExpression ::= ComparisonExpression (AND ComparisonExpression)*

OrExpression ::= AndExpression (OR AndExpression)*

Expression ::= OrExpression (BitwiseOperator OrExpression)*

PrefixExp ::= VarOrExp (NameAndArgs)*

FunctionCall ::= VarOrExp (NameAndArgs)+

VarOrExp ::= (NAME | LPAREN Expression RPAREN ) (VarSuffix)*

Var ::= (NAME | LPAREN Expression RPAREN VarSuffix) (VarSuffix)*

VarSuffix ::= (NameAndArgs)* /*=>|+1*/ (LBRACKET Expression RBRACKET | DOT NAME)

NameAndArgs ::= (COLON NAME)? Args

Args ::= LPAREN ExpressionList? RPAREN | TableConstructor | StringLiteral

FunctionDef ::= FUNCTION FunctionBody

FunctionBody ::= LPAREN ParamList? RPAREN Block END

ParamList ::= NameList (COMMA ELLIPSIS)? | ELLIPSIS

TableConstructor ::= LBRACE FieldList?  RBRACE

FieldList ::= Field ((COMMA | SEMICOLON) /*=>|+1*/ Field)* (COMMA | SEMICOLON)?

Field ::= LBRACKET Expression RBRACKET ASSIGN Expression | (NAME ASSIGN /*=>||*/)? Expression

ComparisonOperator ::= LT | GT | LE | GE | NE | EQ

MultiplicativeOperator ::= TIMES | SLASH | MOD | DOUBLE_SLASH

BitwiseOperator ::= BIT_AND | BIT_OR | TILDE | LSHIFT | RSHIFT

UnaryOperator ::= NOT | HASH | MINUS | TILDE

Number ::= INT | HEX | FLOAT | HEX_FLOAT

StringLiteral ::= NORMALSTRING | CHARSTRING | LONGSTRING

// Tokens
//| <\([^: ]+\)\s*: \("[^"]+"\)\s*>

//TOKEN #Delimiter :
SEMICOLON ::= ";"
COMMA ::= ","
LPAREN ::= "("
RPAREN ::= ")"
LBRACE ::= "{"
RBRACE ::= "}"
LBRACKET ::= "["
RBRACKET ::= "]"

//TOKEN #Operator :
ASSIGN ::= "="
LT ::= "<"
LE ::= "<="
GT ::= ">"
GE ::= ">="
COLON ::= ":"
DOUBLE_COLON ::= "::"
DOT ::= "."
ELLIPSIS ::= "..."
HAT ::= "^"
NE ::= "~="
EQ ::= "=="
STR_CAT ::= ".."
PLUS ::= "+"
MINUS ::= "-"
TIMES ::= "*"
SLASH ::= "/"
DOUBLE_SLASH ::= "//"
MOD ::= "%"
BIT_OR ::= "|"
BIT_AND ::= "&"
TILDE ::= "~"
LSHIFT ::= "<<"
RSHIFT ::= ">>"
HASH ::= "#"
OR ::= "or"
AND ::= "and"
NOT ::= "not"

//TOKEN #Literal :
NORMALSTRING ::= '"' ( ESCAPE_SEQUENCE | ([^\"]) )* '"'
CHARSTRING ::= "'" (ESCAPE_SEQUENCE | ([^\']))* "'"
DIGIT ::= [0-9]
INT ::= (DIGIT)+
HEX ::= "0" [xX] (HEX_DIGIT)+
FLOAT ::=
      (INT "." (DIGIT)* (EXPONENT_PART)?)
      |
      ("." INT (EXPONENT_PART)?)
      |
      (INT EXPONENT_PART)
HEX_FLOAT ::=
      ("0" [xX] (HEX_DIGIT)+ "." (HEX_DIGIT)* (HEX_EXPONENT_PART)?)
      |
      ( "0" [xX] "." (HEX_DIGIT)+ (HEX_EXPONENT_PART)?)
      |
      ( "0" [xX] (HEX_DIGIT)+ HEX_EXPONENT_PART)
EXPONENT_PART ::= [eE] ([-+])? INT
HEX_EXPONENT_PART ::= [pP] ([-+])? INT
ESCAPE_SEQUENCE ::=
     ("\" [abfnrtvz'"|$#\])
     |
     ("\" ("\r")? "\n")
     |
     DECIMAL_ESCAPE
     |
     HEX_ESCAPE
     |
     UTF_ESCAPE
DECIMAL_ESCAPE ::=
     "\"
     (
        ([02] DIGIT DIGIT)
        |
        (DIGIT (DIGIT)?)
     )
HEX_ESCAPE ::= "\x" HEX_DIGIT HEX_DIGIT
UTF_ESCAPE ::= "\u{" (HEX_DIGIT)+ "}"
HEX_DIGIT ::= [0-9a-fA-F]

//TOKEN #KeyWord :
BREAK ::= "break"
DO ::= "do"
ELSE ::= "else"
ELSEIF ::= "elseif"
END ::= "end"
FALSE ::= "false"
FOR ::= "for"
FUNCTION ::= "function"
GOTO ::= "goto"
IF ::= "if"
IN ::= "in"
LOCAL ::= "local"
NIL ::= "nil"
REPEAT ::= "repeat"
RETURN ::= "return"
THEN ::= "then"
TRUE ::= "true"
UNTIL ::= "until"
WHILE ::= "while"

SHEBANG ::= "#"
NAME ::= [a-zA-Z_]([a-zA-Z0-9_])*

SINGLE_LINE_COMMENT_START ::= "--"
MULTILINE_START ::= "--[" ("=")* "["
LONGSTRING_START ::= "[" ("=")* "["

And here is an unfinished manual transformation of the CongoCC grammar itself to EBNF:

/*
ENSURE_FINAL_EOL;
JAVA_UNICODE_ESCAPE;
PARSER_PACKAGE=org.congocc.parser;
NODE_PACKAGE=org.congocc.parser.tree;
DEFAULT_LEXICAL_STATE=JAVA;
BASE_SRC_DIR="../java";
TOKEN_CHAINING;

DEACTIVATE_TOKENS=_INCLUDE,_INJECT,_EOF;

#if fault_tolerant
  Howdy!!!
 FAULT_TOLERANT=true;
#endif
*/

/* congocc RESERVED WORDS: These are the only tokens in congocc but not in Java */

//TOKEN ::=
_INJECT ::= "INJECT"
_INCLUDE ::= "INCLUDE" | "INCLUDE_GRAMMAR"
_FAIL ::= "FAIL"
_UNCACHE_TOKENS ::= "UNCACHE_TOKENS"
_ACTIVE_TOKENS ::= "ACTIVE_TOKENS"
_ACTIVATE_TOKENS ::= "ACTIVATE_TOKENS"
_DEACTIVATE_TOKENS ::= "DEACTIVATE_TOKENS"
_ENSURE ::= "ASSERT"
_SCAN ::= "SCAN"
_IGNORE_CASE ::= "IGNORE_CASE"
_TOKEN ::= "TOKEN" | "REGULAR_TOKEN"
_UNPARSED ::= "SPECIAL_TOKEN" | "UNPARSED"
_MORE ::= "MORE" | "INCOMPLETE_TOKEN"
_SKIP ::= "SKIP"
_EOF ::= "EOF"
_ATTEMPT ::= "ATTEMPT"
_RECOVER ::= "RECOVER"
_RECOVER_TO ::= "RECOVER_TO"
_ON_ERROR ::= "ON_ERROR"
HASH ::= "#"
BACKSLASH ::= "\\"
RIGHT_ARROW ::= "=>"
UP_TO_HERE ::= "=>|" ("|" | ("+" ["0"-"9"]))
_LEXICAL_STATE ::= "LEXICAL_STATE"
/*
<SINGLE_QUOTE_STRING:
"'"
(
~["'","\\","\n","\r"]
|
<STRING_ESCAPE>
){2,}
"'"
> #StringLiteral
*/
START_UNPARSED ::= "{$"
/*
<IN_UNPARSED_CODE_BLOCK> TOKEN ::=
  <UNPARSED_CONTENT ::= (~["$"] | (("$")+ ~["}"]))+ > #UnparsedContent
*/
END_UNPARSED ::= "$}"

UnparsedCodeBlock /*#org.congocc.core.UnparsedCodeBlock*/ ::=
   START_UNPARSED
   LEXICAL_STATE IN_UNPARSED_CODE_BLOCK
   (
     UNPARSED_CONTENT?
     END_UNPARSED
   )

// In general usage, it is probably better to INCLUDE the (more stable) Java grammar
// that is in the bootstrap jarfile.
//INCLUDE JAVA
//INCLUDE "../../examples/java/Java.ccc"

VariableDeclarator /*#*/ ::= VariableDeclaratorId ( "=" VariableInitializer )?

/*#Root throws IOException #*/ GrammarFile ::=
   (
      Options
   )?//!
   (
      TokenProduction
      |
      CodeInjection2
      |
      CodeInjection
      |
      GrammarInclusion /*=>||*/
      |
      BNFProduction
  )+//!
  EOF

GrammarInclusion /*throws IOException*/ ::=
   /*ACTIVATE_TOKENS _INCLUDE*/ (_INCLUDE)
   (
       (
           STRING_LITERAL
           |
           IDENTIFIER
       )
       (
         "!"
         (STRING_LITERAL | IDENTIFIER)
       )*
       |
       "(" STRING_LITERAL ")"
   )
   ";"?

CodeInjection ::=
        /*ACTIVATE_TOKENS _INJECT*/ (_INJECT) ("(" )?
        (
            "class"
            |
            "interface"
        )?
        IDENTIFIER
        (
            /*SCAN {usingParentheses} =>*/ ")"
        )?
        ":"
        /*=>|+1*/
        (
          SCAN "{" ("}" | "import" | "extends" | "implements" | (Annotation)* "}") => "{"
        )?
        (
            ImportDeclaration
        )*
        (
            Annotation
        )*
        (
             "extends"
             ObjectType
             (SCAN 1 {isInterface} => "," ot=ObjectType)*
             ";"?
        )?
        (
             SCAN 1 {!isInterface} =>
             "implements" ot=ObjectType {CURRENT_NODE.addImplementsType(ot);}
             ("," ot=ObjectType {CURRENT_NODE.addImplementsType(ot);})*
             ";"?
        )?
        (
           SCAN 0 {foundOptionalInitialBrace} => "}"
        )?
        (ClassOrInterfaceBody)?

CodeInjection2 ::=
        /*ACTIVATE_TOKENS _INJECT*/ (_INJECT)
        ":" /*=>||*/ "{"
        CompilationUnit
       "}"

Options /*#*/ ::=
    Setting /*(settings) =>||*/
    (Setting /*(settings)*/)*

Setting /*(Map<String,Object> settings) #Setting*/ ::=
  (
     IDENTIFIER
     |
     _IGNORE_CASE
     |
     _DEACTIVATE_TOKENS
  )
  /*=>|+1*/
  (
     "="
     (
        "true"
        |
        "false"
        |
        INTEGER_LITERAL
        |
        STRING_LITERAL
        |
        SCAN <IDENTIFIER> "."
        =>Name {value = peekNode().toString();}
        |
        IDENTIFIER
        ( HASH IDENTIFIER )?
        (","
            IDENTIFIER
            ( HASH IDENTIFIER )?
        )*
     )
  )?
  ";"

BNFProduction /*#org.congocc.core.BNFProduction*/ ::=
    (
        "public" | "private" | "protected"
    )?
    (
       SCAN ReturnType <IDENTIFIER> => ReturnType
       |
       "#"
    )?
    IDENTIFIER
    FormalParameters?
      ThrowsList?
    TreeNodeDescriptor?
    (
        "RECOVER_TO" ExpansionChoice
    )?
    ":"
    (
        /*SCAN 2 =>*/ <IDENTIFIER>
        ":"
    )?
    (
        Block
        ("#" )?
        ASSERT ~(";") =>||
    )?
    ExpansionChoice
    ";"

TreeNodeDescriptor /*#TreeBuildingAnnotation*/ ::=
  "#" (Name|"abstract"|"interface"|"void"|{})
  (
       "("
          (
            (">" | ">=" | "<" | "<=" | "+" | "-")
          )?
          Expression
       ")"
  )?

InlineTreeNodeDescriptor /*#TreeBuildingAnnotation*/ ::=
  BACKSLASH? HASH Name
  (
    SCAN ~("(") => {}
    |
    "(" (">" | ">=" | "<" | "<=" | "+" | "-") =>||
    (Expression )?
    ")"
    |
    "("
    ASSERT ~(ExpansionSequence "|") //A bit kludgy, but we treat this case specially for principle of least surprise.
    Expression
    ")"
    // If any of the following tokens are after the closing parenthesis, this must
    // be an expansion (or the code is just invalid)
    ASSERT ~(<STAR>|<PLUS>|<HOOK>|<HASH>)
    =>||
    |
    SCAN "(" ExpansionChoice ")" => {}
    |
    "(" FAIL "Expecting either an expression or an expansion here."
 )

TokenProduction ::=
  (
    "<" "*" /*=>||*/ ">"
    |
    "<" IDENTIFIER
      ("," IDENTIFIER)*
     ">"
  )?
  (_TOKEN | _UNPARSED | _SKIP | _MORE)
  (
    "[" "IGNORE_CASE"  "]"
  )?
  ("#" IDENTIFIER
  )?
  ":"
   RegexpSpec /*(tokenClassName)*/
   ( "|" RegexpSpec(tokenClassName) )*
   (
      ";"
   )

RegexpSpec /*(String tokenClassName) #RegexpSpec*/ ::=
    (
        RegexpStringLiteral
        |
        LT
        (
            ("#" )?
            IDENTIFIER
            ":"
        )?
        RegexpChoice
        GT
    )
    (
        "#"IDENTIFIER

    )?
    (
        //SCAN 1 {!regexp.isPrivate()} =>
        Block
    )?
    (
        //SCAN 1 {!regexp.isPrivate()} =>
        ":" IDENTIFIER
    )?

ExpansionChoice ::=
  ExpansionSequence ( "|" ExpansionSequence)*

ExpansionWithParentheses ::=
   (LexicalStateSwitch | TokenActivation)?
   "(" /*=>||*/ ExpansionChoice  ")"
   (
       "*"
       |
       "?"
       |
       "+"
   )?
   (
       //SCAN ~\...\Lookahead =>
       "!"
   )?
   (UpToHere /*(CURRENT_NODE)*/)?

ExpansionSequence /*#org.congocc.core.ExpansionSequence*/ ::=
  (
    //SCAN ~\...\Lookahead
    //=>
    Lookahead  /*=>||*/
  )?
  (
     ExpansionUnit /*=>||*/
  )+!

Assertion ::=
   "ASSERT"
   (
    "{"
       Expression
    "}"
    ("#" )?
    |
    ("~" )?
    "("
    ExpansionChoice
    ")"
   )
   (
       ":" Expression
       ":"?
   )?
   (UpToHere /*(CURRENT_NODE)*/)?

 #Lookahead# ::=
{
   Token amountToken=null;
   boolean hasSemanticLookahead = false, getHasExplicitNumericalLookahead=false;
   Expansion expansion = null;
   Expression exp=null;
   Name name = null;
   Node lb = null;
}
[name=Name "=" =>|| {CURRENT_NODE.setLHS(name);}]
<_SCAN>
[
    <INTEGER_LITERAL> {getHasExplicitNumericalLookahead = true;}
]
[
    "{"
    exp=Expression {hasSemanticLookahead = true; CURRENT_NODE.setSemanticLookahead(exp);}
    "}"
    ["#" {CURRENT_NODE.setSemanticLookaheadNested(true);}]
]
[LookBehind =>|| {lb = peekNode();}]
[
    SCAN {!getHasExplicitNumericalLookahead} =>
    ["~" {CURRENT_NODE.setNegated(true);}]
    ExpansionChoice {expansion = (Expansion) peekNode();}
    <RIGHT_ARROW> =>||
    {
       CURRENT_NODE.setNestedExpansion(expansion);
    }
]
(
    SCAN {expansion == null} => <RIGHT_ARROW>
    |
    SCAN {expansion != null || (exp ==null && lb == null)} => {}
)

LookBehind ::=
   (TILDE )?
   (LookBehindForward | LookBehindBackward)

LookBehindForward /*#void*/ ::=
   (
       SLASH
       (
         (TILDE? IDENTIFIER)
         |
         (DOT | VAR_ARGS)
       )
   )+
   BACKSLASH?

LookBehindBackward /*#void*/ ::=
   (
       BACKSLASH
       (
          (TILDE? IDENTIFIER)
          |
          (DOT | VAR_ARGS)
       )
   )+
   SLASH?

ChildNameInfo /*(Expansion expansion)*/ ::= //#void ::=
  // TODO these delimiters are provisional - agreement needed on final form
  "/"
  (
    IDENTIFIER
    |
    ( "["
      IDENTIFIER
    "]" )
  )
  "/"

Expansion ExpansionUnit ::=
 (
  UncacheTokens
  |
  Failure
  |
  Block
  (
    "#"
  )?
  |
  UnparsedCodeBlock // Currently unused
  |
  //SCAN 1 ~\...\Lookahead =>
  AttemptBlock
  |
  //SCAN 1 ~\...\Lookahead =>
  TryBlock
  |
  Assertion
  |
  ExpansionWithParentheses
  |
  ZeroOrOne
  |
  Terminal
  |
  NonTerminal
  |
  FAIL
 )
 (
     //SCAN 1 ~\...\Lookahead =>
    InlineTreeNodeDescriptor
 )

/*#*/ NonTerminal /*#org.congocc.core.NonTerminal*/ ::=
  (
    Name  "="
    /*=>||*/
  )?
  IDENTIFIER
  /*=>||*/
  (
    SCAN "(" ExpansionSequence "|" //=> {}
    |
    SCAN "(" ExpansionChoice ")" ("*"|"+"|"?") //=> {}
    |
    InvocationArguments =>||
  )?
  (
     //SCAN ~\...\Lookahead =>
     "!"
  )?
  (ChildNameInfo /*(CURRENT_NODE)*/)?
  (UpToHere /*(CURRENT_NODE)*/)?

RegularExpression /*Terminal #void*/ ::=

  SCAN (Name "=")? (STRING_LITERAL | "<")
  =>
  (
    SCAN ~\...\Lookahead Name "=" =>
    lhs = Name
    "="
  )?
  ACTIVATE_TOKENS _EOF(RegexpStringLiteral | RegexpRef | EndOfFile )
  (
      SCAN ~\...\Lookahead
      =>
      {
          RegularExpression regexp = (RegularExpression) peekNode();
          regexp.setTolerantParsing(true);
      }
      "!" {regexp.addChild(popNode());}
  )?
  (
    ChildNameInfo(result)
    {result.addChild(popNode());}
  )?
  (
    UpToHere(result)
    {result.addChild(popNode());}
  )?

UpToHere /*(Expansion exp) #void*/ ::=
   UP_TO_HERE

//The following two productions are not actually used. These constructs are now
// handled by ExpansionWithParentheses so the following two productions
// are not actually used. They have to be there so that the ZeroOrMore and
// OneOrMore types get defined. REVISIT. Need a way of defining Node subtypes
// without creating a dummy grammar rule for them.
ZeroOrMore ::= "(" ExpansionChoice ")" "*"
OneOrMore ::= "(" ExpansionChoice ")" "+"

//This production just matches the square bracket syntax.
// The (...)? syntax is handled by ExpansionWithParentheses
ZeroOrOne ::=
    (LexicalStateSwitch | TokenActivation)?
    "[" /*=>||*/ ExpansionChoice  "]"
     ("!" )?
     (UpToHere /*(CURRENT_NODE)*/)?

AttemptBlock ::=
 "ATTEMPT" ExpansionChoice "RECOVER"  (ExpansionWithParentheses | Block)

UncacheTokens /*#*/ ::= "UNCACHE_TOKENS"

Failure /*#*/ ::=
   "FAIL"
   (
      ":"?
      Expression
      |
      Block
   )?

LexicalStateSwitch ::= "LEXICAL_STATE" IDENTIFIER

TokenActivation ::=
   ("ACTIVE_TOKENS" | "ACTIVATE_TOKENS" | "DEACTIVATE_TOKENS" )
   ("+"|"-")? IDENTIFIER
   (","? ("+"|"-")? IDENTIFIER)*

TryBlock ::=
    "try" "{" ExpansionChoice "}"
    (
        CatchBlock
    )*
        FinallyBlock?

RegexpStringLiteral /*#*/ ::=
   (STRING_LITERAL | CHARACTER_LITERAL | SINGLE_QUOTE_STRING)

/*#*/RegexpRef ::=
    "<"
    IDENTIFIER /*=>||*/
    /*DEACTIVATE_TOKENS RSIGNEDSHIFT, RUNSIGNEDSHIFT*/ (">")

EndOfFile ::= "<" _EOF /*=>||*/ ">"

RegexpChoice ::=
    RegexpSequence
    ("|" RegexpSequence)*

RegexpChoiceInParen /*#RegexpChoice*/ ::=
   "(" RegexpSequence ("|" RegexpSequence)* ")"

RegexpSequence ::=
   (
      RegexpStringLiteral
      |
      RegexpRef
      |
      CharacterList
      |
      RepeatedRegexp
   )+

RepeatedRegexp  ::=
  RegexpChoiceInParen
  (  "+" #OneOrMoreRegexp(2)
   | "*" #ZeroOrMoreRegexp(2)
   | "?" #ZeroOrOneRegexp(2)
   | (
      "{" INTEGER_LITERAL
           ( "," INTEGER_LITERAL? )?
       "}"
     ) //#RepetitionRange(+1)
  )?

CharacterList ::=
  "~"?
  "[" (CharacterRange
        ( "," CharacterRange)*
      )?
  "]"

CharacterRange ::=
    (STRING_LITERAL | CHARACTER_LITERAL)
    (
    "-"
    (STRING_LITERAL | CHARACTER_LITERAL)
   )?

revusky commented 1 year ago

Well, to be honest, I am not very clear on what benefit there is in having the grammar in this format. (I don't mean to say that in any aggressive way, mind you. I just don't really know.) I would say that, if it is important to you to be able to convert existing Congo grammars to this railroad diagram format, it's probably not very hard to do. The way I would do it is to use the visitor pattern.

There is a perhaps useful example of this, but it is not in this project (or well... it actually is the same project! :-)) here: https://github.com/javacc21/javacc21/blob/master/src/java/com/javacc/output/congo/SyntaxConverter.java

Well, there may be a bit of confusion here. You see, CongoCC is a rebranding of JavaCC 21. JavaCC 21 started as work on that old JavaCC thing from Sun and eventually became a complete rewrite. One difference between Congo and JavaCC 21 is that JavaCC 21 still supported the legacy JavaCC syntax, but Congo doesn't support it any more. So I wrote a syntax converter that, with some glitches, basically works to convert existing grammars to the newer streamlined syntax. You can see why this syntax converter is part of JavaCC 21, but not Congo: the Congo parser doesn't know anything about the legacy syntax! (Oh, and by the way, you could also infer (correctly) from all this that CongoCC is really a much more mature tool/codebase than its age (or number of commits) on GitHub might suggest.)

But anyway, in principle, you could use the same basic structure to convert to this railroad diagram syntax. I would guess that it's a mini-project of rather moderate scale.
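As a rough sketch of that visitor-based structure (assuming Java 17; `Terminal`, `NonTerminal`, `Sequence`, `Choice` and `Repeat` are hypothetical node types invented here, not CongoCC's actual classes), an emitter can walk an expansion tree and print each production in the EBNF notation that https://www.bottlecaps.de/rr/ui accepts:

```java
import java.util.List;
import java.util.stream.Collectors;

// Renders a (hypothetical) expansion tree as W3C-style EBNF for the rr tool.
class EbnfEmitter implements ExpansionVisitor<String> {

    public static void main(String[] args) {
        Expansion doBlock = new Sequence(List.of(
                new Terminal("do"), new NonTerminal("Block"), new Terminal("end")));
        System.out.println(new EbnfEmitter().production("DoBlock", doBlock));
    }

    // A top-level choice needs no parentheses; nested choices do.
    String production(String name, Expansion body) {
        String rhs = body instanceof Choice c ? join(c.alternatives(), " | ") : body.accept(this);
        return name + " ::= " + rhs;
    }

    private String join(List<Expansion> parts, String sep) {
        return parts.stream().map(e -> e.accept(this)).collect(Collectors.joining(sep));
    }

    public String visitTerminal(Terminal t) {
        // Use single quotes when the literal itself contains a double quote.
        String q = t.literal().contains("\"") ? "'" : "\"";
        return q + t.literal() + q;
    }
    public String visitNonTerminal(NonTerminal n) { return n.name(); }
    public String visitSequence(Sequence s) { return join(s.items(), " "); }
    public String visitChoice(Choice c) { return "(" + join(c.alternatives(), " | ") + ")"; }
    public String visitRepeat(Repeat r) { return "(" + r.body().accept(this) + ")" + r.suffix(); }
}

// Hypothetical node types, invented for this sketch (not CongoCC's classes).
interface Expansion { <R> R accept(ExpansionVisitor<R> v); }

interface ExpansionVisitor<R> {
    R visitTerminal(Terminal t);
    R visitNonTerminal(NonTerminal n);
    R visitSequence(Sequence s);
    R visitChoice(Choice c);
    R visitRepeat(Repeat r);
}

record Terminal(String literal) implements Expansion {
    public <R> R accept(ExpansionVisitor<R> v) { return v.visitTerminal(this); }
}
record NonTerminal(String name) implements Expansion {
    public <R> R accept(ExpansionVisitor<R> v) { return v.visitNonTerminal(this); }
}
record Sequence(List<Expansion> items) implements Expansion {
    public <R> R accept(ExpansionVisitor<R> v) { return v.visitSequence(this); }
}
record Choice(List<Expansion> alternatives) implements Expansion {
    public <R> R accept(ExpansionVisitor<R> v) { return v.visitChoice(this); }
}
// suffix is "?", "*" or "+"
record Repeat(Expansion body, String suffix) implements Expansion {
    public <R> R accept(ExpansionVisitor<R> v) { return v.visitRepeat(this); }
}
```

Running `main` prints `DoBlock ::= "do" Block "end"`. A real converter would attach the same kind of visit methods to CongoCC's own parse tree, and would also have to decide what to do with Java code blocks, lookahead and tree-building annotations, none of which has a railroad equivalent.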

If you wanted to write such a converter, I would have no problem rolling it into the tool in case some people want that. I guess it could be invoked with something like:

   java -jar congocc-full.jar railroad-diagram <grammar-filename>

And it could output the grammar in that syntax. So, that's something to consider.

In terms of things a bit along these lines, what I think could add some real value now would be some tooling, maybe in Eclipse or IntelliJ: just typical stuff like syntax highlighting and point/click navigation, like control-clicking on a non-terminal and jumping to where it is defined, that kind of thing. The thing is that that kind of GUI programming is not my specialty and I'm just too occupied with the core tool anyway. Well, I just throw that out there, just in case...

I'll close this message here.

mingodad commented 1 year ago

Thanks for the reply! For me, the railroad diagram is a tool for showing the whole grammar, in an automated way, for documentation and debugging purposes.

It's not easy to get that global view when the grammar is mixed with extra attributes and embedded code.

It's also a way to show potential users the whole grammar accepted by this tool.

Cheers!

revusky commented 1 year ago

Actually, I wrote my last response before really exploring it that much. I have to admit that I hadn't followed your full instructions of pasting the converted grammar into the web interface and seeing the graphical representation. So I actually understand the thing better than when I responded, or at least I understand the motivation. Before that, I didn't really understand the value of converting the grammar to the other format. I guess I answered too quickly, though my comments about how to implement this, if you really want it, still stand.

Though, one problem with it is that one is throwing away information. Possibly, anyway. If there is a predicate expressed at some juncture in Java code, it isn't reflected in the generated diagram. Nor are contextual predicates. By the way, do you know of any other parser generator that has that feature? (Granted, if you use contextual predicates, you no longer have a context-free grammar!) But again, generally speaking, the generated diagram does not embody the full information to reproduce the working parser. (But if that is not its purpose....)
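One way to limit that loss, in the spirit of the manual transformations earlier in this thread that keep hints like `/*=>||*/` as comments, is to carry predicates and embedded Java code into the EBNF as `/* ... */` comments rather than dropping them. The following is a minimal sketch under that assumption (Java 17); `Sym`, `Predicate` and `Action` are hypothetical node kinds, and the result is still documentation only, not enough to rebuild the parser:

```java
import java.util.List;
import java.util.stream.Collectors;

// Emits rr-style EBNF for a flat sequence, either dropping Java code
// (semantic predicates, actions) or keeping it as /* ... */ comments.
class PredicateAwareEmitter {

    // Hypothetical node kinds, invented for this sketch.
    sealed interface Node permits Sym, Predicate, Action {}
    record Sym(String ebnf) implements Node {}           // an already-rendered terminal/non-terminal
    record Predicate(String javaCode) implements Node {} // e.g. the {!isInterface} of a SCAN predicate
    record Action(String javaCode) implements Node {}    // e.g. {CURRENT_NODE.addImplementsType(ot);}

    static String emit(List<Node> sequence, boolean keepAsComments) {
        return sequence.stream()
                .map(n -> render(n, keepAsComments))
                .filter(s -> !s.isEmpty()) // dropped predicates/actions leave no extra spaces
                .collect(Collectors.joining(" "));
    }

    static String render(Node n, boolean keep) {
        if (n instanceof Sym s) return s.ebnf();
        String code = (n instanceof Predicate p)
                ? "SCAN {" + p.javaCode() + "} =>"
                : "{" + ((Action) n).javaCode() + "}";
        // A literal "*/" inside the code would close the EBNF comment early.
        return keep ? "/* " + code.replace("*/", "* /") + " */" : "";
    }

    public static void main(String[] args) {
        List<Node> seq = List.of(
                new Predicate("!isInterface"), new Sym("\"implements\""), new Sym("ObjectType"),
                new Action("CURRENT_NODE.addImplementsType(ot);"));
        System.out.println(emit(seq, false)); // diagram-only view
        System.out.println(emit(seq, true));  // predicates and actions preserved as comments
    }
}
```

The first line printed is `"implements" ObjectType`; the second keeps the predicate and the action as comments around it, so a reader of the EBNF can at least see where a context check or tree-building code was attached.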

The graphical representation is nice. I actually do see some use for it in my own documentation efforts. So, if I sounded a bit dismissive in the previous message, I am somewhat more interested now.

I don't know if you ever used the old legacy JavaCC. That had (and has) this tool called JJDoc. It generated an HTML page with hyperlinks to show the structure of a grammar. After picking up the JavaCC code to work on, I eventually threw away that JJDoc thing. I didn't feel it was very useful or appealing. Not in that state, without some significant additional work. The idea is okay, of course, but it generated such horrendously ugly pages that were not configurable, and... Actually, you can see what the output of JJDoc looks like here. On that page, just scroll down to where it has the heading Non-Terminals and you see what I'm talking about. I finally threw that JJDoc thing away because it felt like a choice between actually making the thing presentable (which is not my forte!) or just dropping it and maybe possibly re-implementing the thing later in a better way. So I chose the latter.

But these railroad diagrams are much sexier than that.

stbischof commented 1 year ago

@mingodad a converter that generates Documentation using https://mermaid.js.org/syntax/flowchart.html would be interesting for me.

mingodad commented 1 year ago

@revusky thanks again for the reply. The railroad diagram generator also performs some simplifications/optimizations on the grammar that sometimes help tidy up the original. And you are right: the purpose is not to translate the grammar into a fully working parser in the EBNF understood by https://www.bottlecaps.de/rr/ui (although if that happens to be feasible, it doesn't hurt).
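To make that kind of simplification concrete, here is a sketch of one classic rewrite a diagram generator may apply: turning direct left recursion into repetition, which is then drawn as a loop. It illustrates the technique only and is not the rr tool's actual algorithm; the input format (each alternative as a list of already-rendered EBNF symbols) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Rewrites direct left recursion into repetition:
//   A ::= A a1 | ... | A an | b1 | ... | bm
//   A ::= (b1 | ... | bm) (a1 | ... | an)*
// Each alternative is a list of already-rendered EBNF symbols.
class LeftRecursion {

    static String eliminate(String name, List<List<String>> alternatives) {
        List<String> tails = new ArrayList<>(); // the a_i parts (after the leading A)
        List<String> bases = new ArrayList<>(); // the non-recursive b_j alternatives
        for (List<String> alt : alternatives) {
            if (!alt.isEmpty() && alt.get(0).equals(name)) {
                // a bare "A ::= A" alternative adds nothing and is skipped
                if (alt.size() > 1) tails.add(String.join(" ", alt.subList(1, alt.size())));
            } else {
                bases.add(String.join(" ", alt));
            }
        }
        if (tails.isEmpty()) return name + " ::= " + String.join(" | ", bases);
        if (bases.isEmpty())
            throw new IllegalArgumentException(name + " has no non-recursive alternative");
        return name + " ::= " + group(bases) + " " + group(tails) + "*";
    }

    private static String group(List<String> parts) {
        return "(" + String.join(" | ", parts) + ")";
    }

    public static void main(String[] args) {
        // ExpressionList ::= ExpressionList "," Expression | Expression
        System.out.println(eliminate("ExpressionList", List.of(
                List.of("ExpressionList", "\",\"", "Expression"),
                List.of("Expression"))));
    }
}
```

For instance, `ExpressionList ::= ExpressionList "," Expression | Expression` becomes `ExpressionList ::= (Expression) ("," Expression)*`, the same shape as the hand-written `ExpressionList` rule in the Lua EBNF above.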

mingodad commented 1 year ago

@stbischof I'm not sure I understood your request/comment. Could you provide a simple grammar manually converted to Mermaid syntax to illustrate your point?

stbischof commented 1 year ago

It's just the idea of switching the approach: instead of using the given rr diagram library, which expects EBNF, use another chart engine like Mermaid that can draw railroad diagrams as flowcharts.

And then generate the charts using a converter.
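A minimal sketch of that idea, assuming Mermaid's `flowchart LR` syntax: each alternative of a production becomes a path from a start node to a stop node, with terminals in rounded boxes and non-terminals in rectangles. The `MermaidRailroad` class and its input format (alternatives as lists of symbols, terminals written in double quotes) are hypothetical, and optional or repeated parts, which would need bypass and back edges, are left out.

```java
import java.util.List;

// Draws one production as a Mermaid flowchart: each alternative becomes a
// path from a start node S to a stop node E.
class MermaidRailroad {

    static String toFlowchart(List<List<String>> alternatives) {
        StringBuilder out = new StringBuilder("flowchart LR\n");
        out.append("  S((\"start\"))\n");
        out.append("  E((\"stop\"))\n");
        int next = 0;
        for (List<String> alt : alternatives) {
            String prev = "S";
            for (String sym : alt) {
                String id = "n" + next++;
                out.append("  ").append(id).append(shape(sym)).append('\n');
                out.append("  ").append(prev).append(" --> ").append(id).append('\n');
                prev = id;
            }
            // An empty alternative links S straight to E.
            out.append("  ").append(prev).append(" --> E\n");
        }
        return out.toString();
    }

    // Terminals (written with surrounding double quotes) get rounded boxes,
    // non-terminals get rectangles; inner quotes become Mermaid's #quot; entity.
    static String shape(String sym) {
        boolean terminal = sym.length() >= 2 && sym.startsWith("\"") && sym.endsWith("\"");
        String label = (terminal ? sym.substring(1, sym.length() - 1) : sym).replace("\"", "#quot;");
        return terminal ? "(\"" + label + "\")" : "[\"" + label + "\"]";
    }

    public static void main(String[] args) {
        // Simplified LastStatement: "return" ExpressionList | "break"
        System.out.print(toFlowchart(List.of(
                List.of("\"return\"", "ExpressionList"),
                List.of("\"break\""))));
    }
}
```

The printed text can be pasted into any Mermaid renderer. Compared with the rr tool, the layout is left entirely to Mermaid, so the result looks more like a generic flowchart than a classic railroad track.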

adMartem commented 1 year ago

I just noticed this thread. If I may jump in, as it turns out I've had a notion on my back burner regarding railroad diagrams for some time. In particular, for the COBOL compiler world (and SQL), railroad diagrams are de rigueur. So consequently I have thought that it would be nice to be able to automatically generate documentation in this form (being very lazy) directly from the grammar. Originally I looked at driving a converter off of the JavaCC/JTB source of my grammar, but it was so filled with grammar quirks, artifacts, and semantic lookahead that I decided to defer further attention until either inspiration struck or a miracle happened. It's now clear I was waiting for CongoCC to happen for this and several other things I had mentally filed away.

One of the things I was familiar with back then was the H2 database project. I, in fact, use a portion of it to implement an indexed (key/value) file organization required by COBOL. One of the things that H2 has is a really cool railroad diagram with cross links, coloring, documentation, and code examples. It also can switch to and from a BNF view of the various elements. I wondered how they produced it, and found that they had, as part of H2, a complete capability of converting a csv file of syntax elements into a BNF grammar that could be visited and produce the railroad diagram as styled HTML. That was wonderful (as it is licensed under MPL 1.0), except I didn't have my grammar in csv form, nor could I easily produce it from my then javacc grammar.

However, since you are exploring this, I thought I would provide this in case it is useful (I had also looked at the bottlecaps project, but you already know about that). This is the csv file they use for their SQL documentation. If you look around the project you will find the BNF converter, visitors, and other components (mostly in the doc packages). I would encourage you to check out the output of the process here. If I were doing this for my grammar, I would probably try to figure out how to include the out-of-band information (such as documentation and examples) via something like special comments in the .ccc file (sort of like Javadocs), which would include a way to omit and/or rename productions whose non-terminal name is inappropriate (document aliases, effectively). Via coloring and the ability to document in-line, I think the problem with semantic predicates, assertions and such could be mitigated. On the other hand, having so much stuff embedded in the grammar might make it inscrutable for purposes of actual development; I don't know.

Anyway, that is about all I know about this topic. I just thought I would dump it here in case it was useful. If somehow there does come to be a tool for this in some form, I'll definitely try to use it.

adMartem commented 1 year ago

Further thought about this leads me to believe that Congo's ability to define abstract nodes gives us a way to include both structured comments (similar to Javadocs) and actual visitable properties, which are probably crucial for carrying the information needed to produce syntax diagrams mechanically that are useful to an end user. I can see how the grammar could actually become the documentation for the language (syntactically speaking, of course).
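As a sketch of the structured-comment idea, suppose a Javadoc-style `/** ... */` comment placed directly before a production documents it, with a hypothetical `@alias` tag for renaming it in the generated docs (neither is an existing CongoCC feature). A documentation generator could then collect those comments with something like this (Java 17):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects Javadoc-style /** ... */ comments that directly precede a
// production ("Name :" or "Name ::="), plus a hypothetical @alias tag.
class GrammarDocs {

    record Doc(String text, String alias) {}

    // Group 1: comment body (containing no "*/"); group 2: production name.
    private static final Pattern DOC = Pattern.compile(
            "/\\*\\*((?:(?!\\*/).)*)\\*/\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*(?:::=|:)",
            Pattern.DOTALL);

    static Map<String, Doc> extract(String grammar) {
        Map<String, Doc> docs = new LinkedHashMap<>();
        Matcher m = DOC.matcher(grammar);
        while (m.find()) {
            StringBuilder text = new StringBuilder();
            String alias = null;
            for (String line : m.group(1).split("\\R")) {
                String l = line.strip();
                if (l.startsWith("*")) l = l.substring(1).strip(); // leading " * " of each line
                if (l.startsWith("@alias ")) {
                    alias = l.substring("@alias ".length()).strip();
                } else if (!l.isEmpty()) {
                    if (text.length() > 0) text.append(' ');
                    text.append(l);
                }
            }
            docs.put(m.group(2), new Doc(text.toString(), alias));
        }
        return docs;
    }

    public static void main(String[] args) {
        String grammar = """
                /**
                 * A sequence of statements, optionally ending with return or break.
                 * @alias block
                 */
                Block : (Statement)* [LastStatement] ;
                """;
        System.out.println(extract(grammar));
    }
}
```

The regular expression only recognizes a plain `Name :` or `Name ::=` header, so productions with return types, parameters or `#` annotations would need the real CongoCC parser (or abstract-node properties, as suggested above) rather than text matching.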

vsajip commented 8 months ago

I've set up an experimental service at https://ccc2ebnf.red-dove.com/ for converting CongoCC grammars to EBNF as used by https://bottlecaps.de/rr - the underlying functionality runs successfully through all the main CongoCC example grammars, but the web interface doesn't allow processing of INCLUDE directives (you should be able to add the lexer directives inline). It's in a very early pre-alpha state, so expect a few hiccups, but you are welcome to try it and give feedback at https://github.com/vsajip/ccc2ebnf/issues/new - I look forward to hearing from you!
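Until the web interface processes INCLUDE directives, one workaround is to inline the included files locally before pasting. The sketch below (Java 11+) is a hypothetical helper, not part of ccc2ebnf or CongoCC: it only understands the plain `INCLUDE "relative/path.ccc"` form on a line of its own (not forms such as `INCLUDE JAVA`), resolves paths relative to the including file, and skips any file it has already inlined.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Inlines INCLUDE "file" directives so a grammar can be pasted as one piece.
class IncludeInliner {

    // Only the simple form  INCLUDE "relative/path.ccc"  (optional ';') on its own line.
    private static final Pattern INCLUDE =
            Pattern.compile("(?m)^[ \\t]*INCLUDE[ \\t]+\"([^\"]+)\"[ \\t]*;?[ \\t]*$");

    static String inline(Path file) throws IOException {
        return inline(file, new HashSet<>());
    }

    private static String inline(Path file, Set<Path> seen) throws IOException {
        Path key = file.toAbsolutePath().normalize();
        if (!seen.add(key)) return ""; // already inlined: avoids cycles and duplicate definitions
        String text = Files.readString(key);
        Matcher m = INCLUDE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Path target = key.resolveSibling(m.group(1)); // relative to the including file
            m.appendReplacement(out, Matcher.quoteReplacement(inline(target, seen)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(inline(Path.of(args[0])));
    }
}
```

Running it on a grammar such as the Lua example (with its `INCLUDE "LuaLexer.ccc"` line uncommented) would print a single self-contained grammar that can be pasted into the web form.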


mingodad commented 8 months ago

Actually, I'm also working on a Yacc/Lex-compatible online editor/tester here: https://mingodad.github.io/parsertl-playground/playground/

vsajip commented 8 months ago

I thought the point was to convert CongoCC source grammars? I don't see CongoCC in your drop-down list - is that because this functionality is not yet available to you? For the purpose mentioned in the original post of this issue (get railroad diagrams from CongoCC grammars) the site I linked to seems to fit the bill.

congo-cc / congo-parser-generator

Grammar railroad diagram #9