Open mingodad opened 3 years ago
The above EBNF is not correct. Someone pointed out that my naive interpretation of the menhir grammar as if it were a bison grammar is wrong (mainly, a rule starting with | doesn't mean /*empty*/).
I'm working to fix it.
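To illustrate the misreading, here is a minimal Python sketch (not the tool used in this thread) that splits a menhir rule body into alternatives. It assumes that every menhir production carries a semantic action in braces, so a leading | is treated as optional syntax and only an alternative consisting of nothing but an action is reported as empty. The function name `split_alternatives` is made up for this example, and braces inside OCaml string literals are not handled.

```python
def split_alternatives(body: str) -> list[str]:
    """Split a menhir rule body on top-level '|' and drop semantic actions."""
    pieces, current = [], []
    paren = brace = 0
    for ch in body.strip():
        if ch == "{":
            brace += 1
        elif ch == "}":
            brace -= 1
        elif ch == "(":
            paren += 1
        elif ch == ")":
            paren -= 1
        # Only a '|' outside actions and parameter lists separates alternatives.
        if ch == "|" and paren == 0 and brace == 0:
            pieces.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current).strip())

    # A leading '|' yields an empty first piece: it is optional syntax in
    # menhir, not an empty production (the bison-style misreading).
    if len(pieces) > 1 and pieces[0] == "":
        pieces = pieces[1:]

    result = []
    for piece in pieces:
        symbols = piece.split("{", 1)[0].strip()  # cut the action off
        result.append(symbols if symbols else "/* empty */")
    return result
```

For example, `split_alternatives("| a b { f a b } | { None }")` gives two alternatives, `a b` and `/* empty */`, rather than three.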
I have edited the EBNF from the first message to fix several issues with the tool I used to create it.
Thanks for introducing your tool. It reminded me of a tool I used several years ago (maybe ebnf2ps, between 1995 and 2000). Instead of PostScript figures, your tool gives a more usable output, which is a good point.
However, it is difficult to use such a tool in our development as long as there is no automatic tool chain starting from the original grammar file (the one used by parser generators such as yacc, menhir, ...). At the very least, that requires a translator from these source formats into the format required by your tool, so that the diagrams can follow modifications of the source file.
Tags inserted as comments into the source file should also be usable by the translator to control the final output. This could be a way to specify the rules to inline in order to simplify the final figures and obtain something closer to the grammar figures of the ACSL document. For the same reason, these tags should allow renaming some nodes (because the source file may not use the same names as the ACSL document).
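As a rough illustration of the tag idea, the following Python sketch collects inlining and renaming hints from OCaml-style comments. The tag syntax `(* rr: inline NAME *)` and `(* rr: rename OLD NEW *)`, the `DiagramHints` class, and the `collect_hints` function are all assumptions invented for this example; nothing like them exists in menhir or in the railroad diagram tool.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tag syntax, e.g. (* rr: inline helper *) or
# (* rr: rename src_name doc_name *).
TAG_RE = re.compile(r"\(\*\s*rr:\s*(inline|rename)\s+([^*]*?)\s*\*\)")


@dataclass
class DiagramHints:
    inline: set[str] = field(default_factory=set)        # rules to inline
    rename: dict[str, str] = field(default_factory=dict)  # source name -> document name


def collect_hints(source: str) -> DiagramHints:
    """Scan a grammar source for the hypothetical 'rr:' comment tags."""
    hints = DiagramHints()
    for kind, args in TAG_RE.findall(source):
        names = args.split()
        if kind == "inline":
            hints.inline.update(names)
        elif kind == "rename" and len(names) == 2:
            hints.rename[names[0]] = names[1]
    return hints
```

Because the tags live inside ordinary comments, the parser generator would ignore them, and only the translator would need to know about them.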
Finally, the same kind of feature could be helpful for the lexical definitions of the grammar tokens.
What do you suggest to solve these engineering difficulties?
Ask for, or modify, menhir to output EBNF.
Use a custom parser like I did (see below) using one of https://github.com/mingodad/CocoR-CPP , https://github.com/mingodad/CocoR-CSharp , https://github.com/mingodad/CocoR-Java and download https://www.bottlecaps.de/rr/download/rr-1.63-java8.zip to use it offline.
Here is the custom parser I created for menhir grammars:
/*
Need to check and fix tail like ' | /* empty */'
*/
#include "Scanner.nut"
COMPILER Menhir
CHARACTERS
letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_".
digit = "0123456789".
cr = '\r'.
lf = '\n'.
tab = '\t'.
ff = '\f'.
stringCh = ANY - '"' - '\\' - cr - lf.
charCh = ANY - '\'' - '\\' - cr - lf.
printable = '\u0020' .. '\u007e'.
hex = "0123456789abcdef".
TOKENS
ID = (letter | '.') { letter | digit | '.' | '-'}.
INT_LITERAL = digit { digit }.
STRING = '"' { stringCh | '\\' printable } '"'.
badString = '"' { stringCh | '\\' printable } (cr | lf).
//CHAR_LITERAL = '\'' ( charCh | '\\' printable { hex } ) '\''.
LEFT_BRACE = '{'.
RIGHT_BRACE = '}'.
LEFT_PAREN = '('.
RIGHT_PAREN= ')'.
LEFT_ANGLEB = '<'.
RIGHT_ANGLEB = '>'.
ARROW = "->".
PRAGMAS
COMMENTS FROM "(*" TO "*)" NESTED
COMMENTS FROM "/*" TO "*/" NESTED
COMMENTS FROM "//" TO lf
IGNORE cr + lf + tab + ff
/*-------------------------------------------------------------------------*/
PRODUCTIONS
Menhir =
prologue_declarations "%%" grammar [epilogue] EOF
.
epilogue =
"%%" {ANY}
.
prologue_declarations =
prologue_declaration {prologue_declaration}
.
prologue_declaration =
"%{" {ANY} "%}"
| "%token" [tag] token_decls
| ("%left" | "%right" | "%nonassoc") token_decls
| "%start" [tag] token_decls
| "%type" [tag] token_decls
| "%parameter" tag
.
token_decls =
token_id {token_id}
.
token_id =
ID [params]
| STRING
.
grammar =
rule {rule}
.
rule =
["%inline" | "%public"] ID (. printf("%s ::= ", t.val); .) [params]
':' ['|'] rule_id_list {'|' (. printf("\n\t|"); .) rule_id_list} [';']
(. printf("\n\n"); .)
.
rule_id_list =
rule_id_elm {rule_id_elm} [sema]
| sema ["%prec" ID] (. printf(" /* empty */"); .)
| "%prec" ID sema (. printf(" /* empty */"); .)
.
rule_id_elm = (. string the_id; .)
rule_id<out the_id> ['=' rule_id<out the_id>]
["%prec" ID] [';']
(. printf(" %s", the_id); .)
.
rule_id<out string the_id> =
"option" '(' (
("terminated" | "preceded") '('
ID (. the_id = "(" + t.val; .)
','
ID (. the_id += " " + t.val + ")?"; .)
')'
| ID (. the_id = t.val + "?"; .)
) ')'
| "separated_nonempty_list" '('
ID (. the_id = "(" + t.val; .)
','
ID (. the_id += " " + t.val + ")+"; .)
')'
| "separated_list" '('
ID (. the_id = "(" + t.val; .)
','
ID (. the_id += " " + t.val + ")*"; .)
')'
|(
ID (. the_id = t.val; .) //when ID == "option" we need see inside "()"
| STRING (. the_id = t.val; .)
) [params] [('*' | '?' | '+') (. the_id += t.val; .)]
.
params =
'(' (. SkipNested(ParserTokens._LEFT_PAREN, ParserTokens._RIGHT_PAREN); .) ')'
.
sema =
'{' (. SkipNested(ParserTokens._LEFT_BRACE, ParserTokens._RIGHT_BRACE); .) '}'
.
tag =
'<' (. SkipNested(ParserTokens._LEFT_ANGLEB, ParserTokens._RIGHT_ANGLEB); .) '>'
.
END Menhir.
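The way the `rule_id` production above rewrites menhir's parameterized symbols can be summarized with a small Python sketch. It only mirrors the output of the semantic actions in the grammar and is not part of the original tool; `to_ebnf` and `PATTERNS` are names made up for this example. Note that menhir's `separated_list` takes the separator as its first argument, so `(sep X)*` is an approximation, and a stricter rendering would be closer to `(X (sep X)*)?`.

```python
import re

# Same shape as the ID token in the grammar above.
ID = r"[A-Za-z_.][A-Za-z0-9_.\-]*"

# (pattern, replacement) pairs mirroring the actions in rule_id.
PATTERNS = [
    (re.compile(rf"option\(\s*(?:terminated|preceded)\(\s*({ID})\s*,\s*({ID})\s*\)\s*\)"),
     r"(\1 \2)?"),
    (re.compile(rf"option\(\s*({ID})\s*\)"), r"\1?"),
    (re.compile(rf"separated_nonempty_list\(\s*({ID})\s*,\s*({ID})\s*\)"), r"(\1 \2)+"),
    (re.compile(rf"separated_list\(\s*({ID})\s*,\s*({ID})\s*\)"), r"(\1 \2)*"),
]


def to_ebnf(symbol: str) -> str:
    """Rewrite one menhir symbol into the EBNF form printed by the grammar."""
    text = symbol.strip()
    for pattern, replacement in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return match.expand(replacement)
    return text  # plain identifiers and strings pass through unchanged
```

For instance, `to_ebnf("option(attr)")` yields `attr?` and `to_ebnf("separated_list(COMMA, expr)")` yields `(COMMA expr)*`.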
I've made an experimental tool to convert bison grammars to a kind of EBNF understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams. See below the converted src/kernel_internals/parsing/cparser.mly, with some hand-made changes to allow viewing it at https://www.bottlecaps.de/rr/ui . The order of the rules could be changed to get a better view of the railroad diagrams. Copy and paste the EBNF below into the Edit Grammar tab at https://www.bottlecaps.de/rr/ui , then switch to the View Diagram tab.