CordyJ / OpenTxl

TXL programming language compiler/interpreter

Grammar railroad diagram #29

Closed mingodad closed 4 months ago

mingodad commented 5 months ago

It would be nice if txl could also output an EBNF grammar understood by https://github.com/GuntherRademacher/rr, to generate navigable railroad diagrams that help document, develop, and debug grammars (see the manually extracted example below).

I already did something like this for bison/byacc/lemon at https://github.com/mingodad/lalr-parser-test and also at https://mingodad.github.io/parsertl-playground/playground/ (it would be nice to have a playground like the latter for txl too).

Also, while looking at the code in src/unparse.i::real_printGrammar (https://github.com/CordyJ/OpenTxl/blob/c289dec190558bd085de3e6a7e94196d731f9d93/src/unparse.i#L505), I noticed many repetitions of tree.trees (grammarTP).kind. Can the Turing+ compiler optimize all of those array accesses, or is tree.trees indexed again every time it appears?

        if tree.trees (grammarTP).kind = kindT.choose or
                tree.trees (grammarTP).kind = kindT.order or 
                tree.trees (grammarTP).kind = kindT.repeat or 
                tree.trees (grammarTP).kind = kindT.list or 
                tree.trees (grammarTP).kind = kindT.leftchoose or
                tree.trees (grammarTP).kind = kindT.generaterepeat or
                tree.trees (grammarTP).kind = kindT.generatelist or
                tree.trees (grammarTP).kind = kindT.lookahead then

Example of a manually extracted EBNF from the txl grammar:

//
// EBNF to be viewed at
//    (IPV6) https://www.bottlecaps.de/rr/ui
//    (IPV4) https://rr.red-dove.com/ui
//
// Copy and paste this at one of the urls shown above in the 'Edit Grammar' tab
// then click the 'View Diagram' tab.
//

program::=
      repeat_statement

repeat_statement::=
      statement
    | repeat_statement statement

statement::=
      includeStatement
    | keysStatement
    | compoundsStatement
    | commentsStatement
    | tokensStatement
    | defineStatement
    | redefineStatement
    | ruleStatement
    | functionStatement
    | externalStatement

includeStatement::=
      "include" stringlit

keysStatement::=
      "keys" repeat_literal "end" "keys"

compoundsStatement::=
      "compounds" repeat_literal "end" "compounds"

commentsStatement::=
      "comments" repeat_commentConvention "end" "comments"

repeat_commentConvention::=
      commentConvention
    | repeat_commentConvention commentConvention

commentConvention::=
      literal
    | literal literal

tokensStatement::=
      "tokens" repeat_tokenPattern "end" "tokens"

repeat_tokenPattern::=
      tokenPattern
    | repeat_tokenPattern tokenPattern

tokenPattern::=
      typeid stringlit
    | '|' stringlit
    | typeid "..." '|' stringlit
    | typeid '+' stringlit

defineStatement::=
      "define" typeid repeat_literalOrType repeat_barLiteralsAndTypes "end" "define"

redefineStatement::=
      "redefine" typeid opt_dotDotDotBar repeat_literalOrType repeat_barLiteralsAndTypes opt_barDotDotDot "end" "redefine"

opt_dotDotDotBar::=
      /*%empty*/
    | dotDotDotBar

dotDotDotBar::=
      "..." '|'

opt_barDotDotDot::=
      /*%empty*/
    | barDotDotDot

barDotDotDot::=
      '|' "..."

repeat_barLiteralsAndTypes::=
      barLiteralsAndTypes
    | repeat_barLiteralsAndTypes barLiteralsAndTypes

barLiteralsAndTypes::=
      '|' repeat_literalOrType

repeat_literalOrType::=
      literalOrType
    | repeat_literalOrType literalOrType

literalOrType::=
      literal
    | type

opt_type::=
      /*%empty*/
    | type

type::=
      '[' typeid ']'
    | '[' "opt" typeidOrQuotedLiteral ']'
    | '[' "repeat" typeidOrQuotedLiteral opt_plusOrStar ']'
    | '[' "list" typeidOrQuotedLiteral opt_plusOrStar ']'
    | '[' "attr" typeidOrQuotedLiteral ']'
    | '[' "see" typeidOrQuotedLiteral ']'
    | '[' "not" typeidOrQuotedLiteral ']'
    | '[' "push" typeidOrQuotedLiteral ']'
    | '[' "pop" typeidOrQuotedLiteral ']'
    | '[' ']'
    | '[' typeidOrQuotedLiteral '?' ']'
    | '[' typeidOrQuotedLiteral '*' ']'
    | '[' typeidOrQuotedLiteral '+' ']'
    | '[' typeidOrQuotedLiteral ',' ']'
    | '[' typeidOrQuotedLiteral ',' '+' ']'
    | '[' ':' typeidOrQuotedLiteral ']'
    | '[' '~' typeidOrQuotedLiteral ']'
    | '[' '>' typeidOrQuotedLiteral ']'
    | '[' '<' typeidOrQuotedLiteral ']'

opt_plusOrStar::=
      /*%empty*/
    | plusOrStar

plusOrStar::=
      '+'
    | '*'

typeidOrQuotedLiteral::=
      typeid
    | quotedLiteral

ruleStatement::=
      "rule" ruleid repeat_formalArgument repeat_constructDeconstructImportExportOrCondition opt_skippingType "replace" opt_dollar type pattern repeat_constructDeconstructImportExportOrCondition "by" replacement "end" "rule"
    | "rule" ruleid repeat_formalArgument repeat_constructDeconstructImportExportOrCondition opt_skippingType "match" type pattern repeat_constructDeconstructImportExportOrCondition "end" "rule"

opt_dollar::=
      /*%empty*/
    | '$'

opt_star::=
      /*%empty*/
    | '*'

opt_not::=
      /*%empty*/
    | "not"

opt_all::=
      /*%empty*/
    | "all"

opt_each::=
      /*%empty*/
    | "each"

functionStatement::=
      "function" ruleid repeat_formalArgument repeat_constructDeconstructImportExportOrCondition opt_skippingType "replace" opt_star type pattern repeat_constructDeconstructImportExportOrCondition "by" replacement "end" "function"
    | "function" ruleid repeat_formalArgument repeat_constructDeconstructImportExportOrCondition opt_skippingType "match" opt_star type pattern repeat_constructDeconstructImportExportOrCondition "end" "function"

externalStatement::=
      "external" "rule" ruleid repeat_formalArgument
    | "external" "function" ruleid repeat_formalArgument

repeat_formalArgument::=
      formalArgument
    | repeat_formalArgument formalArgument

formalArgument::=
      varid type

repeat_constructDeconstructImportExportOrCondition::=
      constructDeconstructImportExportOrCondition
    | repeat_constructDeconstructImportExportOrCondition constructDeconstructImportExportOrCondition

constructDeconstructImportExportOrCondition::=
      constructor
    | deconstructor
    | condition
    | import
    | export

constructor::=
      "construct" varid type replacement

deconstructor::=
      opt_skippingType "deconstruct" opt_not opt_star opt_type varid deconstructor_1

deconstructor_1::=
      /*%empty*/
    | pattern

condition::=
      "where" opt_not opt_all expression
    | "assert" opt_not opt_all expression

import::=
      "import" varid opt_type opt_pattern

export::=
      "export" varid opt_type opt_replacement

opt_skippingType::=
      /*%empty*/
    | skippingType

skippingType::=
      "skipping" type

opt_pattern::=
      /*%empty*/
    | pattern

pattern::=
      repeat_literalOrVariable

repeat_literalOrVariable::=
      literalOrVariable
    | repeat_literalOrVariable literalOrVariable

literalOrVariable::=
      literal
    | varid type
    | varid

opt_replacement::=
      /*%empty*/
    | replacement

replacement::=
      repeat_literalOrExpression

repeat_literalOrExpression::=
      literalOrExpression
    | repeat_literalOrExpression literalOrExpression

literalOrExpression::=
      literal
    | expression

expression::=
      varid repeat_ruleApplication

repeat_ruleApplication::=
      ruleApplication
    | repeat_ruleApplication ruleApplication

ruleApplication::=
      '[' ruleid repeat_varidOrLiteral opt_each repeat_varidOrLiteral ']'

repeat_varidOrLiteral::=
      varidOrLiteral
    | repeat_varidOrLiteral varidOrLiteral

varidOrLiteral::=
      varid
    | literal

repeat_literal::=
      literal
    | repeat_literal literal

literal::=
      quotedLiteral
    | unquotedLiteral

quotedLiteral::=
      "'" unquotedLiteral

unquotedLiteral::=
      id
    | stringlit
    | charlit
    | number
    | key
    | repeat_special

repeat_special::=
      special
    | repeat_special special

special::=
      '!'
    | '@'
    | '#'
    | '$'
    | '^'
    | '&'
    | '*'
    | '('
    | ')'
    | '_'
    | '+'
    | '{'
    | '}'
    | ':'
    | '<'
    | '>'
    | '?'
    | '~'
    | '\\'
    | '='
    | '-'
    | ';'
    | ','
    | '.'
    | '/'
    | '['
    | ']'
    | '|'

varid::=
      id

typeid::=
      id

ruleid::=
      id
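
To give an idea of what emitting rr-compatible EBNF from txl would involve, here is a minimal C sketch that maps TXL's `opt`, `repeat`, and `list` type modifiers onto the `?`, `*`, and `+` operators of the W3C-style EBNF that rr reads. The function name `txl_item_to_ebnf` and its parameters are hypothetical and not part of txl; the other modifiers (`attr`, `see`, `not`, and so on) are rejected rather than guessed at.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of txl: writes the EBNF form of one TXL
   type reference into out. modifier is "", "opt", "repeat" or "list";
   one_or_more is nonzero for the [repeat X+] and [list X+] forms.
   Returns 0 on success, -1 for an unsupported modifier or a short buffer. */
static int txl_item_to_ebnf(const char *modifier, const char *name,
                            int one_or_more, char *out, size_t out_size)
{
    int n;

    if (modifier[0] == '\0') {
        n = snprintf(out, out_size, "%s", name);            /* [X]        */
    } else if (strcmp(modifier, "opt") == 0) {
        n = snprintf(out, out_size, "%s?", name);           /* [opt X]    */
    } else if (strcmp(modifier, "repeat") == 0) {
        n = snprintf(out, out_size, "%s%s", name,           /* [repeat X] */
                     one_or_more ? "+" : "*");
    } else if (strcmp(modifier, "list") == 0) {
        /* TXL lists are comma-separated sequences. */
        n = one_or_more
            ? snprintf(out, out_size, "%s ( ',' %s )*", name, name)
            : snprintf(out, out_size, "( %s ( ',' %s )* )?", name, name);
    } else {
        return -1;  /* modifier not handled by this sketch */
    }

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For example, calling it with `"repeat"`, `"statement"`, and `one_or_more` set to 1 yields `statement+`, which would replace a hand-written left-recursive `repeat_statement` rule in the diagram.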

mingodad commented 5 months ago

Looking at the generated C source, I can see that it doesn't optimize the array access, though the C compiler probably will. It is also interesting that the generated C code still uses K&R-style function definitions.

static void unparser_real_printGrammar (grammarTP, indentation)
treePT  grammarTP;
TLint4  indentation;
{
...
    unparser_printTypedTree((treePT) grammarTP, (TLint4) 0, (TLboolean) 1, (TLboolean) 0);
    if (((((((((tree_trees[grammarTP].kind) == 1) || ((tree_trees[grammarTP].kind) == 0)) || ((tree_trees[grammarTP].kind) == 2)) || ((tree_trees[grammarTP].kind) == 3)) || ((tree_trees[grammarTP].kind) == 4)) || ((tree_trees[grammarTP].kind) == 5)) || ((tree_trees[grammarTP].kind) == 6)) || ((tree_trees[grammarTP].kind) == 7)) {
    TLnat4  kind;
...
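
One way to make the single lookup explicit, instead of relying on the C compiler to merge the repeated `tree_trees[grammarTP].kind` reads, is to load the kind once into a local and switch on it. The sketch below is hand-written C in prototype (non-K&R) style, not generator output. `struct treeT`, `is_structured_kind`, and `KIND_OTHER` are hypothetical names, and the enum values are inferred by lining up the Turing+ condition with the numeric constants in the generated one.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the generated declarations. The values 0..7
   are inferred by matching the Turing+ condition against the generated C. */
enum kindT {
    KIND_ORDER = 0,
    KIND_CHOOSE = 1,
    KIND_REPEAT = 2,
    KIND_LIST = 3,
    KIND_LEFTCHOOSE = 4,
    KIND_GENERATEREPEAT = 5,
    KIND_GENERATELIST = 6,
    KIND_LOOKAHEAD = 7,
    KIND_OTHER = 8 /* placeholder standing in for every remaining kind */
};

struct treeT {
    enum kindT kind;
};

/* Returns true when the node is one of the eight kinds tested above.
   The array element appears only once in the source. */
static bool is_structured_kind(const struct treeT *trees, int grammarTP)
{
    const enum kindT kind = trees[grammarTP].kind;

    switch (kind) {
    case KIND_ORDER:
    case KIND_CHOOSE:
    case KIND_REPEAT:
    case KIND_LIST:
    case KIND_LEFTCHOOSE:
    case KIND_GENERATEREPEAT:
    case KIND_GENERATELIST:
    case KIND_LOOKAHEAD:
        return true;
    default:
        return false;
    }
}
```

With such a helper the condition would read `if (is_structured_kind(tree_trees, grammarTP))`. Whether this is any faster than what an optimizing compiler already produces from the generated code has not been measured here.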

mingodad commented 5 months ago

And here is the result of running a search-and-replace script on the C18 grammar to get an EBNF from which a navigable railroad diagram can be generated:

program ::=
    translation_unit

translation_unit ::=
    function_definition_or_declaration*

function_definition_or_declaration ::=
        function_definition
    |   struct_or_union_definition
    |   enum_definition
    |   declaration
//#ifdef GNU
    |   asm_statement
//#endif
//#ifdef LINUX
    |   macro_declaration_or_statement
//#endif
//#ifdef PREPROCESSOR
    |   preprocessor
//#endif

function_definition ::=
     function_header
//#ifdef PREPROCESSOR
    preprocessor_function_header*
//#endif
    KP_parameter_decls?
    compound_statement

function_header ::=
    "{!}"simple_function_call_statement         //% guard to avoid misparsing simple macro and function calls
    declaration_specifiers? function_declarator
    gnu_attributes?

function_declarator ::=
    declarator function_declarator_extension
    |   "{^}"'DEFUN' macro_call function_declarator_extension   //% observed Emacs project
    |   '(' function_declarator ')'

preprocessor_function_header ::=
    preprocessor
    function_header

preprocessor ::=
     preprocessor_line

KP_parameter_decls ::=

        KP_simple_declaration*
//#ifdef LINUX
        KP_variadic_declaration_spec?
//#endif

KP_simple_declaration ::=
    parameter_declaration semi

KP_variadic_declaration_spec ::=
    macro_name

declaration ::=
        "{!}"simple_function_call_statement             //% guard to avoid misparsing simple macro and function calls
    declaration_specifiers declarator_opt_init_semi
    |   extern_langauge_declaration
    |   null_declaration
//#ifdef LINUX
    |   machinestart_declaration
//#endif
//#ifdef PREPROCESSOR
    |   preprocessor
//#endif

simple_function_call_statement ::=
    //% used as a guard to avoid misparsing macro and function call statements as declarations
    "{!}"'main' identifier arguments_extension "{!}"declarator_opt_init_semi "{!}"pointer "{!}"declarator_extension "{!}"'{'

declarator_opt_init_semi ::=
    declarator_opt_init_list semi
    |   declarator? '='? compound_initializer   //% allow no semi on compound initializer

declarator_opt_init_list ::=
    (declarator_opt_init ',')+

extern_langauge_declaration ::=
    'extern' stringlit '{'
    declaration*
    '}' ';'?

null_declaration ::=
    semi

semi ::=
    ';'

declaration_specifiers ::=
    declaration_specifier+

declaration_specifier_or_declarator ::=
    declaration_specifier | declarator

declaration_specifier ::=
    storage_class_specifier
    |   type_specifier
    |   type_qualifier
    |   function_specifier
    |   alignment_specifier
//#ifdef LINUX
    |   macro_type_specifier
//#endif
//#ifdef GNU
    |   gnu_attribute_spec
//#endif

struct_or_union_definition ::=
    declaration_specifiers?
    struct_or_union
//#ifdef GNU
    gnu_attributes?
//#endif
    identifier? struct_or_union_body
//#ifdef GNU
    gnu_attributes?
//#endif

struct_or_union_specifier ::=
    struct_or_union
//#ifdef GNU
    gnu_attributes?
//#endif
//#ifdef LINUX
    macro_type_specifier?
//#endif

struct_or_union_body ::=
    '{'
    struct_declaration_list?
    '}'
//#ifdef GNU
    gnu_attributes?
//#endif
    declarator_opt_init_semi?

struct_declaration_list ::=
    struct_declaration+

struct_declaration ::=
    struct_declaration_
    |   struct_or_union_definition
    |   enum_definition
    |   null_declaration
//#ifdef LINUX
    |   macro_declaration_or_statement
//#endif
//#ifdef PREPROCESSOR
    |   preprocessor
//#endif

struct_declaration_ ::=
        declaration_specifiers struct_declarator_list semi  //% prefer with declarator
    |   declaration_specifiers semi                 //% but C allows none

struct_declarator_list ::=
    (struct_declarator ',')+

struct_declarator ::=
    declarator struct_bit_field_size?
//#ifdef GNU
    sub_struct_declarators?
    gnu_attributes_or_asm*
//#endif
|
    struct_bit_field_size

struct_bit_field_size ::=
    ':' constant_expression

sub_struct_declarators ::=
    '(' (struct_declarator ',')* ')'

storage_class_specifier ::=
    typedef_specifier
    |   extern_specifier
    |   static_specifier
    |   thread_local_specifier
    |   auto_specifier
    |   register_specifier
//#ifdef GNU
    |   near_far_specifier
    |   local_specifier
    |   vector_specifier
//#endif

typedef_specifier ::=
    'typedef'

extern_specifier ::=
    'extern' stringlit?

static_specifier ::=
        'static'
//#ifdef GNU
    |   'STATIC'
//#endif

thread_local_specifier ::=
    '_Thread_local'

auto_specifier ::=
    'auto'

register_specifier ::=
    'register'

near_far_specifier ::=
    'near' | 'far' | 'NEAR' | 'FAR'

local_specifier ::=
    'local'

vector_specifier ::=
    'vector'

type_specifier ::=
    void_specifier
    |   char_specifier
    |   short_specifier
    |   int_specifier
    |   long_specifier
    |   float_specifier
    |   double_specifier
    |   signed_specifier
    |   unsigned_specifier
    |   bool_specifier
    |   complex_specifier
    |   atomic_type_specifier
    |   struct_or_union_specifier
    |   enum_specifier
    |   typedef_name
//#ifdef GNU
    |   typeof_specifier
//#endif

macro_type_specifier ::=
    macro_name '(' declaration_specifier+ pointer* ')'

void_specifier ::=
    'void'

char_specifier ::=
    'char'

short_specifier ::=
    'short'

int_specifier ::=
    'int'

long_specifier ::=
    'long'

float_specifier ::=
    'float'

double_specifier ::=
    'double'

signed_specifier ::=
    'signed'
//#ifdef GNU
    |   'signed__' | '__signed__' | '__signed'
//#endif

unsigned_specifier ::=
    'unsigned'
//#ifdef GNU
    |   'unsigned__' | '__unsigned__' | '__unsigned'
//#endif

bool_specifier ::=
    '_Bool'
//#ifdef GNU
    |   'bool' | '__bool' | '__bool__' | 'bool__'
//#endif

complex_specifier ::=
    complex_ simple_type_or_qualifier*

complex_ ::=
    '_Complex'
//#ifdef GNU
    |   'complex' | '__complex' | '__complex__' | 'complex__'
//#endif

atomic_type_specifier ::=
    '_Atomic' '(' type_name ')'

typedef_name ::=
    identifier

typeof_specifier ::=
    typeof_ '(' expression_or_type_name ')'

typeof_ ::=
    'typeof' | '__typeof' | '__typeof__' | 'typeof__'

expression_or_type_name ::=
    type_name
    |   expression

simple_type_or_qualifier ::=
    simple_type_name
    |   type_qualifier

type_qualifier ::=
    const_specifier
    |   volatile_specifier
    |   restrict_specifier
    |   atomic_specifier
//#ifdef GNU
    |   weak_specifier
    |   initdata_specifier
    |   gnu_type_qualifier
//#endif
//#ifdef LINUX
    |   linux_type_qualifier
//#endif

gnu_type_qualifier ::=
        'internal_function'  //% bison
    |   'yyconst'            //% postgresql
    |   'pascal'             //% macintosh
    |   gnuextensionid

linux_type_qualifier ::=
        'asmlinkage'
    |   '_license'
    |   '_version'
    |   macro_name "{!}"'('

declarator_init_or_close_paren ::=
    declarator
    |   '='
    |   ')'

const_specifier ::=
    'const'
//#ifdef GNU
    |   '__const' | '__const__' | 'const__' | 'CONST'
//#endif
//#ifdef LINUX
    |   'const_debug'
//#endif

volatile_specifier ::=
    'volatile'
//#ifdef GNU
    |   '__volatile' | '__volatile__' | 'volatile__' | 'VOLATILE'
//#endif

restrict_specifier ::=
    'restrict'
//#ifdef GNU
    |   '__restrict' | 'restrict__' | '__restrict__' | 'RESTRICT'
//#endif

atomic_specifier ::=
    '_Atomic'

weak_specifier ::=
    '__weak' | '__weak__' | 'weak__'

initdata_specifier ::=
    '__initdata' | '__devinitdata' | '__cpuinitdata' | '__read_mostly' | '__initmv'
    |   '__initdata_or_module' | '__pminitdata' | '__cpuinit' | '__devinit' | '__meminit'

function_specifier ::=
    inline_specifier
    |   noreturn_specifier

inline_specifier ::=
    'inline'
//#ifdef GNU
    |   '__inline' | '__inline__' | 'inline__'
//#endif

noreturn_specifier ::=
    'Noreturn'
//#ifdef GNU
    |   'noreturn' | '__noreturn' | '__noreturn__' | 'noreturn__'
//#endif

alignment_specifier ::=
    alignas_ '(' type_name ')'
    |   alignas_ '(' constant_expression ')'

alignas_ ::=
    'Alignas'
//#ifdef GNU
    |   'alignas'
//#endif

simple_type_name ::=
    char_specifier
    |   int_specifier
    |   void_specifier
    |   float_specifier
    |   double_specifier
    |   type_id

type_id ::=
    identifier "{!}"declarator_extension

struct_or_union ::=
    'struct' | 'union'

enum_definition ::=
    declaration_specifiers?
    'enum'
//#ifdef GNU
    gnu_attributes?
//#endif
    identifier? enumerator_body
//#ifdef GNU
    gnu_attributes?
//#endif

enumerator_body ::=
    '{'
        enumerator_list? ','?
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
    '}'
//#ifdef GNU
    gnu_attributes?
//#endif
    declarator_opt_init_semi?

enumerator_list ::=
    (enumerator ',')+

enumerator ::=
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
    enumerator_element

enumerator_element ::=
         enumerator_name enumerator_value
    |   enumerator_name

enumerator_name ::=
    identifier
//#ifdef LINUX
    identifier*
//#endif
    |   dot_identifier
//#ifdef LINUX
    |   macro_call macro_call*
//#endif

dot_identifier ::=
     '.' identifier

enumerator_value ::=
    '=' constant_expression

enum_specifier ::=
    'enum'

declarator_opt_init ::=
    declarator
//#ifdef GNU
    gnu_attributes_or_asm*
//#endif
    initialization?

declarator ::=
    pointer* direct_declarator declarator_extension*
    macro_call?     //% observed

direct_declarator ::=
    identifier
    |   '(' declaration_specifiers? declarator ')'

declarator_extension ::=
    function_declarator_extension
    |   array_declarator_extension

function_declarator_extension ::=
    '(' parameter_type_list ')' maybe_type_qualifier_list

array_declarator_extension ::=
    '[' 'static'? type_qualifier_list? 'static'? assignment_expression? '*'? ']' maybe_type_qualifier_list

maybe_type_qualifier_list ::=
    empty
    |   type_qualifier_list

type_qualifier_list ::=
    (type_qualifier ',')+

pointer ::=
    pointer_specifier? '*' pointer_qualifier_list?
    |   pointer_specifier? '(' pointer+ ')' pointer_qualifier_list?

pointer_specifier ::=
    near_far_specifier
//#ifdef GNU
    |   gnuextensionid
//#endif

pointer_qualifier_list ::=
    pointer_qualifier+

pointer_qualifier ::=
     type_qualifier

type_name ::=
    declaration_specifiers abstract_declarator*

abstract_declarator ::=
    pointer+
    |   pointer*   direct_abstract_declarator+

direct_abstract_declarator ::=
    declarator_extension

parameter_type_list ::=
    (parameter_declaration ',')* comma_dotdotdot?

parameter_declaration ::=
    declaration_specifiers parameter_declarator_list    //% prefer with declarator
    |   declaration_specifiers              //% but optional in C
//#ifdef GNU
    |   '(' parameter_type_list ')'
//#endif

comma_dotdotdot ::=
    ','  '...'  //% Really only allowed last in a non-empty list

parameter_declarator_list ::=
    (parameter_declarator ',')+

parameter_declarator ::=
    declarator          gnu_attributes_or_asm*
    |   abstract_declarator gnu_attributes_or_asm*

initialization ::=
//#ifdef GNU
    initdata_specifier?
//#endif
    '=' initializer
//#ifdef GNU
    |   compound_initializer
    |   '=' "{^}"';'
//#endif

initializer ::=
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
    initializer_unit
//#ifdef GNU
    initializer_unit*
//#endif

initializer_unit ::=
    assignment_expression
    |   compound_initializer
//#ifdef GNU
    |   element_label colon_equals_or_equals? initializer ';'?
    |   '[' constant_expression dotdotdot? constant_expression? ']' '='? initializer ';'?
//#endif

colon_equals_or_equals ::=
    ':' | '=' | '|='

compound_initializer ::=
    '{' '}' //% redundant, but avoids newlines in output
    |
        cast_specifier?
    '{'
        (sub_initializer ',')* ','?
//#ifdef PREPROCESSOR
        preprocessor_list_initializer*
//#endif
//#ifdef LINUX
        upper_macro_name?       //% EXTRA_PARMS
//#endif
     '}'
//#ifdef LINUX
    |   macro_call ';'
//#endif

cast_specifier ::=
    '&' cast_operator

sub_initializer ::=
    assignment_expression
    |
//#ifdef LINUX
    macro_call?
//#endif
    initializer_unit

dotdotdot ::=
     '...'

element_label ::=
    '.'? element_name element_name_extension*

element_name_extension ::=
    '.' element_name
    |   '[' constant_expression ']'

element_name ::=
    identifier

preprocessor_list_initializer ::=
    preprocessor (initializer ',')* ','?

statement ::=
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
    label* unlabeled_statement
    |   label+                  //% e.g. at end of switch block
//#ifdef PREPROCESSOR
    |   preprocessor
//#endif

label ::=
     label_name ':'
    |    'case' constant_expression ':'
    |    'default' ':'
//#ifdef GNU
    |    'case' constant_expression  '...'  constant_expression ':'
//#endif

label_name ::=
    identifier

unlabeled_statement ::=
    simple_statement semi
    |   structured_statement
//#ifdef GNU
    |   gnu_statement
//#endif

gnu_statement ::=
    error_statement

error_statement ::=
    'error' ':'? id+ '+'? id* semi?

structured_statement ::=
    if_statement
    |   for_statement
    |   while_statement
    |   switch_statement
    |   do_statement
    |   compound_statement
    |   asm_statement

simple_statement ::=
    jump_statement
    |   null_statement
    |   expression_statement

null_statement ::=
    empty

compound_statement ::=
    '{'
    compound_statement_body
    '}' ';'?
//#ifdef PREPROCESSOR
    preprocessor*
//#endif

compound_statement_body ::=
    block_item_list?

block_item_list ::=
    block_item+

block_item ::=
    declaration_or_statement

declaration_or_statement ::=
    declaration
    |   statement
    |   struct_or_union_definition
    |   enum_definition
//#ifdef GNU
    |   function_definition
//#endif
//#ifdef PREPROCESSOR
    |   preprocessor
//#endif
//#ifdef LINUX
    |   macro_declaration_or_statement
//#endif
//#ifdef ROBUST
    |   unknown_declaration_or_statement
//#endif

expression_statement ::=
    expression_list

if_statement ::=
    'if' '(' condition
//#ifdef PREPROCESSOR
        preprocessor*
//#endif
        ')' sub_statement
    ELIF_statement*     //% observed - JRC
    else_statement?
//#ifdef LINUX
    |
    'if' macro_call
        sub_statement
    ELIF_statement*     //% observed - JRC
    else_statement?
//#endif

ELIF_statement ::=
        'ELIF' '(' condition ')'
        sub_statement
    |
        'ELIF' macro_call
        sub_statement

sub_statement ::=
    compound_statement          //% avoid { on separate line
    |     "{!}"'{' statement
//#ifdef LINUX
    |     macro_declaration_or_statement
//#endif

switch_statement ::=
    'switch' '(' expression_list ')' sub_statement
//#ifdef LINUX
    |   'switch' macro_call sub_statement
//#endif

else_statement ::=
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
    'else' else_sub_statement

else_sub_statement ::=
    //% to format else-if correctly
    if_statement
    |   sub_statement

while_statement ::=
    'while' '(' condition ')' sub_statement
//#ifdef GNU
    else_statement?
//#endif
//#ifdef LINUX
    |   whileeachid '(' expression_list ')' sub_statement
    else_statement?
    |   'LOOP' sub_statement        //% observed - JRC
    |   'forever' sub_statement //% observed - JRC
//#endif

do_statement ::=
    'do' sub_statement do_while_condition semi

for_statement ::=
    'for' '(' non_null_declaration? expression_list? ';' expression_list? semi_opt_expression_list? ')' sub_statement
//#ifdef LINUX
    |   foreachid '(' expression_list ')' sub_statement
//#endif

non_null_declaration ::=
    declaration_specifiers? declarator_opt_init_list? ';'

semi_opt_expression_list ::=
    ';' expression_list?

jump_statement ::=
    goto_statement
    |   continue_statement
    |   break_statement
    |   return_statement

goto_statement ::=
        'goto' label_name
//#ifdef GNU
    |   'goto' pointer expression
//#endif

continue_statement ::=
    'continue'

break_statement ::=
    'break'

return_statement ::=
    'return' expression_list?
//#ifdef GNU
        gnu_attributes?
//#endif

asm_statement ::=
    identifier_equals? asm_spec semi?

identifier_equals ::=
    identifier '='

asm_spec ::=
    asm_ type_qualifier* 'goto'? '('  asm_item* ')'  gnu_attributes?

asm_item ::=
    stringlit ','?
    |   '('  asm_item* ')'
    |   "{!}"'(' "{!}"')' token_or_key

asm_ ::=
    'asm' | '__asm' | '__asm__' | 'asm__'  | 'asm_safe'

expression_list ::=
    (expression ',')+
//#ifdef LINUX
    comma_empty_brackets?
//#endif

comma_empty_brackets ::=
    ',' empty_brackets?

empty_brackets ::=
    '{' '}' ','?

condition ::=
    expression_list

expression ::=
    assignment_expression

constant_expression ::=
    conditional_expression

assignment_expression ::=
    conditional_expression assign_assignment_expression?

assign_assignment_expression ::=
    assignment_operator assignment_expression

assignment_operator ::=
    '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|='

conditional_expression ::=
    logical_OR_expression conditional_operation?

conditional_operation ::=
    '?' expression? ':' conditional_expression

logical_OR_expression ::=
    logical_AND_expression OR_logical_AND_expression*

OR_logical_AND_expression ::=
    logical_OR_operator logical_AND_expression

logical_OR_operator ::=
    '||'
//#ifdef GNU
    |  'or' | 'OR'
//#endif

logical_AND_expression ::=
    inclusive_OR_expression AND_inclusive_OR_expression*

AND_inclusive_OR_expression ::=
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
    logical_AND_operator inclusive_OR_expression

logical_AND_operator ::=
    '&&'
//#ifdef GNU
    |  'and' | 'AND' | 'ANDP'
//#endif

inclusive_OR_expression ::=
    exclusive_OR_expression OR_exclusive_OR_expression*

OR_exclusive_OR_expression ::=
    bit_OR_operator exclusive_OR_expression

bit_OR_operator ::=
    '|'
//#ifdef GNU
    |  'bit_or' | 'BIT_OR'
//#endif

exclusive_OR_expression ::=
    AND_expression exclusive_OR_AND_expression*

exclusive_OR_AND_expression ::=
    bit_XOR_operator AND_expression

bit_XOR_operator ::=
    '^'
//#ifdef GNU
    |  'bit_xor' | 'BIT_XOR'
//#endif

AND_expression ::=
    equality_expression AND_equality_expression*

AND_equality_expression ::=
    bit_AND_operator equality_expression

bit_AND_operator ::=
    '&'
//#ifdef GNU
    |  'bit_and' | 'BIT_AND'
//#endif

equality_expression ::=
    relational_expression equality_relational_expression*

equality_relational_expression ::=
    equality_operator relational_expression

equality_operator ::=
        '==' | '!='
    |   'equals'        //% Mozilla FF

relational_expression ::=
    shift_expression relational_shift_expression*

relational_shift_expression ::=
    relational_operator shift_expression

relational_operator ::=
    '<' | '>' | '<=' | '>='

shift_expression ::=
    additive_expression shift_additive_expression*

shift_additive_expression ::=
    shift_operator additive_expression

shift_operator ::=
    '<<' | '>>'

additive_expression ::=
    multiplicative_expression add_subtract_multiplicative_expression*

add_subtract_multiplicative_expression ::=
    additive_operator multiplicative_expression

additive_operator ::=
    '+' | '-'

multiplicative_expression ::=
    cast_expression multipy_divide_cast_expression*

multipy_divide_cast_expression ::=
    multiplicative_operator cast_expression

multiplicative_operator ::=
    '*' | '/' | '%'
//#ifdef GNU
    |  'div' | 'DIV' | 'mod' | 'MOD'
//#endif

cast_expression ::=
    cast_operator* unary_expression

cast_operator ::=
        '(' type_name ')'
    |   upper_macro_name "{^}"postfix_expression    //% observed - JRC

unary_expression ::=
    pre_increment_decrement_operator* sub_unary_expression

pre_increment_decrement_operator ::=
    '++'  | '--'

sub_unary_expression ::=
    postfix_expression
    |   unary_operator  cast_expression
    |    sizeof_expression
    |    alignof_expression

unary_operator ::=
    '&' | '*' | '+' | '-' | '~' | '!'
//#ifdef GNU
    |   '&&' | 'not' | 'NOT'
//#endif

sizeof_expression ::=
    'sizeof' '(' type_name ')'
    |   'sizeof' unary_expression

alignof_expression ::=
    alignof_specifier '(' expression_or_type_name ')'

alignof_specifier ::=
    '_Alignof'
//#ifdef GNU
    |   '__alignof' | '__alignof__' | 'alignof__'
//#endif

postfix_expression ::=
    primary_expression  postfix_extension*

primary_expression ::=
    identifier
    |   constant
    |   string_literal
    |   parenthesized_expression
    |   constructor_expression
    |   generic_selection
    |   macro_call

constructor_expression ::=
        '('  type_name ')' compound_initializer

identifier ::=
    id
//#ifdef LINUX
    |   foreachid | whileeachid | gnuextensionid
//#endif

parenthesized_expression ::=
        '('  expression_list ','? ')'
//#ifdef GNU
    |   '('  compound_statement ')'
//#endif

generic_selection ::=
    '_Generic' '(' assignment_expression ',' generic_assoc_list ')'

generic_assoc_list ::=
    (generic_association ',')+

generic_association ::=
    type_name ':' assignment_expression
    |   'default' ':' assignment_expression

postfix_extension ::=
    subscript_extension
    |   arguments_extension
    |   field_access_extension
    |   dereference_extension
    |   post_increment_decrement_operator

subscript_extension ::=
    '['  assignment_expression?  ']'

field_access_extension ::=
    '.' identifier

dereference_extension ::=
    '->' identifier

post_increment_decrement_operator ::=
    '++' | '--'

arguments_extension ::=
     '('  argument_expression_list?
//#ifdef LINUX
    variadic_declaration_spec?
//#endif
//#ifdef GNU
    dotdot?
//#endif
     ')'

argument_expression_list ::=
    (argument_expression ',')+

variadic_declaration_spec ::=
    macro_name

dotdot ::=
     '..'

argument_expression ::=
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
    assignment_expression
//#ifdef PREPROCESSOR
    preprocessor*
//#endif
//#ifdef LINUX
    |   equality_operator | relational_operator
    |   upper_macro_name upper_macro_name*
//#endif

structured_statement_expression ::=
    structured_statement

constant ::=
    integer_constant
    |   floating_constant
//%   | enumeration_constant        //% already captured by identifier in primary
    |   character_constant

integer_constant ::=
    number
//#ifdef LINUX
    number_units?
//#endif
    |   longnumber
    |   hexnumber

number_units ::=
    'KB' | 'MB' | 'GB'

floating_constant ::=
    floatnumber
    |    dotfloatnumber     //% TXL doesn't defaultly space before .
    |   hexfloatnumber

character_constant ::=
    charlit

string_literal ::=
    stringlit string_unit*  //% Includes implicit concatenation
//#ifdef GNU
    |   pseudo_string stringlit string_unit*
//#endif

string_unit ::=
    stringlit
//#ifdef GNU
    |   pseudo_string
    |   register_spec
//#endif
//#ifdef PREPROCESSOR
    |   preprocessor
//#endif

pseudo_string ::=
    pseudo_string_name pseudo_string_arguments?

pseudo_string_name ::=
    identifier

pseudo_string_arguments ::=
    '(' pseudo_string_argument_list ')'

pseudo_string_argument_list ::=
    (pseudo_string_argument (',' pseudo_string_argument)*)?

pseudo_string_argument ::=
    constant_expression
    |   register_spec

register_spec ::=
    '%'  '%'? identifier
    |   '%'  '%'? integernumber id?

gnu_attributes ::=
    gnu_attribute_spec+

gnu_attributes_or_asm ::=
    gnu_attribute_spec
    |   asm_spec

gnu_attribute_spec ::=
    'attribute_hidden'
    |    gnuextensionid gnu_attribute_arguments?
    |   upper_macro_name '(' stringlit ')' "{!}"upper_macro_name

gnu_attribute_arguments ::=
    '(' gnu_attribute_argument+ ')'

gnu_attribute_argument ::=
    '(' gnu_attribute_argument* ')'
    |   "{!}"'(' "{!}"')' token_or_key

token_or_key ::=
    token | key

machinestart_declaration ::=
    machine_start_ '(' expression_list ')'
    (sub_initializer (',' sub_initializer)*)? ','?
    'MACHINE_END' ';'?

machine_start_ ::=
    'MACHINE_START' | 'DT_MACHINE_START'

macro_declaration_or_statement ::=
    'else'? //% observed httpd
    macro_call macro_extension? ';'?
    |   macro_block ';'?
    |   macro_name ';'? "{^}"statement_declaration_or_end

macro_block ::=
    macro_name compound_statement macro_call? ';'?      //% try {   } catch(err);

macro_extension ::=
    macro_initializer
    |   enumerator_body
    |   compound_statement
    |   NL_stringlit+       //% observed Linux - JRC

macro_initializer ::=
    '=' initializer
    |   "{^}"'{' compound_initializer

NL_stringlit ::=
    stringlit

macro_call ::=
    macro_name '(' macro_arguments ')'

macro_arguments ::=
    macro_argument*

macro_argument ::=
    '(' macro_arguments ')'
    |   '{' macro_arguments '}'
    |   "{!}"'(' "{!}"')' "{!}"'{' "{!}"'}' token_or_key

macro_name ::=
    identifier

upper_macro_name ::=
    upperid | upperlowerid

statement_declaration_or_end ::=
    statement | declaration | '}' | empty

unknown_declaration_or_statement ::=
    unknown_item+ semi_or_end_scope

semi_or_end_scope ::=
    semi
    |   "{^}"'}'

unknown_item ::=
    '{' unknown_item* '}'
    |   "{!}"';' "{!}"'{' "{!}"'}' token_or_key

CordyJ commented 5 months ago

> Would be nice if txl could also output an EBNF understood by https://github.com/GuntherRademacher/rr to generate nice navigable railroad diagrams to help document/develop/debug the grammars (see a manual extraction example below).

You could make such a tool in TXL itself. You just need a transformation from TXL grammar notation to EBNF, or to any other notation you want.
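
As a rough illustration of what such a transformation involves, here is a minimal sketch. It is written in Python rather than TXL, so it is a stand-in for the idea and not the TXL-based tool suggested above. It assumes whitespace-separated items and handles only a small subset of the notation: `define ... end define`, `|`, `'`-quoted literals, `[X]`, `[opt X]`, `[repeat X]`, `[repeat X+]`, `[list X]`, `[list X+]`, `%` comments, and a few formatting cues. The function names and the sample defines are hypothetical, made up for this example.

```python
import re
from typing import Optional

# Formatting cues carry no syntax; this set is an assumed, partial list.
FORMAT_CUES = {"NL", "IN", "EX", "SP", "TAB"}

DEFINE_RE = re.compile(r"\bdefine\s+(\w+)\s+(.*?)\s*\bend\s+define\b", re.S)
TOKEN_RE = re.compile(r"\[[^\]]*\]|\S+")


def quote(lit: str) -> str:
    """Quote a terminal for EBNF, switching quote style if it contains '."""
    return f'"{lit}"' if "'" in lit else f"'{lit}'"


def symbol(word: str) -> str:
    """Inside brackets, a leading ' marks a literal; anything else is a nonterminal."""
    if word.startswith("'") and len(word) > 1:
        return quote(word[1:])
    return word


def convert_item(tok: str) -> Optional[str]:
    """Translate one grammar item; return None for items that should be dropped."""
    if tok == "|":
        return tok
    if not (tok.startswith("[") and tok.endswith("]")):
        # Outside brackets every token is a terminal, quoted or not.
        return quote(tok[1:] if tok.startswith("'") and len(tok) > 1 else tok)
    words = tok[1:-1].split()
    if len(words) == 1:
        return None if words[0] in FORMAT_CUES else symbol(words[0])
    if len(words) == 2:
        modifier, target = words
        plus = target.endswith("+") and not target.startswith("'")
        x = symbol(target[:-1] if plus else target)
        if modifier == "opt" and not plus:
            return f"{x}?"
        if modifier == "repeat":
            return f"{x}+" if plus else f"{x}*"
        if modifier == "list":
            # Comma-separated: the separator sits between items, not after each one.
            return f"{x} (',' {x})*" if plus else f"({x} (',' {x})*)?"
    raise ValueError(f"unsupported TXL item in this sketch: {tok}")


def txl_to_ebnf(source: str) -> str:
    """Convert every define in `source` to one `name ::= ...` EBNF rule."""
    # Strip % comments, but keep the quoted literal '% intact.
    source = re.sub(r"(?m)(?<!')%.*$", "", source)
    rules = []
    for name, body in DEFINE_RE.findall(source):
        items = (convert_item(tok) for tok in TOKEN_RE.findall(body))
        rules.append(f"{name} ::= " + " ".join(i for i in items if i is not None))
    return "\n".join(rules)


if __name__ == "__main__":
    sample = """
    define generic_assoc_list
        [list generic_association+]     % one or more, comma-separated
    end define

    define subscript_extension
        '[ [opt assignment_expression] ']
    end define
    """
    print(txl_to_ebnf(sample))
```

In this sketch, `[list X+]` is rendered as `X (',' X)*`, so the comma appears only between items. Lookahead forms such as `[not X]` and `[see X]` are deliberately rejected instead of guessed at, since the target EBNF has no direct equivalent for them.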

mingodad commented 5 months ago

When and if I manage to understand it!

CordyJ commented 5 months ago

> I noticed that there are a lot of repetitions of tree.trees (grammarTP).kind. Can the Turing+ compiler optimize all of that array access, or is tree.trees looked up every time it appears?

First, sections of the OpenTXL source code that do not affect overall performance are not optimized, since doing so would make no difference. In OpenTXL, all of the parser time goes into the five main instructions of the parse loop, and all of the transformer time goes into the five main instructions of the transform loop. Changing anything else will not change performance.

That being said, the Turing+ code tree.trees(grammarTP).kind is compiled into the global array reference tree_trees[grammarTP].kind in C, which the C compiler optimizes automatically. Even if it did not, modern hardware does so at run time all by itself.

CordyJ commented 5 months ago

> When and if I manage to understand it!

Yes. It's exactly the kind of thing that TXL is designed for. As an aside, learning TXL itself might be a better start than examining its source code; you can't learn anything about the language that way.