Add import from C

GitMensch commented 7 years ago

Moved from #354.

Comments from 2017-03-06

@codemanyak:

An early prototype with ANSI-C import could be downloaded from codemanyak/Structorizer.Desktop master...

@GitMensch:

Thank you for the update.

First try:

Parser-error Syntax Error: const at line 28

console:

no_argument
required_argument
optional_argument
s
s
gpMsgTokenRead: static
gpMsgTokenRead: int
gpMsgReduction
<Mod> ::= static
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: arg_shift
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: =
gpMsgReduction
<Array> ::= 
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Var> ::= Id <Array> = <Op If>
gpMsgReduction
<Var List> ::= 
gpMsgTokenRead: static
gpMsgReduction
<Var Decl> ::= <Mod> <Type> <Var> <Var List> ;
gpMsgTokenRead: int
gpMsgReduction
<Mod> ::= static
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: print_runtime_wanted
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: =
gpMsgReduction
<Array> ::= 
gpMsgTokenRead: 0
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= OctLiteral
gpMsgReduction
<Var> ::= Id <Array> = <Op If>
gpMsgReduction
<Var List> ::= 
gpMsgTokenRead: static
gpMsgReduction
<Var Decl> ::= <Mod> <Type> <Var> <Var List> ;
gpMsgTokenRead: const
gpMsgSyntaxError

source (not at line 28, but on line 46, I see no option to get to this line counter)

static const char short_options[] = "+hirc:VqM:";

and a second sample run

Parser-error Syntax Error: ( at line 104

console:

gpMsgTokenRead: int
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: utils_parseFileFormat
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgReduction
<Func ID> ::= <Type> Id
gpMsgTokenRead: const
gpMsgTokenRead: char
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: *
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgTokenRead: format
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Pointers> ::= * <Pointers>
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: )
gpMsgReduction
<Param> ::= const <Type> Id
gpMsgTokenRead: {
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: format
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "F"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FILE_TYPE_FIXED
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: format
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "V"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FILE_TYPE_VARIABLE
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: -
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Op Unary> ::= - <Op Unary>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: }
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: int
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Func Decl> ::= <Func ID> ( <Params> ) <Block>
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: utils_parseFileOrganization
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgReduction
<Func ID> ::= <Type> Id
gpMsgTokenRead: const
gpMsgTokenRead: char
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: *
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgTokenRead: organization
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Pointers> ::= * <Pointers>
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: )
gpMsgReduction
<Param> ::= const <Type> Id
gpMsgTokenRead: {
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: organization
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "IX"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FILE_ORGANIZATION_INDEXED
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: organization
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "RL"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FILE_ORGANIZATION_RELATIVE
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: organization
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "SQ"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FILE_ORGANIZATION_SEQUENTIAL
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: organization
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "LS"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FILE_ORGANIZATION_LINESEQUENTIAL
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: -
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Op Unary> ::= - <Op Unary>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: }
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: int
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Func Decl> ::= <Func ID> ( <Params> ) <Block>
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: utils_parseFieldType
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgReduction
<Func ID> ::= <Type> Id
gpMsgTokenRead: const
gpMsgTokenRead: char
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: *
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgTokenRead: type
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Pointers> ::= * <Pointers>
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: )
gpMsgReduction
<Param> ::= const <Type> Id
gpMsgTokenRead: {
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: type
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "CH"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FIELD_TYPE_CHARACTER
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: type
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "BI"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: FIELD_TYPE_BINARY
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: -
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Op Unary> ::= - <Op Unary>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: }
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: int
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Func Decl> ::= <Func ID> ( <Params> ) <Block>
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: utils_parseFieldValueType
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgReduction
<Func ID> ::= <Type> Id
gpMsgTokenRead: const
gpMsgTokenRead: char
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: type
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: )
gpMsgReduction
<Param> ::= const <Type> Id
gpMsgTokenRead: {
gpMsgTokenRead: switch
gpMsgTokenRead: (
gpMsgTokenRead: type
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: {
gpMsgTokenRead: case
gpMsgTokenRead: 'Z'
gpMsgTokenRead: :
gpMsgReduction
<Value> ::= CharLiteral
gpMsgTokenRead: return
gpMsgTokenRead: FIELD_VALUE_TYPE_Z
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: case
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: 'X'
gpMsgTokenRead: :
gpMsgReduction
<Value> ::= CharLiteral
gpMsgTokenRead: return
gpMsgTokenRead: FIELD_VALUE_TYPE_X
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: case
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: 'C'
gpMsgTokenRead: :
gpMsgReduction
<Value> ::= CharLiteral
gpMsgTokenRead: return
gpMsgTokenRead: FIELD_VALUE_TYPE_C
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: default
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: :
gpMsgTokenRead: return
gpMsgTokenRead: -
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Op Unary> ::= - <Op Unary>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgReduction
<Case Stms> ::= default : <Stm List>
gpMsgReduction
<Case Stms> ::= case <Value> : <Stm List> <Case Stms>
gpMsgReduction
<Case Stms> ::= case <Value> : <Stm List> <Case Stms>
gpMsgReduction
<Case Stms> ::= case <Value> : <Stm List> <Case Stms>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= switch ( <Expr> ) { <Case Stms> }
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: int
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Func Decl> ::= <Func ID> ( <Params> ) <Block>
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: utils_parseSortDirection
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgReduction
<Func ID> ::= <Type> Id
gpMsgTokenRead: const
gpMsgTokenRead: char
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: *
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgTokenRead: direction
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Pointers> ::= * <Pointers>
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: )
gpMsgReduction
<Param> ::= const <Type> Id
gpMsgTokenRead: {
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: direction
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "A"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: SORT_DIRECTION_ASCENDING
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: direction
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "D"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: SORT_DIRECTION_DESCENDING
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: -
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Op Unary> ::= - <Op Unary>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: }
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: int
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Func Decl> ::= <Func ID> ( <Params> ) <Block>
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: utils_parseCondCondition
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgReduction
<Func ID> ::= <Type> Id
gpMsgTokenRead: const
gpMsgTokenRead: char
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: *
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgTokenRead: condition
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Pointers> ::= * <Pointers>
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: )
gpMsgReduction
<Param> ::= const <Type> Id
gpMsgTokenRead: {
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: condition
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "EQ"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: COND_CONDITION_EQUAL
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: condition
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "GT"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: COND_CONDITION_GREATERTHAN
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: condition
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "GE"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: COND_CONDITION_GREATEREQUAL
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: condition
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "LT"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: COND_CONDITION_LESSERTHAN
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: condition
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "LE"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: COND_CONDITION_LESSEREQUAL
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: -
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Op Unary> ::= - <Op Unary>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: }
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: int
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Func Decl> ::= <Func ID> ( <Params> ) <Block>
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: utils_parseCondOperation
gpMsgReduction
<Scalar> ::= int
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgReduction
<Func ID> ::= <Type> Id
gpMsgTokenRead: const
gpMsgTokenRead: char
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: *
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgTokenRead: operation
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Pointers> ::= * <Pointers>
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: )
gpMsgReduction
<Param> ::= const <Type> Id
gpMsgTokenRead: {
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: operation
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "AND"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: COND_OPERATION_AND
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: if
gpMsgTokenRead: (
gpMsgTokenRead: !
gpMsgTokenRead: strcasecmp
gpMsgTokenRead: (
gpMsgTokenRead: operation
gpMsgTokenRead: ,
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: "OR"
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= StringLiteral
gpMsgReduction
<Expr> ::= <Expr> , <Op Assign>
gpMsgTokenRead: )
gpMsgReduction
<Value> ::= Id ( <Expr> )
gpMsgReduction
<Op Unary> ::= ! <Op Unary>
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: COND_OPERATION_OR
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= Id
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: else
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgTokenRead: {
gpMsgTokenRead: return
gpMsgTokenRead: -
gpMsgTokenRead: 1
gpMsgTokenRead: ;
gpMsgReduction
<Value> ::= DecLiteral
gpMsgReduction
<Op Unary> ::= - <Op Unary>
gpMsgTokenRead: }
gpMsgReduction
<Normal Stm> ::= return <Expr> ;
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: }
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm> ::= if ( <Expr> ) <Then Stm> else <Stm>
gpMsgReduction
<Stm List> ::= 
gpMsgReduction
<Stm List> ::= <Stm> <Stm List>
gpMsgTokenRead: const
gpMsgReduction
<Block> ::= { <Stm List> }
gpMsgReduction
<Func Decl> ::= <Func ID> ( <Params> ) <Block>
gpMsgTokenRead: char
gpMsgReduction
<Mod> ::= const
gpMsgReduction
<Sign> ::= 
gpMsgTokenRead: *
gpMsgReduction
<Scalar> ::= char
gpMsgReduction
<Base> ::= <Sign> <Scalar>
gpMsgTokenRead: utils_getFileFormatName
gpMsgReduction
<Pointers> ::= 
gpMsgReduction
<Pointers> ::= * <Pointers>
gpMsgReduction
<Type> ::= <Base> <Pointers>
gpMsgTokenRead: (
gpMsgSyntaxError

source (not at line 104, but on line 102)

const char *utils_getFileFormatName(int format) {

Looks like the grammar has problems with const entries.

@codemanyak:

The uploaded version was also defective on do-while loops and switch statements; I'm fixing that right now. I will then look into the const problem. It would be helpful if you could upload the two C files for me. Astonishingly, the grammar contains the keywords const as modifier and char as type. This is surprising because otherwise the grammar seems to refer to a very early C version (more or less 1973 code). W.r.t. the drag-and-drop support: in theory it should have worked for Pascal files, and I will definitely try to make it work for all file extensions registered with the import plugins.

@GitMensch:

Sounds good. Sample 1: https://sourceforge.net/p/open-cobol/code/HEAD/tree/branches/gnu-cobol-2.0/bin/cobcrun.c, Sample 2: https://svn.code.sf.net/p/open-cobol/contrib/trunk/tools/ocsort/utils.c

Question: are these problems with the gold grammar or problems in the Structorizer parts?

More comments from https://github.com/fesch/Structorizer.Desktop/issues/354#issuecomment-284559244 on ...

Comments from 2017-03-10

@codemanyak:

Well, I think I can present a very advanced C import now (branch codemanyak/Structorizer.Desktop/master). (Some project configuration files may have to be adapted to get it running.) The parsing error display is slightly improved and got a button to copy its content to the clipboard, as requested. Here is a tested C source derived from your example with only slight modifications such that it passes the syntax check. I enhanced the C grammar a little to allow array/struct initializers (the ones with braces) and a single void in the parameter lists. One strange limitation of the grammar that I haven't managed to lift so far: it does not accept user-defined type names!

https://github.com/fesch/Structorizer.Desktop/files/833018/cobcrun_Issue354.zip

In the code (parsers subfolder), you will also find the generated skeleton for the COBOLParser.

@GitMensch:

Thank you very much! The parsing error display has actually improved quite a bit 👍 The "Expected" part is very nice. Can you also add a "Found: xyz" output to enable a quicker check for the exact problem without the need to debug it at source level?

For the C grammar: one "error.syntax" in file https://svn.code.sf.net/p/open-cobol/contrib/trunk/tools/cobjapi/src_c/japilib.c for which I see no reason yet is:
Preceding source context:
  51:   /* function imported from fileselect.c */
  52:   extern char* __fileselect» (int,char*,char*,char*);

Expected: '[' | ',' | ';' | '='
Another issue: #ifndef currently leads to error.lexical. I assume this is some missing preparsing, isn't it? It would be nice to be able to set some precompiler options "from outside", for example simple defined values or paths for #include (a list of directories where includes should be looked up; if they aren't in there, they simply won't be resolved).
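The kind of preparsing asked for here could look roughly like the following Java sketch. It is not taken from CParser: the class name MiniPreprocessor, the method strip and the externalDefines parameter are hypothetical, and the sketch only handles #ifdef, #ifndef, #else, #endif, #define and #undef. Conditions of #if are not evaluated (the branch is simply kept), #elif is ignored, and backslash-continued macro lines are not joined.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Structorizer.
class MiniPreprocessor {
    /**
     * Removes all preprocessor lines and keeps only the source lines that sit
     * in active #ifdef/#ifndef branches. externalDefines stands for the
     * symbols configured "from outside".
     */
    static List<String> strip(List<String> lines, Set<String> externalDefines) {
        Set<String> defined = new HashSet<>(externalDefines);
        Deque<Boolean> active = new ArrayDeque<>(); // one entry per open conditional
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            // A line is live only if every enclosing conditional is active.
            boolean live = !active.contains(Boolean.FALSE);
            if (!t.startsWith("#")) {
                if (live) {
                    out.add(line);
                }
                continue;
            }
            String[] parts = t.substring(1).trim().split("\\s+", 3);
            String directive = parts[0];
            String name = parts.length > 1 ? parts[1] : "";
            int paren = name.indexOf('(');
            if (paren >= 0) {
                name = name.substring(0, paren); // function-like macro: keep the bare name
            }
            switch (directive) {
                case "ifdef":  active.push(defined.contains(name)); break;
                case "ifndef": active.push(!defined.contains(name)); break;
                case "if":     active.push(Boolean.TRUE); break; // assumption: not evaluated
                case "else":   if (!active.isEmpty()) active.push(!active.pop()); break;
                case "endif":  if (!active.isEmpty()) active.pop(); break;
                case "define": if (live) defined.add(name); break;
                case "undef":  if (live) defined.remove(name); break;
                default:       break; // #include, #pragma, #elif ... are dropped
            }
        }
        return out;
    }
}
```

Passing an empty set keeps the #ifndef branch of a guarded block, while passing a set that contains the guarded name keeps the #else branch instead.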

I guess the C++ grammar doesn't have the user type limitation, so you may be able to copy these parts to the C grammar (it is likely the most important regression for the current import).

Actually, if there is a C++ grammar, adding C++ next may now be an easy task for you. It likely won't be able to parse stuff like https://sourceforge.net/p/open-cobol/code/HEAD/tree/branches/gnu-cobol-cpp/libcob/intrinsic.cpp but a simple hello.cpp or https://sourceforge.net/p/open-cobol/code/HEAD/tree/branches/gnu-cobol-cpp/cobc/error.cpp may work quite fast.

I'll likely start to inspect the COBOL parser more closely next week; these are the most important questions so far (I hope to see the answers end up in src\lu\fisch\structorizer\parsers\howto, too :-)

What documentation do you use for the GOLDparser grammar format?

How are tokens in the grammar translated to the Java constants for the parser?

In general: How are we supposed to build the egt files after changing the grm?

What do we (manually?) need to change in the language specific parser after we changed and compiled the grammar? I guess it is only about adjusting the SymbolConstants and RuleConstants, correct (where do we get them from)?

How do we add a completely new grm->egt + parser (I guess: adding the grm, compiling it, copying the templates, then doing the adjustments mentioned before [and adding the code, of course])?

Should the preprocessing resolve includes (in the case of COBOL COPY file. statements) from a given list of directories (something a language specific preparser normally does)?

How to set rules for the import from "outside"? The main two parts are: options for the preprocessing (in the COBOL case it would be: options for reference-format [fixed-form or free-form, for the former: code area start/end - everything before/after is kind of a comment], inline comment marker [there are vendor specific extensions...] and in general the directory list for includes mentioned before)
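The directory-list lookup mentioned in the last two questions can be sketched in Java as follows. This is only an illustration under assumptions: the class IncludeResolver and its resolve method are hypothetical names, not existing Structorizer code, and the sketch says nothing about how the resolved file would then be fed into a preparser.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Structorizer.
class IncludeResolver {
    /**
     * Returns the first regular file called fileName that exists in one of
     * the given directories, searching them in list order.
     */
    static Optional<Path> resolve(String fileName, List<Path> includeDirs) {
        for (Path dir : includeDirs) {
            Path candidate = dir.resolve(fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        // Not found in any directory: the caller leaves the #include or
        // COPY statement unresolved instead of failing.
        return Optional.empty();
    }
}
```

Returning an empty Optional rather than throwing matches the behaviour requested above, where includes that are not found in the listed directories simply stay unresolved.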

@GitMensch:

Found a regression in the C parser that is only shown in the error console when trying to parse https://svn.code.sf.net/p/open-cobol/code/branches/gnu-cobol-2.0/libcob/cobgetopt.c (I did not expect it to be parsed without errors and just wanted to recheck where the parsing ends, but I didn't expect the console output):
CParser.prepareTextfile() -> Unclosed group near index 79
(.*?\W)NONOPTION_P (argv[cob_optind][0] != '-' || argv[cob_optind][1] ==(\W.*?)
                                                                               ^
Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
  at lu.fisch.structorizer.parsers.CodeParser.parse(CodeParser.java:167)
  at lu.fisch.structorizer.gui.Diagram$1.filesDropped(Diagram.java:399)
  at net.iharder.dnd.FileDrop$1.drop(FileDrop.java:328)
  at java.awt.dnd.DropTarget.drop(Unknown Source)
The code that seems to be processed is
#define NONOPTION_P (argv[cob_optind][0] != '-' || argv[cob_optind][1] == '\0')
Does the parser stumble over '\0'?

Ideally this error is caught and shown in the new parser error message box first, and the import is fixed afterwards.
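The logged pattern suggests that the macro text was concatenated into a regular expression without escaping, so that the unbalanced "(" in it is read as the start of a group. The Java sketch below reproduces that failure mode and shows Pattern.quote as one possible remedy. The class and method names are hypothetical and not taken from CParser; the shape of the pattern is only inferred from the message above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of Structorizer.
class MacroPatternDemo {
    /** Embeds the raw macro text between two groups, as the log message suggests. */
    static Pattern unquoted(String macroText) {
        return Pattern.compile("(.*?\\W)" + macroText + "(\\W.*?)");
    }

    /** Same pattern shape, but the macro text is matched literally. */
    static Pattern quoted(String macroText) {
        return Pattern.compile("(.*?\\W)" + Pattern.quote(macroText) + "(\\W.*?)");
    }

    /** Reports whether the unquoted variant can be compiled at all. */
    static boolean compilesUnquoted(String macroText) {
        try {
            unquoted(macroText);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Text as it appears between the two groups in the logged pattern.
        String macro = "NONOPTION_P (argv[cob_optind][0] != '-' || argv[cob_optind][1] ==";
        System.out.println("unquoted compiles: " + compilesUnquoted(macro));
        System.out.println("quoted pattern:    " + quoted(macro).pattern());
    }
}
```

Catching PatternSyntaxException at the place where such a pattern is built would also be one way to route the problem into the parser error dialog instead of the console.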

More comments from https://github.com/fesch/Structorizer.Desktop/issues/354#issuecomment-285670846 on ...

GitMensch commented 7 years ago

Question to the C Parser (before I include this with a merge request): Are you OK with me changing

-       final String[] exts = { "c" };
-           interm = File.createTempFile("Structorizer", ".c");
+       final String[] exts = { "c", "h" };
+           interm = File.createTempFile("Structorizer", "." + getFileExtensions()[0]);

codemanyak commented 7 years ago

@GitMensch

Question to the C Parser (before I include this with a merge request): Are you OK with me changing

Seems sensible. Go ahead then.

GitMensch commented 7 years ago

Minimal C Parser changes added as part of https://github.com/codemanyak/Structorizer.Desktop/pull/77

codemanyak commented 7 years ago

CParser should cope now with the majority of usual typedefs.

GitMensch commented 7 years ago

Just did some parsing to check the current status. One thing that should be easy, but that I did not get to work with my first approach, is nameless parameters in function prototypes. The current one:

<Func Proto> ::= <Func ID> '(' <Types>  ')' ';'
               | <Func ID> '(' <Params> ')' ';'
               | <Func ID> '(' void ')' ';'
               | <Func ID> '(' ')' ';'

<Params>     ::= <Param> ',' <Params>
               | <Param>

<Param>      ::= <ConstType> ID <Array>

always needs an ID, which I think is not correct; if we want to support this we should support the version without an ID, too. Creating and using a <ProtoParams> without an ID leads to a Reduce-Reduce conflict. Note: there's already a Shift-Reduce conflict in this area...

I think the mentioned issue is the reason for the following parser error (I may be wrong)

static int      worldcities2_ (const int);

Found token '('

Expected: '[' | ',' | ';' | '='

GitMensch commented 7 years ago

Another question to the C parser: Do we want to add the token NULL (currently not in and therefore tokenized as Id) to <Value>?

As we don't parse any system headers - do we want to add standard typedefs/structs "somewhere" (either preparser or grammar)?

Things like size_t, time_t, FILE, ... otherwise we will have many files that cannot be parsed. Or make a brute-force grammar and define Id as <Type>... just seen: we already have <User Type> in there - therefore I guess these things have to be added in the preparser, correct? If we externalize the list of "standard typedefs/structs" to an option dialog we can ship a standard list and people can add "their" entries themselves.

GitMensch commented 7 years ago

Note: I'm doing some work in the C preparser (concerning C preprocessor parts).

codemanyak commented 7 years ago

@GitMensch

One thing that should be easy, but that I did not get to work with my first approach, is nameless parameters in function prototypes. The current one always needs an ID, which I think is not correct; if we want to support this we should support the version without an ID, too.

Just formally: What do you think the first rule <Func Proto> ::= <Func ID> '(' <Types> ')' ';' is good for? (Well, maybe it doesn't work due to some shift-reduce conflict....) Besides this, I personally regard prototypes without parameter ids (like void xyz(int, int, int, int, int)) as pretty useless, particularly in public header files. They convey little intelligible information. (As a code-file internal forward declaration they may be acceptable.) But nevertheless you are right in saying that the C standard allows them. (And in some weird communities they are even the quasi standard.) But if I have to risk reduce conflicts then I tend to live with the gap. Users might simply add ids or comment out the prototype for the import, as it isn't used for building the diagram(s) anyway.

I think the mentioned issue is the reason for the following parser error (I may be wrong) static int worldcities2_ (const int);

I'd assume you are wrong, because it is the opening parenthesis that is complained about. This means the parameter list will hardly have been inspected (LALR(1) means single token lookahead, as you perfectly know).

codemanyak commented 7 years ago

Do we want to add the token NULL (currently not in and therefore tokenized as Id) to <Value>?

I didn't regard NULL as a problem because it is not a type identifier and therefore passes the parsing process, or doesn't it? It would just be marked as not initialised by Structorizer. But an execution of a diagram requiring pointers would fail, anyway.

Things like size_t, time_t, FILE, ... otherwise we will have many files that cannot be parsed.

With the standard library types it's different, you are right. I think it will make sense to define them in the grammar (as I already did with wchar_t, which is a quasi-reserved word, though).

codemanyak commented 7 years ago

I just tried to import a header file with function prototypes like void test(int, int, double, float*);: no problem for the parser at all. So the grammar is fine in this respect. EDIT: I added the incriminated line static int worldcities2_ (const int); to my header file, too, and again it passed both parser and builder, not producing any elements, of course.

GitMensch commented 7 years ago

The offending source is the following: worldcities2.c.txt

Note: My C preparser changes are only roughly tested and therefore I have not created a new pull request yet. You can have a look at them if you like (or even add the merge request/test yourself). https://github.com/GitMensch/Structorizer.Desktop/commit/da310f2f9cc31904e0c50206f6efc9cc8d3ed3f7 - please comment on the commit, as I've just started Java coding and code review is always nice

I'm away for some days and likely won't do anything in the next days (maybe no coding for the next two weeks).

codemanyak commented 7 years ago

@GitMensch At least one thing is clear now. In the following code snippet from worldcities2.c, it's not the second but the first line (the one starting with __declspec(dllexport)) which fools the parser - a #define macro again, it seems, the definition of which apparently comes from an #include and hence is not visible. If you comment it out then you get past it. The next parse errors occurring are related to the type names cob_u8_t etc., which will have to be mapped to user-defined type names by the preprocessor. Then the function prototypes all pass the grammar.

__declspec(dllexport) int       worldcities2 (void);
static int      worldcities2_ (const int);
static int      checkfilestatus_0__ (cob_u8_t *, cob_u8_t *);
static int      checkfilestatus_0_ (const int, cob_u8_t *, cob_u8_t *);
static int      techtonics_0__ (cob_u8_t *, cob_u8_t *);
static int      techtonics_0_ (const int, cob_u8_t *, cob_u8_t *);

I'll try to define the user_type_### symbols in a more flexible way in the grammar (e.g. USER_TYPE_NAME = usertype[0-9][0-9][0-9]). Your idea to allow a user configuration for type names is certainly a good workaround.
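A minimal sketch of how a preparser could map a configurable list of type names to generic placeholders of the form usertype[0-9][0-9][0-9]. The class, the method `mapTypes` and the numbering scheme are hypothetical; the actual CParser may do this differently. The sketch works on raw text and would also rewrite matches inside string literals or comments.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class UserTypeMapper {
    // Replaces every whole-word occurrence of a configured type name with a
    // numbered placeholder (usertype001, usertype002, ...), in list order.
    static String mapTypes(String line, List<String> typeNames) {
        String result = line;
        for (int i = 0; i < typeNames.size(); i++) {
            String placeholder = String.format("usertype%03d", i + 1);
            // \b keeps e.g. ssize_t from being hit when size_t is configured.
            String regex = "\\b" + Pattern.quote(typeNames.get(i)) + "\\b";
            result = result.replaceAll(regex, placeholder);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed "shipped" list; users would add their own entries.
        List<String> types = Arrays.asList("cob_u8_t", "size_t", "FILE");
        System.out.println(mapTypes(
            "static int      checkfilestatus_0_ (const int, cob_u8_t *, cob_u8_t *);",
            types));
    }
}
```

Keeping the list outside the grammar means a user-editable configuration only has to feed `typeNames`; the grammar sees nothing but the placeholder token.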

GitMensch commented 7 years ago

The __declspec(stuff) is an attribute. In this case it tells win32 linkers to export the symbol; dllimport would be the other way around, telling the linker to search for the function in a DLL. These are highly system specific... A wide list of different attributes can be found in the Clang docs. The following layout is possible: PRE-ATTRIBUTE1 PRE-ATTRIBUTE2 TYPE FUNC-NAME (PARAM1, PARAM2) POST-ATTRIBUTE1 POST-ATTRIBUTE2;

Two function definitions that are used in GnuCOBOL, first with the win32 definition, then the GCC one:

__declspec(noreturn)    __declspec(dllexport)   void    cob_stop_run    (const int);
void    cob_stop_run    (const int) __attribute__((noreturn));

extern  void    cob_runtime_error   (const char *, ...);
extern __attribute__ ((visibility("hidden")))   void    cob_runtime_error   (const char *, ...) __attribute__((format(printf, 1, 2)));

For an NSD all these attributes can be considered noise - if possible: let the parser pick them and throw them away. Or maybe even better: let the preparser kick them out.
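A rough sketch of the "let the preparser kick them out" option, in Java. It is not the existing CParser code; the class name and both methods are made up for illustration. Because the attribute arguments nest (format(printf, 1, 2) sits inside two more parentheses), the sketch counts parentheses instead of using a single regular expression. It assumes the attribute is written out literally (not hidden behind a macro) and ignores parentheses inside string literals.

```java
final class AttributeStripper {
    private static final String[] MARKERS = { "__declspec", "__attribute__" };

    // Returns the line with every __declspec(...) / __attribute__((...)) removed.
    static String strip(String line) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            int end = skipAttribute(line, i);
            if (end > i) {
                i = end;                      // drop the whole attribute
            } else {
                out.append(line.charAt(i++)); // keep ordinary text
            }
        }
        return out.toString();
    }

    // If an attribute starts at index i, returns the index just past its
    // closing parenthesis; otherwise returns i unchanged.
    static int skipAttribute(String s, int i) {
        if (i > 0 && Character.isJavaIdentifierPart(s.charAt(i - 1))) {
            return i; // we are in the middle of some longer identifier
        }
        for (String marker : MARKERS) {
            if (!s.startsWith(marker, i)) continue;
            int j = i + marker.length();
            while (j < s.length() && Character.isWhitespace(s.charAt(j))) j++;
            if (j >= s.length() || s.charAt(j) != '(') return i;
            int depth = 0;
            for (; j < s.length(); j++) {
                char c = s.charAt(j);
                if (c == '(') depth++;
                else if (c == ')' && --depth == 0) return j + 1;
            }
            return i; // unbalanced parentheses: leave the text untouched
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(strip(
            "__declspec(noreturn) __declspec(dllexport) void cob_stop_run (const int);"));
        System.out.println(strip(
            "void cob_stop_run (const int) __attribute__((noreturn));"));
    }
}
```

The surrounding whitespace is left in place, which should not matter to a tokenizer that skips whitespace anyway.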

GitMensch commented 7 years ago

Did you have any chance to review https://github.com/GitMensch/Structorizer.Desktop/commit/da310f2f9cc31904e0c50206f6efc9cc8d3ed3f7 yet?

codemanyak commented 7 years ago

On 25.05.2017 at 13:57, Simon Sobisch wrote:

Did you have any chance to review https://github.com/GitMensch/Structorizer.Desktop/commit/da310f2f9cc31904e0c50206f6efc9cc8d3ed3f7 yet?

I had a quick look at it but for a review (even a quick one) I was lacking time.

codemanyak commented 7 years ago

@GitMensch Of course I respect your vacations. So if you don't answer immediately, no problem. I'm trying to understand your proposed code changes, though, and I've got some questions. So I better ask now while you may still know what you intended.

Your commit comment says:

fixed functions without parameters func (void) {}

What exactly was the problem on parsing functions without parameters? I had already fixed it in the grammar itself, and all my tests had worked fine. Moreover, I didn't see anything in the code that might have to do with it, unless your modification of the voidCastPattern was to address it? But I don't understand the way you changed it, either: You replaced the final non-greedy group (.*?) by the greedy (.+). Why? Greedy groups used to be a problem for the replacement if the pattern occurs several times in a text. And why do you require at least one character after the cast? To make sure it's a cast? This is already ensured by the requirement of a non-identifier character left of it, I thought. EDIT: O, now I see it clearly: You didn't refer to functions with empty parameter list at all but exactly to the void cast fix, which you think I had spoilt by putting the wrong group number in the replacement pattern? But no, there aren't three groups in it. My replacement pattern was correct. The apparent central group isn't a group but matches the parentheses of the cast.

Your modification of the definePattern from "^#define\\s+([\\w].*)\\s+(.+)" to "^#define\\s+([\\w].*)\\s*(.+)?" also looks problematic (not to say dangerous) in my eyes: You turned the final group into an optional one, which makes "$2" in the replacement pattern a dangling thing, such that the replacement of constructive defines doesn't work anymore.

But never mind, I think I understood the overall sketch of your approach and should be able to fix it. Thank you for the effort.

GitMensch commented 7 years ago

You didn't refer to functions with empty parameter list at all but exactly to the void cast fix, which you think I had spoilt by putting the wrong group number in the replacement pattern? But no, there aren't three groups in it. My replacement pattern was correct. The apparent central group isn't a group but matches the parentheses of the cast.

Yes, this was what I referred to. The void cast fix had eaten the (void) from func (void) {} during debugging, and therefore the grammar reported an error for the unexpected {: func >> {. If you think it works with the next version, I don't care whether any of my changes stay in :-)

The modification of definePattern looks correct to me, as you can define something to nothing. It is then just defined and will be replaced by nothing during the preparsing; this is actually often used when you have multi-system code. The original definition for cob_stop_run mentioned above is

DECLNORET COB_EXPIMP void   cob_stop_run    (const int) COB_A_NORETURN;

and depending on the system type the defines get replaced (some always by "nothing").

codemanyak commented 7 years ago

@GitMensch

Yes, this was what I referred to. The void cast fix had eaten the (void) from func (void) {} during debugging

Ah, there's the rub! I hadn't tested with a space between function name and parenthesis. And my assumption that a greedy whitespace pattern \\s* between the \\W+? and the parenthesis would reliably prevent a match with a preceding identifier was obviously wrong. Of course the matcher tries all possible matches and there is another match where \\W+? is satisfied by a whitespace sequence as well. Your change doesn't actually help, either, by the way, if a space or the brace follows the parameter list in the same line. After some experiments I now found the correct pattern (thanks for poking your finger into the wound): "(^\\s*|.*?[^\\w\\s]+\\s*)\\(\\s*void\\s*\\)(.*?)"
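To make the difference concrete, here is a small Java check of the pattern quoted above. Two things are assumptions: that the literal parentheses around void are escaped as \\( and \\) in the Java string, and that the pattern is applied with replaceAll and the replacement "$1$2". The class and method names are invented for the demo.

```java
final class VoidCastDemo {
    // Group 1: line start, or some text ending in a non-word, non-space
    // character (e.g. '=' or ';'), plus optional whitespace.
    // Then the literal "(void)". Group 2 is the lazily matched remainder.
    static final String VOID_CAST =
        "(^\\s*|.*?[^\\w\\s]+\\s*)\\(\\s*void\\s*\\)(.*?)";

    static String stripVoidCasts(String line) {
        return line.replaceAll(VOID_CAST, "$1$2");
    }

    public static void main(String[] args) {
        // A real void cast: the "(void)" is removed.
        System.out.println(stripVoidCasts("(void) printf(\"x\");"));
        // An empty parameter list after an identifier: left untouched,
        // with or without a space before the parenthesis.
        System.out.println(stripVoidCasts("int func (void) {}"));
        System.out.println(stripVoidCasts("int func(void) {}"));
    }
}
```

The point is that an identifier followed only by whitespace can no longer satisfy group 1, so func (void) survives while (void) at the start of a statement or after an operator is dropped.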

The modification of definePattern looks correct to me, as you can define something to nothing.

No doubt about the define. I didn't question the attempt to detect and evaluate it. But the construction of the pattern isn't correct. Obviously an optional group doesn't work on replacement because the pattern also matches if the group is omitted, such that non-empty defines end up with an empty replacement. I'll find a better pattern. Thanks again.
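The effect described here can be reproduced with a few lines of Java. The first pattern is the modified definePattern quoted earlier in the thread; the second is only one conceivable alternative (an assumption, not the pattern that finally went into CParser) in which the value group always participates and is simply empty for a bare #define. Neither handles function-like macros.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefinePatternDemo {
    // Modified pattern from the commit: the value group is optional.
    static final Pattern OPTIONAL_GROUP =
        Pattern.compile("^#define\\s+([\\w].*)\\s*(.+)?");
    // Hypothetical alternative: name is a single identifier, value may be empty.
    static final Pattern ALWAYS_PRESENT =
        Pattern.compile("^#define\\s+(\\w+)\\s*(.*)$");

    // Returns { name, value } or null if the line is not a #define.
    static String[] split(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = { "#define FOO 42", "#define COB_A_NORETURN" };
        for (String line : lines) {
            String[] a = split(OPTIONAL_GROUP, line);
            String[] b = split(ALWAYS_PRESENT, line);
            System.out.println(line);
            System.out.println("  optional group: name=[" + a[0] + "] value=[" + a[1] + "]");
            System.out.println("  always present: name=[" + b[0] + "] value=[" + b[1] + "]");
        }
    }
}
```

With the optional group, the greedy first group swallows the whole rest of the line, so group 2 does not participate and a "$2" in the replacement contributes nothing even for a non-empty define.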

GitMensch commented 7 years ago

Should I create a pull request with the current version I've committed to my branch and you merge it with maintainer edits as you did before? Or just forget about this branch and delete it?

codemanyak commented 7 years ago

I have already pulled it and begun to check with C sources.

codemanyak commented 7 years ago

@GitMensch Your differentiated #define evaluation was a really helpful approach, which I could now complete, function macros included. With some minor manual preparation worldcities2.c is now importable. I provisionally added some COBOL compiler types like cob_8ut and other cob* something to the typedef list of CParser while the language-specific Parser configuration feature isn't ready.

Just four problems remain with worldcities2.c (which I circumvented by modifying the file): The Structorizer C grammar does not support the following constructs found in worldcities2.c:

1. Address operators applied to expressions, duplicate address operators in particular (seems to be an error in the source file - you may not apply the address operator to a temporary expression result). Example: frame_ptr->return_address_ptr = &&l_19;
2. Function pointer casting (the result wouldn't be supported by Structorizer anyway, but maybe the grammar can be enhanced). Example: b_1 = ((int (*)(void *, void *))call_techtonics.funcint) (b_14, b_15);
3. Expressions as goto labels (this isn't supported by ANSI C, seems to be an error in the source file). Example: goto *frame_ptr->return_address_ptr;
4. Casting to const (maybe the grammar can be enhanced). Example: h_CITY_FILE->select_name = /*(const char *)*/"city-file";

All this disabled or replaced, the file can be converted into a small bunch of huge diagrams... Here's the importable modified C file (disguised as txt file): worldcities2a.c.txt

GitMensch commented 7 years ago

Last points first: 1 and 3 are GNU extensions for variable goto targets (something Structorizer can't support correctly anyway, but it would be nice not to let the grammar struggle there). 2 and 4: the pointer casts are only in there to remove compiler warnings; we just need to either suppress them from the parsing or (better) enhance the grammar and ignore them during the token processing. 4: yes, the grammar may be enhanced.

The usertype id change is marvelous - if we add a plugin option and/or parse #include "someinc" files to put them in the list, this topic can be seen as finally fixed (or as having enough workarounds).

I wonder if we should do something similar for COBOL constants, too (differentiating between numeric and non-numeric literals) - this would clean up the grammar but the parsing must push all 78 and 01 CONSTANTs into this list. Opinions?

GitMensch commented 7 years ago

Just tried an import of a generated C source to the current version:

error.syntax in file "C:\Users\simon\Documents\CBL_OC_DUMP.c"

Preceding source context:
   23:   __declspec(dllexport) int        CBL_OC_DUMP » (unsigned int *, unsigned int *);

Found token '('

Expected: '[' | ',' | ';' | '='

Note: While the #354 is much more important for me I thought it may be a good idea to note this. I guess it is still the function attribute causing issues.

fesch / Structorizer.Desktop

Add import from C #409