codemanyak closed this issue 7 years ago.
The decisive flaw of the draft outlined above is that it makes sensible code generation for typed languages practically impossible. Because a variable may be overwritten with a value of a completely different type, and because a subroutine might add further components to a record variable passed in as an argument (call by reference!), there would be no chance of identifying the exact record structure of a variable by syntactic (static) analysis. This is light-years worse than with arrays. But what is the way out here? To enforce a strict explicit declaration with some record type definition, and to prevent any re-typing assignment and any run-time addition of components? It seems so, unfortunately.
This sounds like a very good plan. And I think a "fixed" record layout (only available if explicitly declared) is both "structured" and helpful for the Executor and the code generators.
The internal implementation would sensibly be done by key-value pairs, i.e. as a map where the component names are the keys. This means you cannot conclude the exact physical order and number of the components.
With the changed rule "records need to be declared beforehand" this shouldn't be an issue any more when using a LinkedHashMap.
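To illustrate the point about declared records on an insertion-ordered map, here is a minimal sketch in Python (not Structorizer code). A plain dict stands in for Java's LinkedHashMap, since Python dicts keep insertion order; the names make_record, set_component and RecordError are hypothetical and only serve the illustration.

```python
class RecordError(Exception):
    """Raised when a component is not part of the declared record layout."""


def make_record(declared_components):
    """Create a record whose layout is fixed by its declaration.

    Python dicts keep insertion order (guaranteed since Python 3.7), which is
    the property LinkedHashMap provides in Java.
    """
    return {name: None for name in declared_components}


def set_component(record, name, value):
    """Assign to a declared component; refuse to add new ones at run time."""
    if name not in record:
        raise RecordError(f"undeclared component: {name}")
    record[name] = value


# Hypothetical declaration order of a date record type.
today = make_record(["year", "month", "day"])
set_component(today, "day", 26)
set_component(today, "year", 2017)
set_component(today, "month", 6)

# The component order follows the declaration, not the assignment order.
print(list(today.items()))  # [('year', 2017), ('month', 6), ('day', 26)]
```

Because the keys are created at declaration time, the order and number of components are known up front, which is what makes the map-based representation unproblematic under the "declare first" rule.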
Once we have concluded that a declaration is inevitable, the next question is how type compatibility is to be defined: by name or by structure? With a name-based type compatibility approach (like in Pascal), we must introduce type definitions.
Without (named) type definitions, i.e. with a merely structural approach (like in COBOL), complex types would have to be constructed again and again for every variable. And not even C allows an assignment between variables of unnamed struct types, even if they are structurally congruent. So this is not actually a viable alternative to type definitions. On export, it is no problem to derive code for a language that doesn't know type definitions (like COBOL) from type definitions, whereas the reverse direction is a problem. What would be the implications of a type equivalence model by name for Structorizer, now?
What would be the consequences for Executor?
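The difference between the two equivalence models discussed above can be sketched as follows. This is only an illustration in Python under assumed names (RecordType, compatible_by_name, compatible_by_structure, and the two example types are hypothetical), not a description of how Structorizer decides compatibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    name: str          # type name; "" would model an anonymous type
    components: tuple  # ((component name, type name), ...)


def compatible_by_name(a, b):
    """Pascal-like rule: two record types match only if they share a name."""
    return a.name != "" and a.name == b.name


def compatible_by_structure(a, b):
    """Structural rule: the component lists must match pairwise."""
    return a.components == b.components


layout = (("year", "int"), ("month", "int"), ("day", "int"))
date = RecordType("Date", layout)
version = RecordType("Version", layout)

print(compatible_by_name(date, version))       # False
print(compatible_by_structure(date, version))  # True
```

With equivalence by name, a generator only has to carry the type name along with each variable; with equivalence by structure, it would have to compare component lists at every assignment and parameter passing.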
Especially because the type definitions may become quite a long list, and someone may want to "outsource" or remove some of them easily (there may even be alternatives during the design phase, with the unused ones switched off by deactivating their instructions): please leave the type definitions as instructions. The same instructions would be usable in IMPORT diagrams and normal (sub)programs and may be mixed with "normal" declarations (otherwise all kinds of declarations would have to be moved, which isn't even possible because of backwards compatibility...). The rest sounds good.
For the type definitions, I propose the following syntax variants, which are close enough to Pascal, C, and Basic (with "type" as a keyword similar to "var" and "as") and to the parameter lists of subroutine diagrams (untyped components might be tolerated while they are not meant to be used for records themselves):
type MyType = record{ comp1, comp2: int; comp3: double; comp4 }
or
type MyType = record{ comp1, comp2 as int; comp3 as double; comp4 }
or
type MyType = struct{ int comp1, comp2; double comp3; comp4[;] }
The above specifications are to be regarded as equivalent. The semicolon after the last component (required in C and Java) should be optional here.
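To make the claimed equivalence of the three variants concrete, here is a small hedged sketch of a parser in Python. It is not the Structorizer parser; parse_type_definition is a hypothetical helper that only handles simple type names (no array types such as char[55]) and returns untyped components with the type None.

```python
import re

TYPE_DEF = re.compile(r"^\s*type\s+(\w+)\s*=\s*(record|struct)\s*\{(.*)\}\s*$")


def parse_type_definition(line):
    """Return (type name, [(component, type name or None), ...])."""
    match = TYPE_DEF.match(line)
    if match is None:
        raise ValueError("not a record type definition: " + line)
    type_name, _keyword, body = match.groups()
    components = []
    for group in body.split(";"):
        group = group.strip()
        if not group:  # tolerate the optional trailing semicolon
            continue
        pascal = re.match(r"^(.*?)\s*(?::|\bas\b)\s*(.+)$", group)
        if pascal:  # "comp1, comp2: int" or "comp1, comp2 as int"
            names, comp_type = pascal.group(1), pascal.group(2).strip()
        else:
            c_like = re.match(r"^(\w+)\s+(\w.*)$", group)
            if c_like:  # "int comp1, comp2"
                comp_type, names = c_like.group(1), c_like.group(2)
            else:  # untyped component group, e.g. "comp4"
                comp_type, names = None, group
        for name in names.split(","):
            components.append((name.strip(), comp_type))
    return type_name, components


variants = [
    "type MyType = record{ comp1, comp2: int; comp3: double; comp4 }",
    "type MyType = record{ comp1, comp2 as int; comp3 as double; comp4 }",
    "type MyType = struct{ int comp1, comp2; double comp3; comp4; }",
]
parsed = [parse_type_definition(v) for v in variants]
print(parsed[0])
print(parsed[0] == parsed[1] == parsed[2])
```

All three lines yield the same component list, so the choice between them can be left to the user's habits.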
I'm a little bit confused. Why add a type definition which declares a record?
I think the following would be usable.
A) plain type definition:
type MyType = { int comp1, char[55] text; double comp3; field[;] }
B) declaring a record using this type definition (not initialized / initialized; maybe removing struct/record completely [implied by using a type] and use "var" there, too):
var somevar as int = 55
struct MyType MyStruct
struct MyType MyStruct2 {somevar, "123", 123456489, "some string"}
C) declaring a record with internal type definition only applied to this record (not initialized / initialized; maybe removing struct/record completely [implied by using a type] and use "var" there, too):
var somevar as int = 55
struct MyStruct { int comp1, char[55] text; double comp3; field[;] }
struct MyStruct2 { int comp1 = somevar, char[55] text = "123"; double comp3 = 123456489; "some string"[;] }
...
Not sure about using type at all, maybe just use record and struct?
@GitMensch
A) plain type definition:
type MyType = { int comp1, char[55] text; double comp3; field[;] }
This is exactly the same as the third syntax variant of my proposal, just without the keyword record or struct (which I think are useful for readability and instant understandability) and with a wrong separator between the first two components (should be a semicolon, since they are of different types).
B) declaring a record using this type definition (not initialized / initialized; maybe removing struct/record completely [implied by using a type] and use "var" there, too):
var somevar as int = 55
struct MyType MyStruct
struct MyType MyStruct2 {somevar, "123", 123456489, "some string"}
My proposed (and, in the first case, already implemented) syntax would be:
var somevar as int <- 55
var myStruct: MyType
or
var myStruct as MyType
as mere declaration.
MyType myStruct2 <- {comp1: somevar, text: "123", comp3: 123456789, field: "some string"}
or
var myStruct2: MyType <- {comp1: somevar, text: "123", comp3: 123456789, field: "some string"}
in case of a typed assignment or initialised declaration.
If the variable had been declared then a simple assignment would look like this:
myStruct2 <- {comp1: somevar, text: "123", comp3: 123456789, field: "some string"}
I don't regard it as helpful if every reference to a record/struct type must be marked with a struct keyword (like in original C). I rather cling to the more general idea of a defined type name that may mean anything (a record, an array, or some scalar type, like in Pascal, C++, Java etc.) as long as it has been defined properly.
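As a rough illustration of the initializer syntax with named components, the following Python sketch splits such an initializer into component/expression pairs and checks them against a declared component list. It is an assumption-laden toy, not Structorizer's implementation: parse_initializer and unknown_components are hypothetical helpers, nested records are not handled, and expressions are kept as plain strings.

```python
import re

# One "name: expression" pair; the expression is either a string literal or
# a run of characters without comma or closing brace (no nested records).
PAIR = re.compile(r'\s*(\w+)\s*:\s*("[^"]*"|[^,}]+?)\s*(?:,|$)')


def parse_initializer(text):
    """Split '{a: 1, b: "x"}' into a dict of expression strings."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("initializer must be enclosed in braces")
    body = text[1:-1]
    pos, result = 0, {}
    while pos < len(body):
        match = PAIR.match(body, pos)
        if match is None:
            raise ValueError("malformed initializer near: " + body[pos:])
        result[match.group(1)] = match.group(2)
        pos = match.end()
    return result


def unknown_components(declared, initializer):
    """Return the initializer components the declared type does not know."""
    return [name for name in initializer if name not in declared]


# Component names taken from the MyType example discussed in this thread.
my_type = ["comp1", "text", "comp3", "field"]
init = parse_initializer(
    '{comp1: somevar, text: "123", comp3: 123456789, field: "some string"}'
)
print(init)
print(unknown_components(my_type, init))  # []
```

Since every value is tagged with its component name, the check does not depend on the order in which the components are listed.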
C) declaring a record with internal type definition only applied to this record (not initialized / initialized; maybe removing struct/record completely [implied by using a type] and use "var" there, too)
I'm okay with replacing "struct" by "var" in THESE positions (see B). But I thought I had made clear that this implicit declaration style with anonymous types (which is very C-like, besides) may work in Executor but would confront the code generators with unsolvable type tracking problems. The problem arises with subsequent assignments among variables, parameter passing etc., which the generators cannot actually track through alternatives, loops and subroutines, or which would require structure comparison with weak component types. Allowing these syntactic variants would mean prohibiting assignments of the entire variable to another one, because otherwise the type would no longer be confined to the original record. I would not like to impose such a difficult-to-understand "underprivileged" kind of variable. Your last example looks convenient but melts together variable declaration, implicit type definition, and component initialization. It is not even allowed in C, by the way. (Structorizer should not end up like COBOL, in my eyes.) I would like to adhere to more separate concepts like the ones known from Pascal. I think Pascal is a good guideline for structograms. I simply wanted to avoid an END keyword and therefore modified the record type definition draft by using braces instead.
I see. Have you committed the code already?
No, I haven't. I'm still working on the necessary changes to the type map design. And I couldn't spend much time on it since I have had a pile of other work to do recently. But I'll continue as soon as possible.
@codemanyak Is there an update in sight? Is the target "finish Milestone 3.27 before your vacation" still there (and does it include a roughly working import of COBOL sources)?
I still hope so.
A first prototype supporting record types involves Executor and Analyser (including Import), possibly with some flaws or bugs. It has just been committed and can be tested. I changed the record initializer syntax specification such that the type name is to be used as an immediate prefix of the opening brace. Generators and parsers are going to be addressed next. Here are some example diagrams.
DateImport.zip should be renamed to DateImport.arrz: DateImport.zip
After some corrections, a new, record-based version of a binary search tree program is working (rename file "BinSearchTree423.zip" to "BinSearchTree423.arrz" in order to load it):
The figures show two of the four contained diagrams to give an impression:
Looks quite fine. Do you see anything other than the code-generators/parsers to do for being able to close this issue?
Well, to be honest: I haven't managed to code this enhancement as cleanly and modularly as I think it ought to be. The need to integrate it into the existing code induced some foul compromises. At some point, though, it will be necessary to fundamentally redesign the many ad-hoc syntax analysis patches as a well-structured, low-redundancy syntax toolbox. That would have been too big an issue for now. So it's likely that various irregularities, limitations, and deficiencies will show up in practical use. Apart from that, it's indeed mostly the code generators / parsers that remain to be done. I will work on them one by one this month, starting with Pascal and C. The COBOL parser / generator are likely to be the last ones I'll pick.
Code generator tasks (Pascal accomplished today):
Code import tasks:
My first Pascal export draft didn't dare to produce structured constant definitions and made some efforts to circumvent them, so this can be simplified a lot.
Pascal export revised, now converts structured Structorizer constants to structured Pascal constants.
Adaptation of CGenerator done (first approach).
Adaptation of C++ generator, CGenerator and Explorer (FOR-IN loops over arrays of records) mended.
Java export enhanced for record types, several minor fixes with input and output instruction export, and declaration handling.
CParser enabled to cope with most typedefs and struct definitions. CGenerator handling of struct types also revised such that export and import now work in a complementary way.
Python generator enabled to export record type definitions and record initializers.
Type definitions and variable declarations as well as variable access via fully qualified names are now generated based on CobTools. Still not addressed is the association of array indices with the correct hierarchy level for the accessor and assignment strings.
Oberon generator enabled to export records.
@codemanyak I'm a little bit puzzled by the generation and the "%num" parts - What do they mean?
Sample:
01 ZV-REC-IO.
   03 ZV-stuff.
      05 ZV-stuff-a PIC 9(08).
   03 ZV-DATA-IO.
      05 ZV-GEMKZ PIC X(05).
      05 ZV-ART PIC 9(01).
      05 ZV-INSTITUT PIC 9(01).
      05 ZV-ZW PIC 9(02).
move WS-GEMKZ to ZV-GEMKZ
move 3 to ZV-INSTITUT
move 1 to ZV-ART
with the result
@GitMensch The index placeholders "[%1]" shouldn't be there in this example. They are only to be generated if an OCCURS clause occurred or an index variable was associated. They are then intended to be matched against index or subscript expressions. So they should never be seen by a user's eye. Where is your code snippet from?
The code snippet is from real code and was generated from the bugfix branch. Can you reproduce it with this part only? (If not, I'll try to get a minimal reproducible version tomorrow.)
Please note that the malfunction you reported here is a mere COBOL import issue (so it's misplaced here).
Moreover, I can't reproduce it without additional context (I used the following code and obtained a sensible diagram):
Until next week, though, I will hardly find any time to work on these issues.
I've created an independent issue with sample code. Can you please check if you can reproduce it?
As already mentioned in other issues, support for heterogeneous data structures with named components (called "record" in Pascal-like languages or "struct" in C-like languages) is proposed. The support shall comprise
The record concept is intended to work in a similarly incremental way as arrays do in Structorizer (i.e. you may add components at run time).
Component access: date.month. Component names are to comply with identifier syntax (sequence of letters, digits, underscores, not beginning with a digit).
foo.bar <- 17 is only allowed if variable foo hasn't existed before or if it is a record variable already.
If variable "foo" hasn't been defined before, then an assignment foo.bar <- 17 is to create variable foo as a record variable containing an integer component bar.
If foo had been existing as record variable then either the value of existing component bar is updated or a new component .bar is added.
If foo is a constant then any assignment to a component is illegal.
foo.bar is only legal if foo is a record variable with an (initialized!) component bar.
Record initializer: today <- {year: 2017, month: 6, day: 26}. If it makes syntax analysis more feasible, a prefix keyword like record (EDIT: or rather the name of the defined type) might be prescribed (i.e. record{year: 2017, month: 6, day: 26}).
const beginOfEra <- {year: 1970, month: 1, day: 1}
For an assignment like yesterday <- today, the following rules seem sensible:
[...] no matter what structure or value it had had before.
([...] might get difficult to guarantee with the Executor interpreter, but it should meet the same expectations the array assignment provides).
The internal implementation would sensibly be done by key-value pairs, i.e. as a map where the component names are the keys. This means the user cannot conclude the exact physical order and number of the components.
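The component access rules of this initial draft can be sketched with plain maps. The following Python toy is only meant to make the rules tangible and is not how Executor is implemented; assign_component, read_component and ExecutionError are hypothetical names, and the variable table is just a dict.

```python
class ExecutionError(Exception):
    """Raised when a component access violates the draft rules."""


def assign_component(variables, constants, var_name, comp_name, value):
    """Execute 'var.comp <- value' under the incremental draft rules."""
    if var_name in constants:
        raise ExecutionError(f"{var_name} is a constant")
    if var_name not in variables:
        variables[var_name] = {}  # a new record variable is created
    record = variables[var_name]
    if not isinstance(record, dict):
        raise ExecutionError(f"{var_name} is not a record variable")
    record[comp_name] = value  # update the component or add a new one


def read_component(variables, var_name, comp_name):
    """Evaluate 'var.comp'; the component must exist and be initialized."""
    record = variables.get(var_name)
    if not isinstance(record, dict) or comp_name not in record:
        raise ExecutionError(f"{var_name}.{comp_name} is not defined")
    return record[comp_name]


variables = {"count": 3}
constants = {"beginOfEra"}
variables["beginOfEra"] = {"year": 1970, "month": 1, "day": 1}

assign_component(variables, constants, "foo", "bar", 17)   # creates foo
assign_component(variables, constants, "foo", "baz", 4.5)  # adds a component
print(read_component(variables, "foo", "bar"))  # 17
```

The second assignment shows the run-time growth of a record, which is the very behaviour that makes static type tracking for code generators so hard.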