Add new language to export feature (COBOL)

GitMensch commented 7 years ago

This is an issue split from #354 and contains the

GitMensch:

I've found minimal documentation how to do the export - but is there a programmer's documentation how to add a new programming language to the export?

codemanyak:

Well, even the export functionality has become pretty complex, meanwhile. I might try to formulate a more or less simplified programmer's guide but that will take a little. [...] For the generator, I started to write a "howto". You will see the rough outline is rather straightforward. The more intriguing details (sections 4 through 7) are still to come, though. The howto file will eventually be placed in the source tree under ...structorizer.generators. howto.txt So you could already start with the fundamental stuff and then pass it over to me for the fine tuning of the export-option-aware details and to find the best way to perform certain tricky syntax conversions. I guess the COBOL expertise is your part, so I might ask you for specific aspects then?

GitMensch:

Specific export aspects: So far I only see the includes, they should be generated with COPY + includename + . and may need a GUI entry. But as those includes either contain variables - which I assume have to be defined in the NSD in any case or program code which needs to be defined in NSD for being able to reference it this may not be that a useful option - if it is added and it is possible to include any copybook it may be useful to say where in the generated source they are placed. Something like "WS: include1, include2; LS: include3; PD: include 4". Thoughts?

I suggest to start with the export as you have the howto already (and yes: placing it in-source-tree is definitely a good idea).

GitMensch commented 7 years ago

First questions, mainly @codemanyak :

I assume the export is always for complete programs, correct?
I did not find getCommentSymbolRight mentioned in the howto - while it sounds very reasonable. Is this a concept only, did I use the wrong sources (git clone from today, using master) or what did I miss?
optionBasicLineNumbering should be renamed, I suggest optionSourceLineNumbering (I'll use it as an optional setting - it would be commentary only but I need to override addCode() in any case and some people like the "old school number your cards" style that was historically relevant for COBOL). Is it OK if I refactor this name?
What kind of reserved words should be returned by getReservedWords()? If you take all dialects into account and include context-sensitive words (I don't think that the Generator has this concept, does it provide it, does it need it?) you get more than 580 reserved words - not counting the names of intrinsic functions, device names, ...
As it is important in COBOL where you put your includes, I'd not use Generator.insertUserInclude(String indent) but write and use COBOLGenerator.insertUserInclude(CodePart cp) with the following definition and a parsing of the GUI element like WS: data1.cpy, data2.cpy; LS: localdata.cpy; PD: helpers.cpy. Thoughts?

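    /** Section of the generated COBOL source that a copybook include targets. */
    public enum CodePart {

        WORKING_STORAGE("WS"), LINKAGE("LI"), PROCEDURE_DIVISION("PD");

        private final String abbreviation;

        private CodePart(String abbreviation) {
            this.abbreviation = abbreviation;
        }

        /** Returns the matching constant (case-insensitive), or null if none matches. */
        public static CodePart getByAbbreviation(String abbreviation) {
            // Locale.ROOT keeps the conversion locale-independent ("li" -> "LI" everywhere).
            String inputString = abbreviation.toUpperCase(java.util.Locale.ROOT);
            for (CodePart cp : CodePart.values()) {
                if (cp.abbreviation.equals(inputString)) {
                    return cp;
                }
            }
            return null;
        }

    }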

Note: I'm not a Java coder yet so some implementation ideas may look strange - don't hesitate to suggest better ways.
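
A minimal sketch of the parsing step proposed above, assuming the GUI field holds a string such as "WS: data1.cpy, data2.cpy; LS: localdata.cpy; PD: helpers.cpy". The class CopybookSpec and its methods are hypothetical helpers, not part of Structorizer; the sections are keyed by their plain abbreviation rather than by CodePart, and the statement shape follows the "COPY + includename + ." suggestion from the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not Structorizer API): parses "WS: a.cpy, b.cpy; PD: c.cpy". */
final class CopybookSpec {

    /** Maps each upper-cased section abbreviation to its copybook names, in input order. */
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String group : spec.split(";")) {
            int colon = group.indexOf(':');
            if (colon < 0) {
                continue; // no "ABBR:" prefix, so the target section is unknown; skip it
            }
            String section = group.substring(0, colon).trim().toUpperCase(Locale.ROOT);
            List<String> names = result.computeIfAbsent(section, k -> new ArrayList<>());
            for (String name : group.substring(colon + 1).split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    names.add(trimmed);
                }
            }
        }
        return result;
    }

    /** Builds one "COPY name." line per copybook requested for the given section. */
    static List<String> copyLines(Map<String, List<String>> spec, String section, String indent) {
        List<String> lines = new ArrayList<>();
        for (String name : spec.getOrDefault(section, Collections.<String>emptyList())) {
            lines.add(indent + "COPY " + name + ".");
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<String>> spec =
                parse("WS: data1.cpy, data2.cpy; LS: localdata.cpy; PD: helpers.cpy");
        copyLines(spec, "WS", "       ").forEach(System.out::println);
    }
}
```

Each code part would then ask only for its own section and skip the rest. Whether the copybook name needs quoting in the COPY statement depends on the compiler and is left open here.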

codemanyak commented 7 years ago

@GitMensch

I assume the export is always for complete programs, correct?

You export an entire diagram, this may be a program or a routine (or an "includable"). If you need to have some framework around a routine then you will have to create it in case of a routine. Whether the diagram is a program or routine (or includable) can be tested via the public methods boolean Root.isProgram(), boolean Root.isSubroutine(), and boolean Root.isInclude(), respectively. With option "involve called subroutines" enabled the generator method generateCode(Root, ...) will automatically be called with all reachable subroutines of the root diagram in topological order. The generator attribute topLevel tells you whether you are just generating the code for the top-level diagram - i.e. the one you started the export for - or some subroutine. Make sure to set the inherited attribute subroutineInsertionLine with the value of code.count() at the appropriate position in either generateCode(Root...) or one of the methods generateHeader(...), generatePreamble(...) – usually the best choice –, or generateFooter(...) in topLevel mode. This is where the subroutine code will later be inserted. Also set the attribute subroutineIndent with the appropriate indentation depth for the subroutines. (All subroutines are indented in a series at the same level.)
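
The "remember a line index now, insert the subroutines there later" mechanism can be illustrated with a small stand-in model. This is only a sketch: InsertionLineSketch is not Structorizer's Generator, the field names are merely borrowed from the description above, the emitted lines are placeholder pseudo-code, and advancing the mark after each insertion is an assumption about how consecutive subroutines stay in order.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in model (not Structorizer's Generator) for deferred subroutine insertion. */
final class InsertionLineSketch {

    private final List<String> code = new ArrayList<>();
    private int subroutineInsertionLine = -1; // name borrowed from the discussion
    private String subroutineIndent = "";     // name borrowed from the discussion

    /** Emits the top-level diagram and marks where subroutines will go. */
    void generateTopLevel(String programName) {
        code.add("PROGRAM " + programName);
        // The current line count is the position at which subroutine code is inserted later.
        subroutineInsertionLine = code.size();
        subroutineIndent = "    ";
        code.add("BEGIN");
        code.add("    CALL helper");
        code.add("END");
    }

    /** Inserts one subroutine at the remembered position, indented uniformly. */
    void insertSubroutine(List<String> subroutineLines) {
        int at = subroutineInsertionLine;
        for (String line : subroutineLines) {
            code.add(at++, subroutineIndent + line);
        }
        // Assumption: move the mark so that a further subroutine follows this one.
        subroutineInsertionLine = at;
    }

    List<String> lines() {
        return code;
    }
}
```

The point of the design is that the top-level code can be generated first, before it is known which subroutines are reachable.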

I did not find getCommentSymbolRight mentioned in the howto – while it sounds very reasonable. Is this a concept only, did I use the wrong sources (git clone from today, using master) or what did I miss?

No idea. It is explained in 2.6, and it is also told that the method is not abstract, i.e. it comes with an empty string as default. Override it if you have to or leave it alone if COBOL allows line comments (like "//" in C++ or "#" in shell scripts) and you prefer to use them.

optionBasicLineNumbering should be renamed, I suggest optionSourceLineNumbering (I'll use it as an optional setting – it would be commentary only but I need to override addCode() in any case and some people like the "old school number your cards" style that was historically relevant for COBOL). Is it OK if I refactor this name?

Go ahead. You might use the BasicGenerator.addCode(...) as an example. We might get into trouble, though, with some other options like subroutine export, FileAPI insertion, includeInsertion because these things will be inserted later. Some of these troubles could be circumvented in BASIC (e.g. by not implementing the FileAPI :wink:) but I don't know whether this is an option for COBOL. You might also choose to add all the line numbers afterwards i.e. I could place a subclassable hook in the method Generator.exportCode() for this, immediately before the assembled code is actually saved to the file. (In good old BASIC 64 the line numbering served as GOTO and GOSUB labels, so the numbers had to be present during generation.)
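
Numbering "afterwards" could look roughly like the sketch below: once all late insertions (subroutines, includes) are done, every line gets a six-digit sequence number, which in fixed-form COBOL occupies columns 1 to 6. The class SequenceNumbering is hypothetical, and so is the assumption that each input line already starts at column 7 (the indicator column); the hook in Generator.exportCode() mentioned above is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical post-processing step: adds sequence numbers to finished code lines. */
final class SequenceNumbering {

    /**
     * Prefixes each line with a zero-padded six-digit number.
     * Assumes every input line begins at column 7 of the fixed-form layout.
     */
    static List<String> number(List<String> lines, int start, int step) {
        List<String> numbered = new ArrayList<>(lines.size());
        int n = start;
        for (String line : lines) {
            // Locale.ROOT guarantees plain ASCII digits regardless of the user's locale.
            numbered.add(String.format(Locale.ROOT, "%06d%s", n, line));
            n += step;
        }
        return numbered;
    }
}
```

For example, number(lines, 10, 10) turns a line "*comment" into "000010*comment". Numbers above 999999 would no longer fit into six columns; the sketch does not guard against that.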

What kind of reserved words should be returned by getReservedWords()?

I had already expected this question somehow after having read how many reserved words the dinosaur disposes of (what a mess!). You want a short answer? Here you are: The most important ones, some dozens perhaps – those a variable name might easily get into conflict with. Basically all keywords used for the fundamental algorithm structures or important types, declarators etc. These lists are used by the Analyser to warn users lest they should screw up their exports. More than 500 COBOL keywords would be likely to cause nearly every single line of code to be flagged, which is of no use at all. And they would slow down the Analyser significantly.
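
A sketch of what such a short list and the corresponding check might look like. The selection of words is deliberately small and incomplete, the class CobolKeywordSketch and the helper isReserved are hypothetical, and the String[] return type of getReservedWords() is an assumption; how the Analyser actually consumes the list is not shown. The comparison ignores case because COBOL reserved words are case-insensitive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of a short reserved-word list for COBOL. */
final class CobolKeywordSketch {

    // A deliberately small selection: statements a variable name might collide with.
    private static final String[] RESERVED = {
        "ACCEPT", "ADD", "CALL", "COMPUTE", "DISPLAY", "DIVIDE", "ELSE",
        "EVALUATE", "IF", "MOVE", "MULTIPLY", "PERFORM", "STOP", "SUBTRACT",
        "UNTIL", "VARYING", "WHEN"
    };

    private static final Set<String> LOOKUP = new HashSet<>(Arrays.asList(RESERVED));

    /** Return type assumed to be a string array. */
    static String[] getReservedWords() {
        return RESERVED.clone();
    }

    /** True if the identifier collides with one of the listed words, ignoring case. */
    static boolean isReserved(String identifier) {
        return LOOKUP.contains(identifier.toUpperCase(Locale.ROOT));
    }
}
```

With a list of this size, a variable called "move" would be flagged while "counter" would pass.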

As it is important in COBOL where you put your includes, I'd not use Generator.insertUserInclude(String indent) but write and use COBOLGenerator.insertUserInclude(CodePart cp) with the following definition and a parsing of the GUI element like WS: data1.cpy, data2.cpy; LS: localdata.cpy; PD: helpers.cpy. Thoughts?

You are free to override whatever you want while you don't meddle with the base class mechanisms inherited by other Generator subclasses. You will have found out that you may simply ignore Generator.insertUserInclude(String _indent) since it isn't called by any method of the base Generator class itself, it's just an available helper that may be used or not. StrukTeX and BASIC generators don't make use of it, for example.

GitMensch commented 7 years ago

Thank you for taking the time to answer these first questions, this really helps.

I still don't fully understand the reserved words part but I think there is one thing that will really help: run the other generators and see what they actually do. I have not found any nice NSDs ready to be debugged and generated for the different languages including easy and full samples (full -> all NSD elements in) and a complex one with "nested" NSDs. Did I just miss them or should I create a new FR issue for adding these to a samples directory? I think it would be best to have one sample that can be generated into all languages, this would actually lead to a test option for new generators / when generator classes or the superclass are changed. Ideally the generated code can be used to test the parsers for the import feature, too...

For the code block: yes, it should have been CodePart in all places... (I've refactored it from CopyBookTarget.) And yes, I've seen that insertUserInclude is only a helper, but I think it is a nice option (actually this is one of the minimal parts that already work ;-) - I thought to keep the known name while using a different signature. The function is called in the different code parts with the appropriate CodePart value and it checks what includes are requested to be added where, skipping the rest. My main worry was "it changes the includes to be entered differently by the user than for all other generators" but if this isn't a problem it seems like a nice solution (better performance could be gained by parsing the includes when the generator is constructed but this should be done similarly in all parsers).

So far I have the abstract classes in and figured out how to distinguish between free-form and fixed-form reference format and wrote the internal wrapper functions which work - but I think I'll remove these parts and postprocess the source as you've suggested. There is one thing that seems to be missing: the indentation is done directly in the String... class. Do you think it is possible to postpone this and store the current indentation value as a number? I currently don't know much about the File API; if I understood it correctly it is only for executing a diagram - where would a COBOL part be relevant there? Should it generate the File API calls as COBOL file I/O statements? I'm sure I don't understand this yet...
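
To make the "store the indentation as a number" idea concrete, here is a minimal sketch. The class DeferredIndent is hypothetical and unrelated to the class Structorizer actually uses for collecting code; it only shows that lines can be kept together with their nesting depth and turned into padded strings at the very end, when the left margin of the target format (fixed-form or free-form) is known.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer that keeps the nesting depth as a number until rendering. */
final class DeferredIndent {

    private static final class Line {
        final int depth;
        final String text;

        Line(int depth, String text) {
            this.depth = depth;
            this.text = text;
        }
    }

    private final List<Line> lines = new ArrayList<>();

    /** Stores a line together with its nesting depth; no padding is applied yet. */
    void add(int depth, String text) {
        lines.add(new Line(depth, text));
    }

    /** Renders all lines as margin + (depth * width) blanks + text. */
    List<String> render(String margin, int width) {
        List<String> out = new ArrayList<>(lines.size());
        for (Line line : lines) {
            StringBuilder sb = new StringBuilder(margin);
            for (int i = 0; i < line.depth * width; i++) {
                sb.append(' ');
            }
            out.add(sb.append(line.text).toString());
        }
        return out;
    }
}
```

The same buffer could then be rendered once with a seven-column margin for fixed form and once with an empty margin for free form.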

After the first day with the packages: I tried to add the parser without copying anything but even with the nice howto this isn't possible if you never worked with Structorizer before. I'll likely take the C# generator as primary source. Super helpful would be if the howto could be changed to a guide where you see some results before finishing everything. Something along these lines:

Implement the abstract classes that way (it is in the howto already)
add function X and Y as minimal implementation
run your first version against "samples/1_verysimple_start_end_only.nsd"
if this works then add X1 to function X and add function Z, then run the second version against "samples/2_verysimple_minimal_output.nsd"

This would not only be more fun and would be easier to do but would also help to see step by step where problems actually are. In any case this would help people to create new generators.

codemanyak commented 7 years ago

If you want some test examples, here you are: fibonacci-1.zip InputPromptTest.zip Issue253_test8.zip Messwertstatistik5.zip QuickSort3.zip TestIssue335.zip testKGU348d.zip binSearchTreeDemo.zip

codemanyak commented 7 years ago

My main worry was "it changes the includes to be entered differently by the user than for all other generators" but if this isn't a problem it seems like a nice solution (better performance could be gained by parsing the includes when the generator is constructed but this should be done similarly in all parsers).

Wrt the first worry: I think we may place a different tooltip help to the COBOL entry on the Includes tab. Your second sentence is somewhat obscure to me. Usually the includes will be parsed once on generating a source file. This isn't actually the time-critical part. The code generation as a whole is not time-critical in general.

codemanyak commented 7 years ago

@GitMensch

I still don't fully understand the reserved words part but I think there is one thing that will really help: run the other generators and see what they actually do.

Don't worry. The reserved words part is of no importance for the code generation. It's only relevant for the Analyser. It did make sense for me, though, to have the plugged-in generators provide this Analyser info as it is target language related. At the moment you could even return an empty array.

codemanyak commented 7 years ago

I'll likely take the C# generator as primary source.

Well, of course it's a good idea to make a copy of an existing generator and modify it towards your needs. Though I wouldn't advise starting with C#. It may look neat and simple but this is an illusion - it inherits most stuff from CGenerator, such that you have to have two superclasses in mind instead of just one. This isn't likely to ease understanding. I would recommend starting with a standalone generator, e.g. the PasGenerator. Pascal is structurally and syntactically very close to the Nassi-Shneiderman diagram in general and Structorizer in particular, which should further facilitate the first steps into the subject matter.

GitMensch commented 7 years ago

Thank you for the samples. Can we include them in-source?

codemanyak commented 7 years ago

@GitMensch What exactly do you mean with "include them in-source"?

GitMensch commented 7 years ago

What exactly do you mean with "include them in-source"?

I mean: create a directory "samples\NSD" and place the files in there. You may want to add "samples\Pascal", too (for adding files that are known to be importable without errors).

Having them in the git repository and in the release source tarball, maybe even in the jar allows people a simple first test, provides kind of a documentation and gives all the developers something to test their changes against.

codemanyak commented 7 years ago

I see. This would even allow to run automatic integration tests - when we'll eventually get there to set up a CI environment...

GitMensch commented 7 years ago

I think it is a longer way to get to CI but to have this as a goal is definitely a good step. Having the samples for manual tests is something that matches this goal. Be aware of possible license/copyright issues of the samples. If Structorizer does not need a copyright disclaimer / assignment then it is a simple "AUTHORS" file in the samples directory; even if Structorizer needs this (I didn't find information on this so far) the samples directory could be an exception.

GitMensch commented 7 years ago

Remark: The AUTHORS file wouldn't be necessary if the NSD would get the author attribute, see #372

GitMensch commented 7 years ago

The COBOLGenerator I've started doesn't have much in, it is only the stub with the reserved words and handling for comments + user includes. As it isn't usable it doesn't make sense to have this targeted for the next release, likely not even to be included in the source. I vote for removing the milestone and add to "done if someone does it" list but keep this issue open - or close it and reopen later (depending how you handle issues like this).

As it is unlikely that I'll do anything for the export to COBOL part I just add the current stub here...

COBOLGenerator.java.txt

GitMensch commented 7 years ago

minimal spelling changes for howto attached howto.txt

codemanyak commented 7 years ago

Okay, I'd say we put the COBOL generator part into a new extra issue, keeping it open (such that it's no longer part of the 3.27 release milestone), and solve just the general howto request in this issue.

GitMensch commented 7 years ago

... I leave the ticket part to you, but this one is already the "COBOL generator part" :-) Just needs the milestone removed.

codemanyak commented 5 years ago

@GitMensch In https://github.com/fesch/Structorizer.Desktop/issues/738#issuecomment-538882682 I wrote:

([...] I still feel challenged by the task to write a COBOL generator. The design and concept of this language go across the grain for me. Sorry having to say that.)

Well, sorry for that frustrated comment out of fear to open the Pandora's box named "COBOL" again. I may have a look into this project (which was postponed with your consent) again in the next months and try to advance a little in small doses. Of course it is an expedition into a weird and wild country for me, but I am no longer entirely blank of all knowledge there. The code preview will be of enormous help here since all the time-consuming steps to export, look into the result, compare with previous export etc. are no longer necessary. Well, and certain challenges have their tempting part... (though some clandestine generator projects like Kotlin and Golang look more interesting to me, actually).

GitMensch commented 5 years ago

The code preview will be of enormous help here since all the time-consuming steps to export, look into the result, compare with previous export etc. are no longer necessary.

I totally agree.

Well, sorry for that frustrated comment out of fear to open the Pandora's box named "COBOL" again. [...] Well, and certain challenges have their tempting part... (though some clandestine generator projects like Kotlin and Golang look more interesting to me, actually).

Don't feel any pressure on the COBOL code export. I'm out of time for being a big help (other than answering questions) on this topic myself and was not clear (and did not check) about the status of this issue when asking for it. Concerning Structorizer I'm primarily "watching" this project and if/when I find time to actually do more it will be an update to the COBOL parser. Before any import/export adjustments from/to any "languages" are made I highly suggest getting working jUnit-Tests that test all the stuff that is available (I'm actually quite sure that, on the way of adding the tests, a manual inspection of the current results will show things that can be improved). Changing anything in the current state carries the risk of breaking stuff without it being noticed until a bug report comes in (possibly 3 versions after the change was first published as a release).

So in short: I suggest to not do anything on the COBOL export now, keeping this issue open and as very next step work on the jUnit-tests for export/import.

codemanyak commented 5 years ago

@GitMensch Fair enough.

fesch / Structorizer.Desktop

Add new language to export feature (COBOL) #357