OCamlPro / gnucobol

A clone of the sourceforge GnuCOBOL compiler from COBOL to C.
https://get-superbol.com
GNU Lesser General Public License v3.0
16 stars 21 forks source link

Fix COPY REPLACING and REPLACE #75

Closed lefessan closed 1 year ago

lefessan commented 1 year ago

Current implementation does not conform to standard (COPY REPLACING and REPLACE are supposed to be in two successive passes instead of applied together), and has various bugs.

This version is probably less efficient, but better conforms to the standard.

Limitations:

GitMensch commented 1 year ago

Note: I guess WPI is "WIP", depending on if you see it ready for GC yourself you may request a review; if you see it as a draft in general, please set it as draft (on the right side on the Conversation page of the PR).

In any case: please keep the ci definitions as extra PR :-)

lefessan commented 1 year ago

Yes, it was WIP. I don't know why there is no "draft" button on my interface.

I am playing with the CI at this moment, but yet, I will definitively put the git related stuff in another PR.

lefessan commented 1 year ago

I used the CI to check that everything was running correctly. It passes all the tests and I removed the CI commit.

codecov-commenter commented 1 year ago

Codecov Report

Merging #75 (8bdfd9e) into gcos4gnucobol-3.x (a4abf11) will increase coverage by 0.17%. The diff coverage is 94.44%.

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x      #75      +/-   ##
=====================================================
+ Coverage              65.17%   65.35%   +0.17%     
=====================================================
  Files                     31       32       +1     
  Lines                  58390    58499     +109     
  Branches               15389    15394       +5     
=====================================================
+ Hits                   38058    38233     +175     
+ Misses                 14393    14327      -66     
  Partials                5939     5939              
Impacted Files Coverage Δ
cobc/pplex.l 73.29% <85.18%> (+0.72%) :arrow_up:
cobc/replace.c 95.26% <95.26%> (ø)
cobc/cobc.c 72.49% <100.00%> (+1.06%) :arrow_up:
cobc/ppparse.y 61.33% <100.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

GitMensch commented 1 year ago

Just a short update: the testcase is fine (I'll add that as expected failure with the next commit) and passes with other compilers.

It would likely be most useful to have this also fixed in the listing code. Maybe that's a similar fix or the new replace.c could be used from both places? To reach a "conforming" implementation: Would you consider adjusting pplex? Would this mean to iterate twice over the file, with skipping everything but COPY REPLACING to the second iteration?

Concerning the draft question: You should be able to setup it via the UI in the following place: gh-ui-draft

GitMensch commented 1 year ago

When checking the root issue it presented itself as follows:

lefessan commented 1 year ago

I tried to understand how to use this work with the listings, but my main problem is that I don't understand what is expected as the result of listing. Indeed, COPY-REPLACING and REPLACE can completely change the source, including removing complete parts of the code, so the current line by line approach, or even displaying the content of a copybook after replacing, do not make sense.

If the goal is to have an equivalent source to the original one, but with COPY/REPLACE applied, and then well-formatted, then I would push to use the generated and format it back in a fixed format, while re-including comments from the original file when possible.

If the goal is different, then I don't really understand what it can be, and so how to reach it.

(note that I am not familiar with IBM/MF tooling, so maybe the goal is obvious for people used to them if they have a listing mode...)

GitMensch commented 1 year ago

The goal is to have a complete code as it is processed, in the "original style", with COPY and REPLACE applied and all comments, intermixed with diagnostics, applied "print formatting" (pages + headers) and possible suppression (+extra stuff like symbol listing, xref, diagnostic summary, ...).

The goal is to drop nearly all of the listing code in cobc.c, moving some parts (possibly to a new file, possibly to the existing ones) and to do everything during the "pre-processing" itself. Note that we do want to have that available there later also for display errors in their context (instead of "position only, add the context in the text").

... there's just not someone that took on the work to do so...

GitMensch commented 1 year ago

That PR is not in draft state, so (while I don't expect this to be available upstream soon) would you mind rebasing it to the current version?

GitMensch commented 1 year ago

As noted above: whenever this is tackled it should be split into two commits (those may both be in this single PR).

lefessan commented 1 year ago

I modified this PR to have 3 commits:

GitMensch commented 1 year ago

Just to check: do you have an ETA?

lefessan commented 1 year ago

In this new version, I think I took most of your comments into account. Just two things:

lefessan commented 1 year ago

I also added a new flag -fno-ttime, to suppress time information from listings (i.e. version and date fields). The purpose is to be able to easily compare listings, for example in the testsuite (instead of using UNIFY_LISTING).

lefessan commented 1 year ago
  • About the test with -P, I added it, together with tests with -t and -T, but I am not sure at all that the outputs are the ones expected. I need to read more doc about -P to understand what it should actually perform.

Looking at the code, I don't really understand what -P is expected to do, actually. The generated file does not currently contain the text of the file. All tokens are passed to check_listing(), but this function does not output the token, unless COB_INTERNAL_XREF is defined. Also, there is a variable requires_new_line that is never set in any part of the code.

lefessan commented 1 year ago

The current test with -P gives the following result:

AT_CHECK([$COMPILE_ONLY -P=preproc.lst prog.cob], [0], [], [])
AT_CHECK([cat preproc.lst], [0],
[     1      2       3       4       5       6       7       8      9      10     11      12          13          14          15  
],
[])

It does not pass on Macos/Windows, probably because of CRLF problems. I am removing it for now.

GitMensch commented 1 year ago

I also added a new flag -fno-ttime, to suppress time information from listings (i.e. version and date fields). The purpose is to be able to easily compare listings, for example in the testsuite (instead of using UNIFY_LISTING).

That is a reasonable approach and helps a lot to keep the testsuite clean, but that should be kept out of this PR; please move that to a separate PR and adjust the implementation:

It may also contain the open points of FR #294 (where I've just added the spec above, so you can reference this):

GitMensch commented 1 year ago

The current test with -P [...] does not pass on Macos/Windows, probably because of CRLF problems. I am removing it for now.

Please don't remove that but comment the AT_CHECK out with a TODO note. Hint: https://github.com/OCamlPro/gnucobol/compare/669403e2f5cc561885be70d1ce1af9ebbdfb69f5..d627bdae090c590a0a851d96739cd34b291e5c91 shows that preproc2 was also dropped

GitMensch commented 1 year ago

Also, there is a variable requires_new_line that is never set in any part of the code.

This came in with the OC2.0 "commit" that added Roger's change (rev. 68); this already has this never set, so please drop all its uses (potentially with the listing commit, but that is fine with any commit)

GitMensch commented 1 year ago

Also, there is a variable requires_new_line that is never set in any part of the code.

This came in with the OC2.0 "commit" that added Roger's change (rev. 68); this already has this never set, so please drop all its uses (potentially with the listing commit, but that is fine with any commit)

Don't mind, I'll do that just now along with fixing the code that code was broken by @nberth's change in r4644 for "pplex.l (check_listing): do not output sequence number of short lines"

Done in r5101.

lefessan commented 1 year ago

Is there anything more to do on this PR ? The -P option seems to be working correctly now.

GitMensch commented 1 year ago

Apart from the missing Changelog entry for -P (this may even be NEWS worthy) and a NEWS entry about "cobc now uses a two-pass algorithm..." that is ready for SVN.

lefessan commented 1 year ago

Merged, thanks !

GitMensch commented 1 year ago

Would you mind fixing the vs builds? The new file has to be added to all files under build_windows that reference help.c.

GitMensch commented 1 year ago

Does this also fix https://sourceforge.net/p/gnucobol/bugs/23? If yes then it is likely good to check and possibly reference it. That was a completely separate part, now fixed upstream.

Only open: MSVC (and OrangeC) project definitions for cobc (new replace.h).

GitMensch commented 1 year ago

Would you mind fixing the vs builds? The new file has to be added to all files under build_windows that reference help.c.

Done upstream.

GitMensch commented 1 year ago

Primarily a note: it seems that this PR has increased memory usage of cobc during preparing a lot (testing with sources that have ~500000 LOC and no REPLACE or REPLACING at all).

The following is about cobc -E some.cob -e some.i:

Flamegraph for the overview flamegraph

hotspots hotspots

The main cpu instructions and memory are to be seen in replace.c (which is logically, we also moved old memory allocations there).

Biggest hotspot: add_text_to_replace

Question @lefessan: Could you recheck for possible optimizations, especially reducing memory when no REPLACE is active?

Note: cpu-time wise that's no problem, with this big source there is a cpu time of 1.2 seconds for the preparsing, which is ok. But memory-wise - I'm not sure.

GitMensch commented 8 months ago

Friendly ping @lefessan on the memory usage/performance, especially when there's no REPLACE active in the whole file.

GitMensch commented 7 months ago

@lefessan you may have missed that: this PR has increased memory usage of cobc during preparing a lot; can you please reinspectif you find a better way to handle that?

GitMensch commented 4 months ago

Here some numbers from a huge (generated) COBOL file, from valgrinds recorded heap usage

total heap usage allocs frees bytes allocated
GC 3.1.2 5,526 5,526 20,443,319
GC 3.2 (after this PR) 21,677,268 21,677,268 893,378,306

Note that also this file has no REPLACE or REPLACING at all.

Of course, this makes anything that inspects the memory (for example valgrind or cobc if built with hardening options) take much longer.