Fix COPY REPLACING and REPLACE

lefessan commented 1 year ago

Current implementation does not conform to standard (COPY REPLACING and REPLACE are supposed to be in two successive passes instead of applied together), and has various bugs.

This version is probably less efficient, but better conforms to the standard.

Limitations:

This first version only modify the replacements during preprocessing, not the ones during the listing printing.
Since REPLACE are interpreted before the COPY REPLACING, they cannot be modified by them. A conforming implementation would interpret REPLACE strictly after COPY REPLACING.

GitMensch commented 1 year ago

Note: I guess WPI is "WIP", depending on if you see it ready for GC yourself you may request a review; if you see it as a draft in general, please set it as draft (on the right side on the Conversation page of the PR).

In any case: please keep the ci definitions as extra PR :-)

lefessan commented 1 year ago

Yes, it was WIP. I don't know why there is no "draft" button on my interface.

I am playing with the CI at this moment, but yet, I will definitively put the git related stuff in another PR.

lefessan commented 1 year ago

I used the CI to check that everything was running correctly. It passes all the tests and I removed the CI commit.

codecov-commenter commented 1 year ago

Codecov Report

Merging #75 (8bdfd9e) into gcos4gnucobol-3.x (a4abf11) will increase coverage by 0.17%. The diff coverage is 94.44%.

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x      #75      +/-   ##
=====================================================
+ Coverage              65.17%   65.35%   +0.17%     
=====================================================
  Files                     31       32       +1     
  Lines                  58390    58499     +109     
  Branches               15389    15394       +5     
=====================================================
+ Hits                   38058    38233     +175     
+ Misses                 14393    14327      -66     
  Partials                5939     5939

Impacted Files	Coverage Δ
cobc/pplex.l	`73.29% <85.18%> (+0.72%)`	:arrow_up:
cobc/replace.c	`95.26% <95.26%> (ø)`
cobc/cobc.c	`72.49% <100.00%> (+1.06%)`	:arrow_up:
cobc/ppparse.y	`61.33% <100.00%> (ø)`

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

GitMensch commented 1 year ago

Just a short update: the testcase is fine (I'll add that as expected failure with the next commit) and passes with other compilers.

It would likely be most useful to have this also fixed in the listing code. Maybe that's a similar fix or the new replace.c could be used from both places? To reach a "conforming" implementation: Would you consider adjusting pplex? Would this mean to iterate twice over the file, with skipping everything but COPY REPLACING to the second iteration?

Concerning the draft question: You should be able to setup it via the UI in the following place: gh-ui-draft

GitMensch commented 1 year ago

When checking the root issue it presented itself as follows:

ppecho_replace is called when any "replace" is active during ppecho
this matches as necessary
in the case of the given example, where the REPLACE code can only be applied after the COPY REPLACING code was handled it does not work because:
- main issue: a match is directly output by ppecho_direct; instead it has to "wait" until there is no outer (r->next) entry
- even if it would the outer replace would possibly - as in this case - see only a part of the word COLON (because the original :TEST: leads to an extra token), while it needs the resulting token
- as the content can be really long and can't be directly replaced anywhere this may in general only be solved by giving the replaced content back to the lexer - to do so a loop of unput, then leaving the functions (and the outer replace, so that it may be triggered in the next run) is to be done; but to catch the sample case here the unput would have to be done until last space (separators , and ; are already unput as space here) / line break

lefessan commented 1 year ago

I tried to understand how to use this work with the listings, but my main problem is that I don't understand what is expected as the result of listing. Indeed, COPY-REPLACING and REPLACE can completely change the source, including removing complete parts of the code, so the current line by line approach, or even displaying the content of a copybook after replacing, do not make sense.

If the goal is to have an equivalent source to the original one, but with COPY/REPLACE applied, and then well-formatted, then I would push to use the generated and format it back in a fixed format, while re-including comments from the original file when possible.

If the goal is different, then I don't really understand what it can be, and so how to reach it.

(note that I am not familiar with IBM/MF tooling, so maybe the goal is obvious for people used to them if they have a listing mode...)

GitMensch commented 1 year ago

The goal is to have a complete code as it is processed, in the "original style", with COPY and REPLACE applied and all comments, intermixed with diagnostics, applied "print formatting" (pages + headers) and possible suppression (+extra stuff like symbol listing, xref, diagnostic summary, ...).

The goal is to drop nearly all of the listing code in cobc.c, moving some parts (possibly to a new file, possibly to the existing ones) and to do everything during the "pre-processing" itself. Note that we do want to have that available there later also for display errors in their context (instead of "position only, add the context in the text").

... there's just not someone that took on the work to do so...

GitMensch commented 1 year ago

That PR is not in draft state, so (while I don't expect this to be available upstream soon) would you mind rebasing it to the current version?

GitMensch commented 1 year ago

As noted above: whenever this is tackled it should be split into two commits (those may both be in this single PR).

lefessan commented 1 year ago

I modified this PR to have 3 commits:

First one provides a replacement for the alt_space argument, that is cleaner (former version would only use the alt_space if there was no possible replacement, whereas in this version, it is passed around, making the code more uniform)
Second commit moves the code to replace.h/replace.c
Third commit implements a completely different algorithm for copy-replacing/replace using two phases instead of one (and using the same code for both phases)

GitMensch commented 1 year ago

Just to check: do you have an ETA?

lefessan commented 1 year ago

In this new version, I think I took most of your comments into account. Just two things:

In fact, after rereading bug 868, it looks like it is not actually a bug, as a single copy-replacing should not perform multipass (it is copy-replacing and replace that have to be in two different passes). So, actually, this PR is not fixing 868, but a part of bug 831 (the part that is not related to listing).
About the test with -P, I added it, together with tests with -t and -T, but I am not sure at all that the outputs are the ones expected. I need to read more doc about -P to understand what it should actually perform.

lefessan commented 1 year ago

I also added a new flag -fno-ttime, to suppress time information from listings (i.e. version and date fields). The purpose is to be able to easily compare listings, for example in the testsuite (instead of using UNIFY_LISTING).

lefessan commented 1 year ago

About the test with -P, I added it, together with tests with -t and -T, but I am not sure at all that the outputs are the ones expected. I need to read more doc about -P to understand what it should actually perform.

Looking at the code, I don't really understand what -P is expected to do, actually. The generated file does not currently contain the text of the file. All tokens are passed to check_listing(), but this function does not output the token, unless COB_INTERNAL_XREF is defined. Also, there is a variable requires_new_line that is never set in any part of the code.

lefessan commented 1 year ago

The current test with -P gives the following result:

AT_CHECK([$COMPILE_ONLY -P=preproc.lst prog.cob], [0], [], [])
AT_CHECK([cat preproc.lst], [0],
[     1      2       3       4       5       6       7       8      9      10     11      12          13          14          15  
],
[])

It does not pass on Macos/Windows, probably because of CRLF problems. I am removing it for now.

GitMensch commented 1 year ago

I also added a new flag -fno-ttime, to suppress time information from listings (i.e. version and date fields). The purpose is to be able to easily compare listings, for example in the testsuite (instead of using UNIFY_LISTING).

That is a reasonable approach and helps a lot to keep the testsuite clean, but that should be kept out of this PR; please move that to a separate PR and adjust the implementation:

-fno-ttimestamp (minor rename, but with getopt a user can still abbreviate that) will "suppress timestamp in listing headers"; note: this may provide more space for the title (not sure if we'd use it anywhere)

new option, which also fixes the TODO from set_standard_title():

CB_FLAG_RQ (cb_listing_with_title, 1, "ttitle",
_("  -fttitle=<title>      set listing title;\n"
  "                        defaults to package name and version"))

adjust the testsuite - only use UNIFY_LISTING in one place, in all the others set -fttitle="GnuCOBOL V.R.P" -fno-ttimestamp (the best option is possibly to add a new $COMPILE_LISTING to atlocal which uses $COMPILE_ONLY -fttitle="GnuCOBOL V.R.P" -fno-ttimestamp -tlines=0 an use that.

It may also contain the open points of FR #294 (where I've just added the spec above, so you can reference this):

in the listing header: show only the filename of the processed source, even when it is specified as path (reason: when a full path is specified the source file name is often truncated completely)
after the first listing header: show the complete name for the source (as specified on command line), with line continuation if necessary

GitMensch commented 1 year ago

The current test with -P [...] does not pass on Macos/Windows, probably because of CRLF problems. I am removing it for now.

Please don't remove that but comment the AT_CHECK out with a TODO note. Hint: https://github.com/OCamlPro/gnucobol/compare/669403e2f5cc561885be70d1ce1af9ebbdfb69f5..d627bdae090c590a0a851d96739cd34b291e5c91 shows that preproc2 was also dropped

GitMensch commented 1 year ago

Also, there is a variable requires_new_line that is never set in any part of the code.

This came in with the OC2.0 "commit" that added Roger's change (rev. 68); this already has this never set, so please drop all its uses (potentially with the listing commit, but that is fine with any commit)

GitMensch commented 1 year ago

Also, there is a variable requires_new_line that is never set in any part of the code.

This came in with the OC2.0 "commit" that added Roger's change (rev. 68); this already has this never set, so please drop all its uses (potentially with the listing commit, but that is fine with any commit)

Don't mind, I'll do that just now along with fixing the code that code was broken by @nberth's change in r4644 for "pplex.l (check_listing): do not output sequence number of short lines"

Done in r5101.

lefessan commented 1 year ago

Is there anything more to do on this PR ? The -P option seems to be working correctly now.

GitMensch commented 1 year ago

Apart from the missing Changelog entry for -P (this may even be NEWS worthy) and a NEWS entry about "cobc now uses a two-pass algorithm..." that is ready for SVN.

lefessan commented 1 year ago

Merged, thanks !

GitMensch commented 1 year ago

Would you mind fixing the vs builds? The new file has to be added to all files under build_windows that reference help.c.

GitMensch commented 1 year ago

Does this also fix https://sourceforge.net/p/gnucobol/bugs/23? If yes then it is likely good to check and possibly reference it. That was a completely separate part, now fixed upstream.

Only open: MSVC (and OrangeC) project definitions for cobc (new replace.h).

GitMensch commented 1 year ago

Would you mind fixing the vs builds? The new file has to be added to all files under build_windows that reference help.c.

Done upstream.

GitMensch commented 1 year ago

Primarily a note: it seems that this PR has increased memory usage of cobc during preparing a lot (testing with sources that have ~500000 LOC and no REPLACE or REPLACING at all).

The following is about cobc -E some.cob -e some.i:

Flamegraph for the overview

hotspots

The main cpu instructions and memory are to be seen in replace.c (which is logically, we also moved old memory allocations there).

Biggest hotspot: add_text_to_replace

Question @lefessan: Could you recheck for possible optimizations, especially reducing memory when no REPLACE is active?

Note: cpu-time wise that's no problem, with this big source there is a cpu time of 1.2 seconds for the preparsing, which is ok. But memory-wise - I'm not sure.

GitMensch commented 8 months ago

Friendly ping @lefessan on the memory usage/performance, especially when there's no REPLACE active in the whole file.

GitMensch commented 7 months ago

@lefessan you may have missed that: this PR has increased memory usage of cobc during preparing a lot; can you please reinspectif you find a better way to handle that?

GitMensch commented 4 months ago

Here some numbers from a huge (generated) COBOL file, from valgrinds recorded heap usage

total heap usage	allocs	frees	bytes allocated
GC 3.1.2	5,526	5,526	20,443,319
GC 3.2 (after this PR)	21,677,268	21,677,268	893,378,306

Note that also this file has no REPLACE or REPLACING at all.

Of course, this makes anything that inspects the memory (for example valgrind or cobc if built with hardening options) take much longer.

OCamlPro / gnucobol

Fix COPY REPLACING and REPLACE #75

Codecov Report