Closed nberth closed 1 year ago
The main issue is that this isn't COBOL free-form reference-format but XOPEN "free form reference-format".
The X/Open COBOL definition for free form reference format is modelled after the CODASYL COBOL reference format for free form source. [...]
- When appearing in the first character position of a line, the single-byte symbols ‘‘∗’’ and ‘‘/’’ are always interpreted as indicating comment lines.
- When appearing in the first two character positions of a line, the single-byte symbol pair ‘‘D ’’ (D followed by space) is always interpreted as a debugging line.
-fmf-comment
adjusts fixed-form only (an alternative would be a CB_FORMAT_MF_FIXED
which implies this).
CB_FORMAT_XOPEN_FFF
does handle "*" and "/" and "D " correctly without additional flags, doesn't it? If not then this should definitely be fixed.
The space between
$
andSET
in the directive is neither documented not supported by GnuCOBOL. I'm not sure whether this is actually allowed.
I've rechecked that: Yes, this is allowed with MF, there should be a bug report on this in GnuCOBOL and if/when possible a fix.
-fmf-comment
adjusts fixed-form only (an alternative would be aCB_FORMAT_MF_FIXED
which implies this).CB_FORMAT_XOPEN_FFF
does handle "*" and "/" and "D " correctly without additional flags, doesn't it? If not then this should definitely be fixed.
Ok I see. The files I've seen always start with the SOURCEFORMAT"FREE"
directive, so resorting to X/Open, or adding other formats with a different name, might not solve the issue completely. The alternative is to change the interpretation of FREE
using as specific flag… not sure this is very clean though. In addition, as the suggested change did not break any test, that might indicate the current tests for COBOL free-form reference-format might not be incomplete (e.g., a *
at the beginning of a line that need to be interpreted as an arithmetic operation is probably not tested).
The space-after-$
issue seems rather easy to fix at first glance; I'll see to prepare a new PR for this. Does this work on MF for any directive or just SET
?
The space-after-
$
issue seems rather easy to fix at first glance; I'll see to prepare a new PR for this. Does this work on MF for any directive or justSET
?
Nice, works on all I've tested, which includes conditional compilation and SET - just go with all.
Ok I see. The files I've seen always start with the
SOURCEFORMAT"FREE"
directive, so resorting to X/Open, or adding other formats with a different name, might not solve the issue completely.
Good point, that means that reading the SOURCEFORMAT directive with -std=mf
(check the dialect flag) should set the CB_FORMAT_XOPEN_FFF
, similar - if a fixed format for MF is added, which seems reasonable - this should also be selected automatically for $ SET SOURCEFORMAT FREE
(actually set or implied).
... So that means that -fixed
/ -free
should also be dialect aware.
Good point, that means that reading the SOURCEFORMAT directive with
-std=mf
(check the dialect flag) should set theCB_FORMAT_XOPEN_FFF
, similar - if a fixed format for MF is added, which seems reasonable - this should also be selected automatically for$ SET SOURCEFORMAT FREE
(actually set or implied). ... So that means that-fixed
/-free
should also be dialect aware.
Poking about the docs I think we need another format to handle this, like a CB_FORMAT_MF_FREE
, that is very similar to X/Open free-form format but allows lines of 256 single-byte characters (instead of 80 "character positions" for X/Open FFF). As per the MF doc on "FREE" format: "There is no fixed right margin, though for practical purposes this implementation restricts the maximum length of a source line to 256 bytes of source when represented in UTF8 encoding."
Then, to handle $ SET SOURCEFORMAT"FREE"
properly, maybe it's a matter of defining dialect-specific dictionaries of source format names, with a default and an MF-specific one that remaps "FREE". If this dictionary can then be selected using a dialect option like source-format-dict=[mf|default]
, we'd need a way to ensure that -fformat=free -fsource-format-dict=mf
and -fsource-format-dict=mf -fformat=free
behave the same. I'm not sure whether this is easy to do with the current way configuration options are handled…
A more naive but simpler approach would be to treat $ SET SOURCEFORMAT"FREE"
, which is an MF extension, differently than the >>SOURCE
equivalent (with possibly for the MF one a configurable note or warning about the name mismatch).
Nice, works on all I've tested, which includes conditional compilation and SET - just go with all.
One last detail: is any number of space allowed, or a single one?
Nice, works on all I've tested, which includes conditional compilation and SET - just go with all.
One last detail: is any number of space allowed, or a single one?
Rechecked: both "any number of space" and/or any number of tabs.
Poking about the docs I think we need another format to handle this, like a
CB_FORMAT_MF_FREE
, that is very similar to X/Open free-form format but allows lines of 256 single-byte characters (instead of 80 "character positions" for X/Open FFF). As per the MF doc on "FREE" format: "There is no fixed right margin, though for practical purposes this implementation restricts the maximum length of a source line to 256 bytes of source when represented in UTF8 encoding."
As we don't explicit handle "character positions" we should go with "possible bytes per character" - and this would mean 1-4 for UTF8 so I'd say 256 "practical limit" is fine.
to handle
$ SET SOURCEFORMAT"FREE"
properly, maybe it's a matter of defining dialect-specific dictionaries of source format names
Nah, just check for cb_std_def
(CB_STD_MF
) when parsing that, and possibly add to cb_std_def
as needed.
As we don't explicit handle "character positions" we should go with "possible bytes per character" - and this would mean 1-4 for UTF8 so I'd say 256 "practical limit" is fine.
The current implementation restricts X/Open free-form format to 80 columns. So I was probably wrong when setting that limit, and should have selected 320 instead as this is a count of bytes and not characters.
That's leaving us with the only minor difference of MF explicitly allowing "d " and "D " as debug indicators, and X/Open only explicitly allowing "D " (but "d " might just be not documented).
Current changes also start to remove some "free-from" formats from those eligible for Area A enforcement: for some reason I included some terminal-like formats in this set originally. I'm mostly sure about X/Open free-format that mentions it has no Area A. I'll need to check the others (CRT, TERMINAL, and XCARD).
... and I've just recognized that we don't handle MFCOMMENT correctly either - if active it does not only set that in fixed format an asterisk at column 1 means "comment line" - it actually means "comment line with listing suppression".
... not sure if you get to that (all in cobc/cobc.c) part, but wanted to drop it as "related". Could be done here or with a follow-up PR.
Thank you. Ready for svn if you don't want to add $SET DIALECT before :-)
Merging #68 (9b6bbf1) into gcos4gnucobol-3.x (173790f) will decrease coverage by
0.00%
. The diff coverage is86.66%
.
@@ Coverage Diff @@
## gcos4gnucobol-3.x #68 +/- ##
=====================================================
- Coverage 65.08% 65.08% -0.01%
=====================================================
Files 34 34
Lines 55907 55910 +3
Branches 14678 14683 +5
=====================================================
+ Hits 36386 36387 +1
Misses 13661 13661
- Partials 5860 5862 +2
Impacted Files | Coverage Δ | |
---|---|---|
cobc/flag.def | 100.00% <ø> (ø) |
|
cobc/cobc.c | 70.11% <85.71%> (+<0.01%) |
:arrow_up: |
cobc/pplex.l | 71.33% <87.50%> (-0.10%) |
:arrow_down: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Thank you. Ready for svn if you don't want to add $SET DIALECT before :-)
Just at attempt at better honoring MFCOMMENT w.r.t listings. Please check the tests, as I'm not sure this is indeed the expected behavior.
I think I'll leave auto-enabling mf/acucomment
for later. Not fan of putting an ad-hoc piece of code in cb_load_std
, although it's tempting as it is indeed the simplest approach…
Now upstream :tada:
I've recently come across some COBOL source files apparently targeting MicroFocus, where comments could be seen as lines starting with a
*
after a$ SET SOURCEFORMAT"FREE"
directive. This appears to be backed up by https://www.microfocus.com/documentation/visual-cobol/vc50pu7/VS2019/HRLHLHINTR01U008.html . This PR is a suggestion to handle*
|/
-starting lines as comments when the-fmfcomment
is given, even in FREE reference source format.The space between
$
andSET
in the directive is neither documented not supported by GnuCOBOL. I'm not sure whether this is actually allowed.