ddeclerck closed this 2 months ago
Merging #77 (41e67d8) into gcos4gnucobol-3.x (516b813) will decrease coverage by 0.20%. The diff coverage is 15.04%.
@@                  Coverage Diff                   @@
##        gcos4gnucobol-3.x      #77      +/-   ##
=====================================================
- Coverage           65.27%   65.07%   -0.20%
=====================================================
  Files                  31       31
  Lines               56425    56623    +198
  Branches            14723    14779     +56
=====================================================
+ Hits                36830    36846     +16
- Misses              13807    13973    +166
- Partials             5788     5804     +16
Impacted Files | Coverage Δ |
---|---|
cobc/cobc.c | 70.65% <0.00%> (-0.36%) ⬇️ |
cobc/parser.y | 68.89% <ø> (-0.08%) ⬇️ |
libcob/cconv.c | 37.50% <0.00%> (-38.98%) ⬇️ |
cobc/codegen.c | 73.81% <18.79%> (-1.00%) ⬇️ |
cobc/flag.def | 100.00% <100.00%> (ø) |
The question is: Can any of the above be at least improved without adding the iconv dependency?
Well, if we restrict ourselves to EBCDIC <-> 8-bit ASCII translations (mostly ISO-8859-X) (and possibly UTF-X / UCS-X), I guess it would be acceptable to just copy the tables we need from iconv (the lib itself is LGPL, like the GnuCOBOL runtime). Otherwise, if we'd like to support all kinds of encodings, we would end up just reimplementing iconv :p.
Anyways, the problem right now is that our client only needs these tables to get SORT working as on his mainframe (I added support for CODE-SET and CONVERTING as a "bonus"), so we can hardly spend extra time implementing all the scenarios (if we had unlimited funding, I'd be happy to investigate all of these). Maybe we can think of a way to implement this quickly without impairing our ability to implement the remaining scenarios?
Then what about only adding the necessary tables hard-wired for now? This could get into 3.2 final if it is quick and clean, too, and provide others with a hint how to add their own translations.
So you mean, adding another translation table usable via the current (-febcdic-table) option?
Yes, that's what I can offer for 3.2 inclusion. Please add a comment on how you created that table.
... thinking again: even better would be to allow -febcdic-table to read in a file (checking just the length would be fine) and store it in the module.
That would probably solve the problem in the easiest possible way. I'll do that in another PR.
Given -fdefault-colseq and #78 - where does this lead us concerning the issues above?
Well, #78 and -fdefault-colseq do solve our current issue, yet the points raised here remain valid. Let's keep that here as a draft for reference for now.
@ddeclerck Can you please rebase this and then review what's missing feature-wise so far?
Ideally you could add / adjust some testcases, allowing everyone to grasp more easily what the new options are to be used for.
Hi @GitMensch,
Actually this PR is rather obsolete; the new option -fdefault-colseq (#72 #101) and the modifications to allow using custom translation tables in -febcdic-table (#78) are just as useful, simpler to use, and easier to implement.
It's your PR, so if all things from the PR's initial post are covered otherwise, then feel free to close this (maybe adding a new label "obsolete" or similar, so it's easier to distinguish from the closed "merged upstream" ones).
One thing that I've seen, which was the reason to check the PRs: I'm not sure if -fdefault-colseq is applied correctly for SPECIAL-NAMES. ALPHABET nat IS NATIVE. - it would be helpful if you could check if that's the case and if we have a testcase for that.
To my understanding, -fdefault-colseq is like an implicit PROGRAM COLLATING SEQUENCE clause; so should it really affect the NATIVE alphabet definition?
Hm, MF has a NATIVE directive which changes the default collating sequence, but that doesn't change the default alphabet either. To change that you'd use the CHARSET directive (which also changes DEFAULTBYTE, SIGN, NATIVE).
I think we don't have an option to effectively change the native alphabet, do we?
Not really, though that might have some use (we talked to someone who claimed he needed the internal, in-memory data representation to be in EBCDIC while running on an ASCII machine).
This PR adds some support for character encodings in different places, using iconv to build conversion tables.
Note to @GitMensch : I still have to add some tests, but you can start reviewing.
It introduces the following new options: -fnative-charset -ftarget-charset -fcodeset-ascii -fcodeset-ebcdic -fconvert-ascii -fconvert-ebcdic
The -fnative-charset and -ftarget-charset options allow specifying the encodings used in the execution environment (native) and in the environment we are trying to simulate (target). When both are set, the collating tables (for the purpose of sorting only) are built by combining these two encodings and the "vendor" EBCDIC/ASCII table.
Our typical use case is the following. We simulate a GCOS environment with EBCDIC-500 encoding (our target). We run on a standard Linux PC, where we decided that all the data we process would use the ISO-8859-1 encoding (we call this "native"). We have a file with some records we'd like to sort according to the EBCDIC collating sequence (the default one on this platform), so that our program sorts in the exact same order it would if it were run on our GCOS environment. If we use the "vendor" EBCDIC/ASCII table (the one from the GCOS documentation), this will fail: this table does not define an EBCDIC-500 <-> ISO-8859-1 translation (in particular, extended characters are just given consecutive values).
That's where the -fnative-charset and -ftarget-charset are useful. Here, we set -fnative-charset to ISO-8859-1 and -ftarget-charset to IBM500 (iconv's name for EBCDIC 500). As they are both set, we will call iconv to produce an ISO-8859-1 to EBCDIC-500 translation table (cob_colseq_native_ebcdic) and give that one instead of the vendor table to the sort function.
Now, what if the SORT instruction contained a COLLATING SEQUENCE IS ASCII clause? Here, we would want to sort as our GCOS system would, hence using the vendor table. But iconv does not know about the specific ASCII encoding used by GCOS. What we want to do is thus to convert from ISO-8859-1 to EBCDIC-500 and then use the vendor table to translate that EBCDIC to the GCOS-specific ASCII. To avoid performing two conversions, we simply build a new conversion table (cob_colseq_native_ascii) that combines the two steps.
The implementation has been made generic enough that it works on any combination of native/target encodings among the EBCDIC and ASCII encodings; in particular, it will use the vendor table in one direction or the other depending on the actual target encoding (i.e. if the target encoding is EBCDIC, it will add a vendor EBCDIC -> ASCII translation when requesting an ASCII collating sequence, and symmetrically, if the target encoding is ASCII, it will add a vendor ASCII -> EBCDIC translation when requesting an EBCDIC collating sequence).
That's for the main stuff.
Now, since I had started digging into encodings, I thought it could also be useful to be able to specify the ASCII and EBCDIC encodings used in the CODE-SET and CONVERTING/REPLACING clauses. That's the purpose of the -fcodeset-ascii, -fcodeset-ebcdic, -fconvert-ascii and -fconvert-ebcdic options.
The -fcodeset-ascii and -fcodeset-ebcdic options work together with the -fnative-charset option. When specified, and when a CODE-SET clause is active on a file, the necessary conversions will be performed by iconv using the specified encodings instead of the vendor table. For instance, reading from a file that has CODE-SET IS EBCDIC will perform a translation from the encoding specified by the -fcodeset-ebcdic option to the encoding specified by the -fnative-charset option. This could be convenient when migrating an application from a mainframe while keeping data files in an EBCDIC encoding.
As for the -fconvert-ascii and -fconvert-ebcdic options, they allow specifying the conversions to perform in INSPECT CONVERTING / REPLACING and TRANSFORM CONVERTING.