ZOSOpenTools / libiconvport

Apache License 2.0

Validate EBCDIC Latin-1 code pages (at a minimum) for accuracy #4

Closed MikeFultonDev closed 1 year ago

MikeFultonDev commented 1 year ago

IBM-1047 had (at least) one bug in it: 0x0a was converted to 0x25 instead of 0x15 (this is already patched locally).

MikeFultonDev commented 1 year ago

I created a test (compconv.sh) for this. The first problem hit is IBM-1047 to ISO8859-1:

cmp -l /tmp/zot.83891463.IBM-1047.txt /tmp/ibm.83891463.IBM-1047.txt
 21 205  12
 37  12 205

So code points 21 and 37 are swapped between the 2 'iconv' implementations: where one produces 205 the other produces 12, and vice versa.
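compconv.sh itself is not shown in this thread, so the following is only a minimal sketch of how such a comparison could be set up. The function names `gen_bytes` and `compare_conv` are hypothetical, and starting the input at byte 0x01 (so that the byte number printed by cmp equals the input byte value) is an assumption, not something confirmed by the script.

```shell
# Hypothetical sketch, not the real compconv.sh.

# Write the bytes 0x01..0xff to stdout. NUL is skipped (an assumption) so that
# cmp's 1-based byte number N corresponds to input byte value N.
gen_bytes() {
    i=1
    while [ "$i" -le 255 ]; do
        # The inner printf renders i as three octal digits; the outer printf
        # interprets the resulting \ooo escape and emits that single byte.
        printf "\\$(printf '%03o' "$i")"
        i=$((i + 1))
    done
}

# compare_conv FROM TO ICONV_A ICONV_B
# Convert the same input with two iconv commands and list differing bytes.
compare_conv() {
    from=$1
    to=$2
    iconv_a=$3
    iconv_b=$4
    tmp=${TMPDIR:-/tmp}/compconv.$$
    gen_bytes | "$iconv_a" -f "$from" -t "$to" > "$tmp.a"
    gen_bytes | "$iconv_b" -f "$from" -t "$to" > "$tmp.b"
    cmp -l "$tmp.a" "$tmp.b"
    rc=$?
    rm -f "$tmp.a" "$tmp.b"
    return "$rc"
}

# Example call (both paths are placeholders for the two implementations):
#   compare_conv IBM-1047 ISO8859-1 /path/to/zot/iconv /path/to/ibm/iconv
```

A zero exit status from `compare_conv` means the two implementations produced identical output for every input byte.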

MikeFultonDev commented 1 year ago

Note that cmp -l prints the byte values in octal but the offset in decimal, so the first line says:

byte number 21 (in decimal) has octal value 205 in the 'zot' file and octal value 12 in the IBM file
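Because the mix of decimal and octal is easy to misread, a small helper can restate a `cmp -l` line in hexadecimal. This is only a sketch; `cmp_line_to_hex` is a hypothetical name, and it relies on printf treating a numeric argument with a leading 0 as octal.

```shell
# Hypothetical helper: restate one line of `cmp -l` output in hexadecimal.
# $1 = byte number (decimal), $2 and $3 = the two byte values (octal).
cmp_line_to_hex() {
    # Prefixing the values with 0 makes printf parse them as octal.
    printf 'byte %d: 0x%02x vs 0x%02x\n' "$1" "0$2" "0$3"
}

cmp_line_to_hex 21 205 12
cmp_line_to_hex 37 12 205
```

For the two lines above this reports 0x85 against 0x0a at byte 21, and 0x0a against 0x85 at byte 37.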

MikeFultonDev commented 1 year ago

I added a test for going the 'other' direction (ISO8859-1 to X). First problem hit for IBM-1047 is:

cmp -l /tmp/zot.67112607.IBM-1047.to.txt /tmp/ibm.67112607.IBM-1047.to.txt
133  25  45

MikeFultonDev commented 1 year ago

Note that linefeed versus newline is also an issue (already patched, but perhaps it needs something more). See the Java dynamic option: https://www.ibm.com/support/pages/apar/IV21473 (thanks @IgorTodorovskiIBM)

MikeFultonDev commented 1 year ago

From @ccw

image

MikeFultonDev commented 1 year ago

I switched to building from the dev line so that I could pick up a fix from Bruno Haible. We can now round-trip from ASCII to EBCDIC to ASCII, and if we compare the z/OS Open Tools EBCDIC iconv translation to the IBM iconv translation, they are the same, BUT you have to set the ICONV_EBCDIC_ZOS_UNIX environment variable to something (I chose 1). ICONV_EBCDIC_ZOS_UNIX is now set in the .env for libiconv, so you will 'get it for free'. This change is made via PR #11, so this is now fixed. There is also a roundtrip.sh script in the 'tests' directory to verify this conversion is being done correctly.
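The roundtrip.sh script is not reproduced here; as a rough sketch, a round-trip check along these lines could look as follows. The function name `roundtrip` and the `ICONV` override variable are hypothetical, and using ISO8859-1 and IBM-1047 as the two code pages is an assumption based on the earlier comments.

```shell
# Hypothetical sketch, not the real roundtrip.sh.
# roundtrip FILE: convert ISO8859-1 -> IBM-1047 -> ISO8859-1 and compare the
# result with the original. Returns 0 when the round trip is lossless.
roundtrip() {
    # ICONV is an illustrative override so another iconv binary can be tested.
    iconv_cmd=${ICONV:-iconv}
    ICONV_EBCDIC_ZOS_UNIX=1 "$iconv_cmd" -f ISO8859-1 -t IBM-1047 < "$1" |
        ICONV_EBCDIC_ZOS_UNIX=1 "$iconv_cmd" -f IBM-1047 -t ISO8859-1 |
        cmp -s - "$1"
}

# Example call:
#   roundtrip /path/to/some-latin1-file.txt && echo "round trip OK"
```

Setting ICONV_EBCDIC_ZOS_UNIX on each iconv invocation keeps the sketch independent of whether the .env has already been sourced.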

Note there is a bug when using the iconv command under bash to convert from a non-ISO8859-1 code page to another code page (see #12).