TypeCobolTeam / TypeCobol

TypeCobol is an Incremental Cobol parser for IBM Enterprise Cobol 6 for zOS syntax. TypeCobol is also an extension of Cobol 85 language which can then be converted to Cobol85.

Incorrect encoding for alphanumeric literals using hexadecimal notation #2632

Open fm-117 opened 4 months ago

fm-117 commented 4 months ago

What is the problem?

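The scanner uses the MultilineScanState.EncodingForAlphanumericLiterals property to compute the string value of alphanumeric literals written in hexadecimal notation. However, this property takes its value from the encoding of the source file, which is a different notion: the source file encoding describes how the program text is stored, whereas the hexadecimal digits of a literal denote raw bytes in the target character set (EBCDIC or ASCII).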
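To make the mismatch concrete, here is a minimal Python sketch (an illustration only, not TypeCobol code). It decodes the same hexadecimal digits with two different code pages. The helper name `decode_hex_literal` and the choice of `cp037` (an EBCDIC code page) and `cp1252` (standing in for a source file encoding) are assumptions made for this example.

```python
def decode_hex_literal(hex_digits: str, encoding: str) -> str:
    """Decode the digits of an X'..' literal with the given code page.

    Hypothetical helper for illustration; it performs no validation.
    """
    return bytes.fromhex(hex_digits).decode(encoding)


# X'C1C2C3' denotes the three bytes 0xC1 0xC2 0xC3.
as_ebcdic = decode_hex_literal("C1C2C3", "cp037")        # assumed target code page
as_source_file = decode_hex_literal("C1C2C3", "cp1252")  # assumed source file encoding

print(repr(as_ebcdic))
print(repr(as_source_file))
```

The two results differ even though the digits are identical, which is why the text value of such a literal depends on which encoding the scanner picks.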

Here are the IBM specs for alphanumeric literals written in hex:

Hexadecimal digits are characters in the range '0' to '9', 'a' to 'f', and 'A' to 'F', inclusive. Two hexadecimal digits represent one character in a single-byte character set (EBCDIC or ASCII). Four hexadecimal digits represent one character in a DBCS character set. A string of EBCDIC DBCS characters represented in hexadecimal notation must be preceded by the hexadecimal representation of a shift-out control character (X'0E') and followed by the hexadecimal representation of a shift-in control character (X'0F'). An even number of hexadecimal digits must be specified. The maximum length of a hexadecimal literal is 320 hexadecimal digits.

The continuation rules are the same as those for any alphanumeric literal. The opening delimiter (X" or X') cannot be split across lines.

The DBCS compiler option has no effect on the processing of hexadecimal notation of alphanumeric literals.
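The structural rules quoted above (valid digits, even count, 320-digit limit) can be sketched as a small check. This is a hedged Python illustration, not the TypeCobol scanner: the function name `check_hex_literal` and the error messages are hypothetical, and the DBCS shift-out/shift-in rule is deliberately left out.

```python
import string
from typing import List

MAX_HEX_DIGITS = 320  # limit stated in the quoted specification


def check_hex_literal(digits: str) -> List[str]:
    """Return the rule violations found in the digits of an X'..' literal.

    `digits` is the text between the delimiters, without X' and '.
    """
    errors = []
    # string.hexdigits is '0123456789abcdefABCDEF'
    if any(c not in string.hexdigits for c in digits):
        errors.append("invalid hexadecimal digit")
    if len(digits) % 2 != 0:
        errors.append("odd number of hexadecimal digits")
    if len(digits) > MAX_HEX_DIGITS:
        errors.append("more than 320 hexadecimal digits")
    return errors


print(check_hex_literal("C1c2"))  # well-formed: two bytes
print(check_hex_literal("C1C"))   # odd number of digits
```

Returning a list of messages rather than raising on the first problem mirrors how a parser typically reports several diagnostics for one token, but that design choice is an assumption of this sketch.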

How to fix?

fm-117 commented 4 months ago

See DISPLAY.CodeElements.txt for an example of an incorrect text value: https://github.com/TypeCobolTeam/TypeCobol/blob/f568ebe67766c1860646407367c492a3a886b827/TypeCobol.Test/Parser/CodeElements/DISPLAY.CodeElements.txt#L96-L97

fm-117 commented 4 months ago

As of now:

fm-117 commented 4 months ago

Partially fixed by #2633.

We still need: