TypeCobolTeam / TypeCobol

TypeCobol is an Incremental Cobol parser for IBM Enterprise Cobol 6 for zOS syntax. TypeCobol is also an extension of Cobol 85 language which can then be converted to Cobol85.
Other
78 stars 26 forks source link

Check allowed characters in partial cobol word #2471

Closed smedilol closed 1 year ago

smedilol commented 1 year ago

Describe the bug

IBM Compiler check characters allowed in replace clause. See "Chapter 1. Characters" page 3 of IBM PDF "Enterprise COBOL for z/OS 6.3 Language Reference".

Allowed characters are :

(space)
+
-
*
/
=
$
,
;
.
'
"
(
)
<
>
:
_
A-Z
a-z
0-9

So for example we must forbid character @ in replace like IBM compiler.

Out of scope

This issue is only about Partial Cobol word because a few programs in our company are not parsed correctly. Allowed characters in replace directive out of partial cobol word will not be checked in this issue.

To Reproduce

CheckPseudoText1.rdz.cbl:

       IDENTIFICATION DIVISION.
       PROGRAM-ID.    CheckPseudoText1.
       DATA DIVISION.
       working-storage section.

       01  Group1.
       replace ==:_:== by ==FOO==.
           05 :_:-Var pic X.
       replace ==:>:== by ==FOO==.
           05 :>:-Var pic X.
       replace ==:=:== by ==FOO==.
           05 :=:-Var pic X.
       replace ==:<:== by ==FOO==.
           05 :<:-Var pic X.
       replace ==:;:== by ==FOO==.
           05 :;:-Var pic X.
       replace ==:/:== by ==FOO==.
           05 :/:-Var pic X.
       replace ==:.:== by ==FOO==.
           05 :.:-Var pic X.
       replace ==:-:== by ==FOO==.
           05 :-:-Var pic X.
       replace ==:,:== by ==FOO==.
           05 :,:-Var pic X.
       replace ==:+:== by ==FOO==.
           05 :+:-Var pic X.
       replace ==:*:== by ==FOO==.
           05 :*:-Var pic X.
       replace ==:):== by ==FOO==.
           05 :):-Var pic X.
       replace ==:(:== by ==FOO==.
           05 :(:-Var pic X.
       replace ==:$:== by ==FOO==.
           05 :$:-Var pic X.
       replace ==: :== by ==FOO==.
           05 : :-Var pic X.

      *Ok
       replace ==:'':== by ==FOO==.
           05 :'':-Var pic X.
      *Ok
       replace ==:"":== by ==FOO==.
           05 :"":-Var pic X.

      *NOT OK because string is not terminated correctly ?
      *An invalid"REPLACE"statement was found.  Scanning was resumed at
      *  the period terminating the"REPLACE"statement.
       replace ==:':== by ==FOO==.
           05 :':-Var pic X.
      *NOT OK because string is not terminated correctly ?
      *An invalid"REPLACE"statement was found.  Scanning was resumed at
      *  the period terminating the"REPLACE"statement.
       replace ==:":== by ==FOO==.
           05 :":-Var pic X.

      *Non-COBOL character"@"was found in column 19.
      *  The character was accepted.
       replace ==:@:== by ==FOO==.
      *Non-COBOL character(s) were found starting with"@"in column 16.
      *  The characters were discarded.
           05 :@:-Var pic X.

      *Ko Non-COBOL character"€"was found in column 19.
      *The character was accepted.
       replace ==:€:== by ==FOO==.
           05 :€:-Var pic X.

      *Ko Non-COBOL character"£"was found in column 19.
      *The character was accepted.
       replace ==:£:== by ==FOO==.
           05 :£:-Var pic X.

      *Ko Non-COBOL character"#"was found in column 19.
      *The character was accepted.
       replace ==:#:== by ==FOO==.
           05 :#:-Var pic X.

      *Ko Non-COBOL character"é"was found in column 19.
      *The character was accepted.
       replace ==:é:== by ==FOO==.
           05 :é:-Var pic X.

      *Ko Non-COBOL character"&"was found in column 19.
      *The character was accepted.
       replace ==:&:== by ==FOO==.
           05 :&:-Var pic X.

      *Ko Non-COBOL character"["was found in column 19.
      *The character was accepted.
       replace ==:[:== by ==FOO==.
           05 :[:-Var pic X.
      *Ko Non-COBOL character"]"was found in column 19.
      *The character was accepted.
       replace ==:]:== by ==FOO==.
           05 :]:-Var pic X.
      *Ko Non-COBOL character"^"was found in column 19.
      *The character was accepted.
       replace ==:^:== by ==FOO==.
           05 :^ :-Var pic X.
      *Ko Non-COBOL character"{"was found in column 19.
      *The character was accepted.
       replace ==:{:== by ==FOO==.
           05 :{:-Var pic X.
      *Ko Non-COBOL character"}"was found in column 19.
      *The character was accepted.
       replace ==:}:== by ==FOO==.
           05 :}:-Var pic X.

       END PROGRAM CheckPseudoText1.

Expected behavior

Mandatory: check that characters inside a partial cobol word are valid.

Bonus : Same error as IBM compiler. IBM compiler sometimes seems to stop its parsing or try to find the end of the replace clause. We don't need to reproduce this behavior of course. It would be better to detect that the replace clause was intented to finish at a certain point and report that the replace is invalid.

Technical

  1. In CobolChar, add a new method IsAllowedInsidePartialCobolWord(char) which must return true if char is accepted by IsCobolWordChar, otherwise check if the character is in following list:

    (space)
    +
    *
    /
    =
    $
    ,
    ;
    .
    '
    "
    (
    )
    <
    >

    Note that : will not be accepted for now because we first need to fix #2309 to handle such cases.

  2. In ScannerUtils.CheckForPartialCobolWordPattern, replace call to CobolChar.IsCobolWordChar by CobolChar.IsAllowedInsidePartialCobolWord.

  3. Check impact on RegexReplace in ReplaceIterator.

How to test automatically

Standard unit test with Token comparator.

fm-117 commented 1 year ago

To add the diagnostic:

In Scanner.GetTokenStartingFrom, just before main switch

if (currentState.InsidePseudoText)
{
    if (!CobolChar.IsAllowedInsidePseudoText(line[startIndex]))
    {
        [Add diagnostic]
    }
}

The method CobolChar.IsAllowedInsidePseudoText has to be created, see how to factorize with what is allready done for this issue.

Bonus: the property IsPseudoText in Token class can be removed, it is not used.

fm-117 commented 1 year ago

We'll have to exclude '(' and ')' first because it won't work with https://github.com/TypeCobolTeam/TypeCobol/blob/develop/TypeCobol.Test/Parser/Preprocessor/ReplaceTestFiles/PgmReplaceFunction.cbl 😞