IBM / zopeneditor-about

IBM Z Open Editor: File issues here!
https://ibm.github.io/zopeneditor-about
Apache License 2.0

COBOL - Minor issues with literal delimiters in the COBOL textmate grammar #308

Closed FALLAI-Denis closed 1 year ago

FALLAI-Denis commented 1 year ago

Development environment used

Problem Description

Minor bugs in the handling of literal delimiters.

Some delimiters are not correctly recognized:

The issues are related to the textmate grammar associated with the COBOL language.

    {
      "begin": "\"",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\"|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.double.cobol"
    },
    {
      "begin": "'",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "('|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.single.cobol"
    },
    {
      "begin": "([zZ]|[nN]|[uU])\"",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\"|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.double.cobol"
    },

Observed behavior

image

Expected behavior

image

I propose a simplification along with a correction:

    {
      "begin": "(?:[nNuU]?[xX]|[gGnNuUzZ])?(\")",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\\1|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.double.cobol"
    },
    {
      "begin": "(?:[nNuU]?[xX]|[gGnNuUzZ])?(')",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\\1|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.single.cobol"
    },

or only:

    {
      "begin": "(?:[nNuU]?[xX]|[gGnNuUzZ])?([\"'])",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\\1|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.cobol"
    },
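
To make the effect of the proposed begin pattern concrete, here is a minimal sketch that checks which prefixes it accepts as part of the opening delimiter. It uses Python's `re` module as a stand-in for the Oniguruma engine that textmate grammars run on (an assumption; the patterns here stay within syntax both engines share), and the helper name `begin_delimiter` is hypothetical.

```python
import re

# Begin pattern from the proposal above, with the JSON escaping removed.
# The optional prefix is non-capturing, so group 1 is only the quote character.
BEGIN = re.compile(r"""(?:[nNuU]?[xX]|[gGnNuUzZ])?(["'])""")

# Begin pattern of the existing prefixed rule, for comparison.
OLD_PREFIXED_BEGIN = re.compile(r'([zZ]|[nN]|[uU])"')


def begin_delimiter(literal, pattern=BEGIN):
    """Return the text matched as begin delimiter at the start of `literal`, or None."""
    m = pattern.match(literal)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ['"plain"', "X'0A'", 'NX"00E9"', "G'abc'", 'Z"text"']:
        print(sample, "->", begin_delimiter(sample),
              "| old prefixed rule:", begin_delimiter(sample, OLD_PREFIXED_BEGIN))
```

With the existing prefixed rule, a literal such as `X'0A'` gets no match at its first character, so the prefix stays outside the string delimiter scope; the proposed pattern matches `X'` as a whole.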

phaumer commented 1 year ago

Thanks! Happy holidays!

FALLAI-Denis commented 1 year ago

Hi,

The ZOE COBOL grammar also includes the following rules, which describe literal syntaxes (H and O prefixes) that are valid neither for the IBM Enterprise COBOL compiler nor in the ISO standard:

    {
      "match": "([hH])'\\h*'",
      "name": "constant.numeric.integer.hexadecimal.cobol"
    },
    {
      "match": "([hH])'.*'",
      "name": "invalid.illegal.hexadecimal.cobol"
    },
    {
      "match": "([hH])\"\\h*\"",
      "name": "constant.numeric.integer.hexadecimal.cobol"
    },
    {
      "match": "([hH])\".*\"",
      "name": "invalid.illegal.hexadecimal.cobol"
    },
    {
      "match": "[oO]\"[0-7]*\"",
      "name": "constant.numeric.integer.octal.cobol"
    },
    {
      "match": "[oO]\".*\"",
      "name": "invalid.illegal.octal.cobol"
    },

On the other hand, the ISO standard provides for boolean literals with the prefixes B and BX.

In the ISO standard, a string literal can be continued with a continuation character - placed immediately after the closing quote or apostrophe delimiter (the use of the - character in column 7 is deprecated).

The previous proposals could therefore become:

    {
      "begin": "(?:[bBnNuU]?[xX]|[bBgGnNuUzZ])?(\")",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\\1(-)?|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.double.cobol"
    },
    {
      "begin": "(?:[bBnNuU]?[xX]|[bBgGnNuUzZ])?(')",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\\1(-)?|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.single.cobol"
    },

or only:

    {
      "begin": "(?:[bBnNuU]?[xX]|[bBgGnNuUzZ])?([\"'])",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\\1(-)?|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.cobol"
    },
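
As a rough illustration of the two additions (the B/BX prefixes and the optional - after the closing delimiter), the sketch below exercises both patterns with Python's `re` module. This is only an approximation of what the grammar engine does: the `\1` backreference is resolved by hand from the opening quote, and the helper name `closing_delimiter` is hypothetical.

```python
import re

# Begin pattern from the proposal above, including the ISO boolean prefixes B and BX.
BEGIN_ISO = re.compile(r"""(?:[bBnNuU]?[xX]|[bBgGnNuUzZ])?(["'])""")


def closing_delimiter(rest_of_line, quote):
    """Return what the end pattern matches in the text after the begin delimiter.

    The end pattern is modelled as: the same quote character, optionally
    followed by '-', or else the end of the line.
    """
    end = re.compile(re.escape(quote) + r"(-)?|$")
    return end.search(rest_of_line).group(0)


if __name__ == "__main__":
    print(BEGIN_ISO.match('BX"0F"').group(0))   # opening delimiter with BX prefix
    print(closing_delimiter('ABC"-', '"'))      # closing quote plus continuation
    print(closing_delimiter('ABC', '"'))        # unterminated: matches at end of line
```

In this model the trailing - is scoped together with the closing delimiter, which is what capture 0 of the proposed end pattern would receive.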

These additions to the COBOL textmate grammar also require enhancements to the COBOL Language Server.

image

phaumer commented 1 year ago

In 3.0.1 we only fixed the textmate grammar. The Language Server fix will be in a future release.

FALLAI-Denis commented 1 year ago

Hi,

New rule in ZOE 3.0.1:

    {
      "begin": "([bBnNuU]?[xX]|[bBgGnNuUzZ])?([\"'])",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "([\"']|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.cobol"
    },

The end pattern should match the same quote or apostrophe character as the begin pattern: I suggest using a backreference to the begin capture in the end pattern.

    {
      "begin": "(?:[bBnNuU]?[xX]|[bBgGnNuUzZ])?([\"'])",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "(\\1|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.cobol"
    },
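
The difference between the two end patterns can be reproduced outside the editor with a small simulation. The sketch below is a simplified model, not the actual grammar engine: it assumes the engine substitutes the text of begin capture 1 for `\1` in the end pattern before searching, uses Python's `re` in place of Oniguruma, and uses the IBM-compatible begin pattern without B/BX. The function name `string_span` is hypothetical.

```python
import re

BEGIN = re.compile(r"""(?:[nNuU]?[xX]|[gGnNuUzZ])?(["'])""")

LOOSE_END = r"""["']|$"""   # end pattern of the ZOE 3.0.1 rule
BACKREF_END = r"\1|$"       # end pattern with a backreference to the opening quote


def string_span(line, end_template):
    """Return the part of `line` that a begin/end rule would scope as a string."""
    begin = BEGIN.search(line)
    if begin is None:
        return None
    # Replace the backreference with the escaped text of begin capture 1.
    end_source = end_template.replace(r"\1", re.escape(begin.group(1)))
    # The '|$' alternative guarantees a match, at the latest at end of line.
    end = re.compile(end_source).search(line, begin.end())
    return line[begin.start():end.end()]


if __name__ == "__main__":
    line = '''MOVE "it's ok" TO WS-A'''
    print(string_span(line, LOOSE_END))     # "it'
    print(string_span(line, BACKREF_END))   # "it's ok"
```

With the loose end pattern the string stops at the apostrophe inside the literal; with the backreference it runs to the matching double quote.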

phaumer commented 1 year ago

Fixed some of these in Z Open Editor 3.1.0. Some of the items listed are not yet supported by the IBM COBOL compiler.

FALLAI-Denis commented 1 year ago

Sorry, but this is still wrong for the ending delimiter, which must match the opening delimiter through a backreference.

image

And there is a regression on the b/bx prefix:

image

All the syntaxes in the example above conform to the ISO COBOL standard.

The ZOE 3.1.0 textmate rule is:

    {
      "begin": "([nNuU]?[xX]|[gGnNuUzZ])?([\"'])",
      "beginCaptures": {
        "0": {
          "name": "punctuation.definition.string.begin.cobol"
        }
      },
      "end": "([\"']|$)",
      "endCaptures": {
        "0": {
          "name": "punctuation.definition.string.end.cobol"
        }
      },
      "name": "string.quoted.cobol"
    },

The correct textmate rule is:

        {
          "begin": "(?:[bBnNuU]?[xX]|[bBgGnNuUzZ])?([\"'])",
          "beginCaptures": {
            "0": {
              "name": "punctuation.definition.string.begin.cobol"
            }
          },
          "end": "\\1(-)?|$",
          "endCaptures": {
            "0": {
              "name": "punctuation.definition.string.end.cobol"
            }
          },
          "name": "string.quoted.cobol"
        },

FALLAI-Denis commented 1 year ago

Hi @phaumer

I suggest reopening this issue.

sdaimwood commented 1 year ago

Hi @FALLAI-Denis,

Thank you for the contributions here. We intend to fix that textmate error with the nested double quote or single quote.

However, for the b and bx tokens, we reverted that change intentionally: we spoke with the IBM COBOL compiler team, and they informed us that although that syntax is part of the ISO COBOL standard, the IBM COBOL on Z compiler does not support it. Our intention with the parser and syntax highlighting is to match the IBM compiler as closely as possible, so that users can see the error before compiling.

FALLAI-Denis commented 1 year ago

Hi,

OK for b and bx. I hope the boolean format will soon be available in the IBM Enterprise COBOL compiler.

Perhaps the same applies to the - continuation delimiter at the end of a string.

The textmate rule would then be:

        {
          "begin": "(?:[nNuU]?[xX]|[gGnNuUzZ])?([\"'])",
          "beginCaptures": {
            "0": {
              "name": "punctuation.definition.string.begin.cobol"
            }
          },
          "end": "\\1|$",
          "endCaptures": {
            "0": {
              "name": "punctuation.definition.string.end.cobol"
            }
          },
          "name": "string.quoted.cobol"
        },

phaumer commented 1 year ago

Made the updates discussed in v3.1.1.