atom / language-go

Go language package for Atom

Add support for changes to number literals introduced in Go 1.13 #175

Closed tohjustin closed 5 years ago

tohjustin commented 5 years ago

Requirements

The updated grammar should support syntax highlighting for the following:

  • Binary integer literals: The prefix 0b or 0B indicates a binary integer literal such as 0b1011.
  • Octal integer literals: The prefix 0o or 0O indicates an octal integer literal such as 0o660. The existing octal notation indicated by a leading 0 followed by octal digits remains valid.
  • Hexadecimal floating-point literals: The prefix 0x or 0X may now be used to express the mantissa of a floating-point number in hexadecimal format such as 0x1.0p-1021. A hexadecimal floating-point number must always have an exponent, written as the letter p or P followed by an exponent in decimal. The exponent scales the mantissa by 2 to the power of the exponent.
  • Imaginary literals: The imaginary suffix i may now be used with any (binary, decimal, hexadecimal) integer or floating-point literal.
  • Digit separators: The digits of any number literal may now be separated (grouped) using underscores, such as in 1_000_000, 0b_1010_0110, or 3.1415_9265. An underscore may appear between any two digits or the literal prefix and the first digit.

(taken from Go 1.13 Release Notes)
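
For reference, here is a minimal Go sketch of the literal forms listed above. The function name `literalSamples` is made up for this illustration, and the file needs Go 1.13 or later to compile.

```go
package main

import "fmt"

// literalSamples returns one value per literal form from the list above.
// The name is hypothetical; it exists only for this illustration.
func literalSamples() (bin, oct, legacyOct, grouped int, hexFloat float64, img complex128) {
	bin = 0b1011        // binary integer literal (11)
	oct = 0o660         // octal integer literal with the new 0o prefix (432)
	legacyOct = 0660    // leading-0 octal notation, still valid (432)
	grouped = 1_000_000 // digit separators
	hexFloat = 0x1p-2   // hexadecimal floating-point literal: 1 * 2^-2 (0.25)
	img = 0b11i         // imaginary suffix on a binary integer literal (3i)
	return
}

func main() {
	bin, oct, legacyOct, grouped, hexFloat, img := literalSamples()
	fmt.Println(bin, oct, legacyOct, grouped, hexFloat, img)
}
```

These are the forms the updated grammar is expected to highlight as numeric constants.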

Description of the Change

All regular expressions were converted by hand from the EBNF (Extended Backus-Naur Form) productions for each number literal type in The Go Programming Language Specification.

Playground containing code snippet to test with: https://play.golang.org/p/IuWyvmHjHs0


EBNF to Regular Expression Conversion

Decimal literals

| Type | EBNF | Regex |
| --- | --- | --- |
| decimal_digit | `"0" … "9" .` | `/\d/` |
| decimal_digits | `decimal_digit { [ "_" ] decimal_digit } .` | `/\d(_?\d)*/` |
| decimal_lit | `"0" \| ( "1" … "9" ) [ [ "_" ] decimal_digits ] .` | `/0\|[1-9](_?\d(_?\d)*)?/` |
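
The conversion pattern used throughout these tables (`digit { [ "_" ] digit }` becomes `digit(_?digit)*`) can be sketched in Go as follows. The helper names `digitsPattern` and `isDecimalLit` are hypothetical and not part of this pull request. Go's `regexp` package has no look-behind, so the sketch anchors the pattern with `^` and `$` instead of using the look-around assertions found in the grammar.

```go
package main

import (
	"fmt"
	"regexp"
)

// digitsPattern converts the EBNF shape `digit { [ "_" ] digit }` into a
// regular expression, given the pattern for a single digit.
// (Hypothetical helper, used only for this illustration.)
func digitsPattern(digit string) string {
	return digit + `(_?` + digit + `)*`
}

// decimalLit is the decimal_lit pattern from the table, anchored so that it
// must match the whole input.
var decimalLit = regexp.MustCompile(`^(?:0|[1-9](_?` + digitsPattern(`\d`) + `)?)$`)

// isDecimalLit reports whether s is a complete decimal integer literal.
func isDecimalLit(s string) bool {
	return decimalLit.MatchString(s)
}

func main() {
	for _, s := range []string{"0", "42", "1_000_000", "4__2", "_42", "42_", "0600"} {
		fmt.Printf("%-10s %v\n", s, isDecimalLit(s))
	}
}
```

Only the first three inputs should be accepted: `4__2`, `_42` and `42_` misplace the underscore, and `0600` is an octal literal rather than a decimal one.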

Binary literals

| Type | EBNF | Regex |
| --- | --- | --- |
| binary_digit | `"0" \| "1" .` | `/[01]/` |
| binary_digits | `binary_digit { [ "_" ] binary_digit } .` | `/[01](_?[01])*/` |
| binary_lit | `"0" ( "b" \| "B" ) [ "_" ] binary_digits .` | `/0[bB]_?[01](_?[01])*/` |

Octal literals

| Type | EBNF | Regex |
| --- | --- | --- |
| octal_digit | `"0" … "7" .` | `/[0-7]/` |
| octal_digits | `octal_digit { [ "_" ] octal_digit } .` | `/[0-7](_?[0-7])*/` |
| octal_lit | `"0" [ "o" \| "O" ] [ "_" ] octal_digits .` | `/0[oO]?_?[0-7](_?[0-7])*/` |

Hexadecimal literals

| Type | EBNF | Regex |
| --- | --- | --- |
| hex_digit | `"0" … "9" \| "A" … "F" \| "a" … "f" .` | `/[\da-fA-F]/` |
| hex_digits | `hex_digit { [ "_" ] hex_digit } .` | `/[\da-fA-F](_?[\da-fA-F])*/` |
| hex_lit | `"0" ( "x" \| "X" ) [ "_" ] hex_digits .` | `/0[xX]_?[\da-fA-F](_?[\da-fA-F])*/` |
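
A rough way to sanity-check the binary, octal and hexadecimal patterns is to run them in Go next to `strconv.ParseInt` with base 0, which follows Go's integer literal syntax (prefixes and underscores included). The `classifyIntLit` helper and the `intLitKinds` table are assumptions made for this sketch; the patterns are copied from the tables and anchored with `^` and `$`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// intLitKind pairs a literal kind with its anchored pattern from the tables.
type intLitKind struct {
	name string
	re   *regexp.Regexp
}

var intLitKinds = []intLitKind{
	{"binary", regexp.MustCompile(`^0[bB]_?[01](_?[01])*$`)},
	{"octal", regexp.MustCompile(`^0[oO]?_?[0-7](_?[0-7])*$`)},
	{"hex", regexp.MustCompile(`^0[xX]_?[\da-fA-F](_?[\da-fA-F])*$`)},
}

// classifyIntLit returns the kind whose pattern matches s, or "" if none does.
// (Hypothetical helper, used only for this illustration.)
func classifyIntLit(s string) string {
	for _, k := range intLitKinds {
		if k.re.MatchString(s) {
			return k.name
		}
	}
	return ""
}

func main() {
	samples := []string{"0b1011", "0B_1010_0110", "0o660", "0660", "0xBadFace", "0x_67_7a", "0b102", "0o8", "0_xBadFace"}
	for _, s := range samples {
		kind := classifyIntLit(s)
		if kind == "" {
			kind = "(none)"
		}
		// Base 0 makes ParseInt honor the 0b/0o/0x prefixes and underscores.
		v, err := strconv.ParseInt(s, 0, 64)
		fmt.Printf("%-14s %-8s value=%d ok=%v\n", s, kind, v, err == nil)
	}
}
```

The last three samples are intentionally malformed, so neither the patterns nor `ParseInt` should accept them.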

Decimal floating-point literals

| Type | EBNF | Regex |
| --- | --- | --- |
| decimal_exponent | `( "e" \| "E" ) [ "+" \| "-" ] decimal_digits .` | `/[eE][+-]?\d(_?\d)*/` |
| decimal_float_lit (I) | `decimal_digits "." [ decimal_digits ] [ decimal_exponent ] .` | `/\d(_?\d)*\.(\d(_?\d)*)?([eE][+-]?\d(_?\d)*)?/` |
| decimal_float_lit (II) | `decimal_digits decimal_exponent .` | `/\d(_?\d)*[eE][+-]?\d(_?\d)*/` |
| decimal_float_lit (III) | `"." decimal_digits [ decimal_exponent ] .` | `/\.\d(_?\d)*([eE][+-]?\d(_?\d)*)?/` |
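
The three decimal_float_lit alternatives can be joined into a single pattern. Below is a minimal Go sketch that does so and runs it against the valid and invalid examples given in the Go specification; the names `decDigits`, `decExponent` and `isDecimalFloatLit` are assumptions for this illustration, and the pattern is anchored with `^` and `$` rather than the grammar's look-around assertions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const (
	decDigits   = `\d(_?\d)*`            // decimal_digits
	decExponent = `[eE][+-]?` + decDigits // decimal_exponent
)

// decimalFloatLit joins alternatives (I), (II) and (III) from the table.
var decimalFloatLit = regexp.MustCompile(`^(?:` + strings.Join([]string{
	decDigits + `\.(` + decDigits + `)?(` + decExponent + `)?`, // (I)
	decDigits + decExponent,                                    // (II)
	`\.` + decDigits + `(` + decExponent + `)?`,                // (III)
}, "|") + `)$`)

// isDecimalFloatLit reports whether s is a complete decimal float literal.
func isDecimalFloatLit(s string) bool {
	return decimalFloatLit.MatchString(s)
}

func main() {
	valid := []string{"0.", "72.40", "1.e+0", "6.67428e-11", "1E6", ".25", "1_5.", "0.15e+0_2"}
	invalid := []string{"1_.5", "1._5", "1.5_e1", "1.5e_1", "1.5e1_", "1p-2"}
	for _, s := range append(valid, invalid...) {
		fmt.Printf("%-12s %v\n", s, isDecimalFloatLit(s))
	}
}
```

The invalid samples all place an underscore somewhere other than between two digits, or use a `p` exponent without a hexadecimal mantissa.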

Hexadecimal floating-point literals

| Type | EBNF | Regex |
| --- | --- | --- |
| hex_exponent | `( "p" \| "P" ) [ "+" \| "-" ] decimal_digits .` | `/[pP][+-]?\d(_?\d)*/` |
| hex_mantissa (I) | `[ "_" ] hex_digits "." [ hex_digits ] .` | `/_?[\da-fA-F](_?[\da-fA-F])*\.([\da-fA-F](_?[\da-fA-F])*)?/` |
| hex_mantissa (II) | `[ "_" ] hex_digits .` | `/_?[\da-fA-F](_?[\da-fA-F])*/` |
| hex_mantissa (III) | `"." hex_digits .` | `/\.[\da-fA-F](_?[\da-fA-F])*/` |
| hex_float_lit | `"0" ( "x" \| "X" ) hex_mantissa hex_exponent .` | `/0[xX](_?[\da-fA-F](_?[\da-fA-F])*\.([\da-fA-F](_?[\da-fA-F])*)?\|_?[\da-fA-F](_?[\da-fA-F])*\|\.[\da-fA-F](_?[\da-fA-F])*)[pP][+-]?\d(_?\d)*/` |
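
The hex_float_lit pattern is easier to read when assembled from its parts. The Go sketch below does that, again with hypothetical names (`hexDigitsPat`, `hexExponentPat`, `isHexFloatLit`) and with `^`/`$` anchors, and runs it against the hexadecimal float examples from the Go specification.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const (
	hexDigitsPat   = `[\da-fA-F](_?[\da-fA-F])*` // hex_digits
	hexExponentPat = `[pP][+-]?\d(_?\d)*`        // hex_exponent
)

// hexFloatLit is "0" ( "x" | "X" ) hex_mantissa hex_exponent, with the three
// hex_mantissa alternatives joined by "|".
var hexFloatLit = regexp.MustCompile(`^0[xX](?:` + strings.Join([]string{
	`_?` + hexDigitsPat + `\.(` + hexDigitsPat + `)?`, // hex_mantissa (I)
	`_?` + hexDigitsPat,                               // hex_mantissa (II)
	`\.` + hexDigitsPat,                               // hex_mantissa (III)
}, "|") + `)` + hexExponentPat + `$`)

// isHexFloatLit reports whether s is a complete hexadecimal float literal.
func isHexFloatLit(s string) bool {
	return hexFloatLit.MatchString(s)
}

func main() {
	valid := []string{"0x1p-2", "0x2.p10", "0x1.Fp+0", "0X.8p-0", "0X_1FFFP-16"}
	invalid := []string{"0x.p1", "1p-2", "0x1.5e-2", "0x15e-2"}
	for _, s := range append(valid, invalid...) {
		fmt.Printf("%-12s %v\n", s, isHexFloatLit(s))
	}
}
```

Note that `0x15e-2` is valid Go source, but as the integer subtraction `0x15e - 2`, not as a single floating-point literal, so the pattern should reject it as a whole.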

Imaginary literals

Since the imaginary suffix i may now be used with any (binary, decimal, hexadecimal) integer or floating-point literal, we only need to append an i to each of the regular expressions above.

The one exception is the pattern for imaginary literals with an octal integer part: its 0o/0O prefix is made mandatory so that it does not collide with the legacy imaginary literal pattern (Go 1.12 and earlier), which gets its own backwards-compatibility rule:

  {
    'comment': 'Imaginary literals (with octal integer literal)'
-   'match': '(^|(?<=[^_.\\da-zA-Z]))0[oO]?_?[0-7](_?[0-7])*i(?=[^_.\\da-zA-Z])'
+   'match': '(^|(?<=[^_.\\da-zA-Z]))0[oO]_?[0-7](_?[0-7])*i(?=[^_.\\da-zA-Z])'
    'name': 'constant.numeric.imaginary.octal-integer.go'
  }
  ...
+ {
+   'comment': 'Imaginary literals (for backwards compatibility)'
+   'match': '(^|(?<=[^_.\\da-zA-Z]))0+(_?\\d)*i(?=[^_.\\da-zA-Z])'
+   'name': 'constant.numeric.imaginary.backwards-compatibility.go'
+ }
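
The reason for the split is that Go itself treats the two forms differently: for backward compatibility, `0123i` is read as decimal `123i`, while `0o123i` is octal. Below is a minimal Go sketch of both the language behavior and the two patterns from the diff. The look-around assertions are replaced by `^` and `$` because Go's `regexp` package does not support look-behind, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns from the diff above, anchored instead of using look-around.
var (
	octalImaginary  = regexp.MustCompile(`^0[oO]_?[0-7](_?[0-7])*i$`)
	legacyImaginary = regexp.MustCompile(`^0+(_?\d)*i$`)
)

// imaginaryScope returns the scope name the grammar would assign to s,
// or "" if neither of the two patterns matches.
func imaginaryScope(s string) string {
	switch {
	case octalImaginary.MatchString(s):
		return "constant.numeric.imaginary.octal-integer.go"
	case legacyImaginary.MatchString(s):
		return "constant.numeric.imaginary.backwards-compatibility.go"
	}
	return ""
}

// legacyVsOctal returns the imaginary parts of 0123i and 0o123i as the
// Go compiler evaluates them.
func legacyVsOctal() (legacy, octal float64) {
	return imag(0123i), imag(0o123i)
}

func main() {
	legacy, octal := legacyVsOctal()
	fmt.Println(legacy, octal) // decimal 123 versus octal 123 (83)
	for _, s := range []string{"0123i", "089i", "0o123i", "0O_17i"} {
		fmt.Printf("%-8s %s\n", s, imaginaryScope(s))
	}
}
```

With the optional `[oO]?` kept in the octal pattern, `0123i` would have been scoped as an octal imaginary literal even though the compiler reads it as decimal.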

Alternate Designs

Benefits

Patterns and scopes have been broken down by number literal type, for the following benefits:

Possible Drawbacks

This change introduces the following scopes:

The only "breaking" change I'm aware of is that imaginary literal expressions currently matched under constant.numeric.integer would now be matched with scopes prefixed with constant.numeric.imaginary.*.

But as far as I know, the majority of color themes don't use anything beyond constant.numeric: https://github.com/search?l=Less&q=syntax--integer&type=Code

Applicable Issues

Fixes #174. Related: https://github.com/tree-sitter/tree-sitter-go/issues/31 (PR: https://github.com/tree-sitter/tree-sitter-go/pull/32)

rsese commented 5 years ago

Thanks for the contribution!

As you may have heard, we are migrating from our old first-mate grammar engine to the new Tree-sitter engine. This will enable a number of new features, more consistent syntax highlighting, and better performance, among other benefits. In order to free up our limited resources, we have decided to stop maintaining the first-mate grammar when there is a built-in Tree-sitter grammar available.

I see that you also opened a pull request in the Tree-sitter Go repository that was merged so thank you for that :bow: but since these changes are specific to the TextMate grammar, we'll close this out. If I misunderstood anything however, please let us know.

tohjustin commented 5 years ago

No worries @rsese, really appreciate the detailed response!

In hindsight, I should have waited for a response to #174 before working on a PR... (But actually the main reason for updating the grammar over here is b/c microsoft/vscode is sort of using this project as an upstream source for its Go TextMate grammar 😅)

Sidenote: Do you think it might be a good idea to add a note about maintenance status of the first-mate grammar on the README.md?

rsese commented 5 years ago

No problem at all and thanks for your understanding :bow:

> Sidenote: Do you think it might be a good idea to add a note about maintenance status of the first-mate grammar on the README.md?

Hmm that's a good point or maybe in the issue templates? I'll mention this to the other maintainers, thanks for the suggestion :+1: