
### Problem 1 #1

Open greyaxie opened 9 months ago

greyaxie commented 9 months ago

#### Problem

Consider this minimal grammar:

```js
module.exports = grammar({
  name: "testing",

  extras: $ => [
    /\s/
  ],

  rules: {
    program: $ => repeat($._expression),

    double_colon: $ => prec.left(seq(
      field("lhs", $.identifier),
      field("operator", "::"),
      optional(field("rhs", $.identifier))
    )),

    identifier: $ => /[\p{XID_Start}.][\p{XID_Continue}.]*/,

    _expression: $ => choice(
      $.identifier,
      $.double_colon
    )
  }
});
```

Here I'm trying to make both `foo::bar` and `foo::` parse as `double_colon` nodes.

Weirdly, if I run `tree-sitter parse` on a file that contains just `foo::bar`, I get:

```
(program [0, 0] - [0, 8]
  (double_colon [0, 0] - [0, 5]
    lhs: (identifier [0, 0] - [0, 3]))
  (identifier [0, 5] - [0, 8]))
```
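Both this tree and the expected one are legal derivations of the grammar: because `program` is `repeat($._expression)` and the `rhs` is optional, `foo::` followed by a separate `bar` is just as valid as `foo::bar` as one node. A brute-force enumeration makes that visible. This is a hypothetical helper (`parsesFrom` is not part of tree-sitter), and it assumes the input has already been lexed into `identifier` and `::` tokens with whitespace dropped.

```js
// Brute-force sketch (not how tree-sitter parses): list every way a token
// sequence can be derived from program = repeat(_expression) under the grammar
// above. Input is assumed to be pre-lexed; whitespace (an extra) is dropped.
function parsesFrom(tokens, i) {
  if (i === tokens.length) return [[]]; // end of input: one empty parse
  if (tokens[i].type !== "identifier") return []; // every _expression starts with an identifier
  const lhs = tokens[i].text;

  // Each candidate is one way to derive an _expression starting at position i.
  const candidates = [{ node: `(identifier ${lhs})`, next: i + 1 }];
  if (tokens[i + 1] && tokens[i + 1].type === "::") {
    // double_colon with the optional rhs omitted
    candidates.push({ node: `(double_colon lhs:${lhs})`, next: i + 2 });
    // double_colon with the rhs present
    const rhs = tokens[i + 2];
    if (rhs && rhs.type === "identifier") {
      candidates.push({ node: `(double_colon lhs:${lhs} rhs:${rhs.text})`, next: i + 3 });
    }
  }

  // Combine each candidate with every parse of the remaining tokens.
  const results = [];
  for (const { node, next } of candidates) {
    for (const rest of parsesFrom(tokens, next)) results.push([node, ...rest]);
  }
  return results;
}

const tokens = [
  { type: "identifier", text: "foo" },
  { type: "::", text: "::" },
  { type: "identifier", text: "bar" },
];

for (const parse of parsesFrom(tokens, 0)) console.log(parse.join(" "));
// (double_colon lhs:foo) (identifier bar)
// (double_colon lhs:foo rhs:bar)
```

Since both derivations exist, the parser has to choose one when it reaches `bar`, and that choice is what the precedence annotation controls.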

with this debug output:

```
new_parse
process version:0, version_count:1, state:1, row:0, col:0
lex_internal state:0, row:0, column:0
  consume character:'f'
  consume character:'o'
  consume character:'o'
lexed_lookahead sym:identifier, size:3
shift state:4
process version:0, version_count:1, state:4, row:0, col:3
lex_internal state:0, row:0, column:3
  consume character:':'
  consume character:':'
lexed_lookahead sym:::, size:2
shift state:5
process version:0, version_count:1, state:5, row:0, col:5
lex_internal state:0, row:0, column:5
  consume character:'b'
  consume character:'a'
  consume character:'r'
lexed_lookahead sym:identifier, size:3
reduce sym:double_colon, child_count:2
shift state:4
process version:0, version_count:1, state:4, row:0, col:8
lex_internal state:0, row:0, column:8
lexed_lookahead sym:end, size:0
reduce sym:_expression, child_count:1
reduce sym:program_repeat1, child_count:2
reduce sym:program, child_count:1
accept
done
```
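The decisive step in the trace is in state 5: with `bar` already lexed as the lookahead, the parser reduces `double_colon` with `child_count:2` (only `lhs` and `::`) instead of shifting `bar` as the `rhs`. The sketch below pairs each lookahead with the first action taken on it. `firstActionPerLookahead` is a hypothetical helper, not a tree-sitter API, and the line formats it matches are assumed from the trace above; the input is an excerpt of that trace with the `lex_internal` and `consume` lines omitted.

```js
// Hypothetical helper: pair each lexed lookahead in a tree-sitter debug trace
// with the state it was processed in and the first parser action taken on it.
// The line formats are assumed from the trace above.
function firstActionPerLookahead(log) {
  const pairs = [];
  let state = null;
  let pending = null; // lookahead symbol still waiting for its first action
  for (const raw of log.split("\n")) {
    const line = raw.trim();
    let m;
    if ((m = line.match(/^process version:\d+, version_count:\d+, state:(\d+)/))) {
      state = Number(m[1]);
    } else if ((m = line.match(/^lexed_lookahead sym:(.*), size:\d+$/))) {
      pending = m[1];
    } else if (pending !== null && /^(shift|reduce|accept)\b/.test(line)) {
      pairs.push({ state, lookahead: pending, action: line });
      pending = null;
    }
  }
  return pairs;
}

// Excerpt of the trace above (lex_internal and consume lines omitted).
const trace = `
process version:0, version_count:1, state:1, row:0, col:0
lexed_lookahead sym:identifier, size:3
shift state:4
process version:0, version_count:1, state:4, row:0, col:3
lexed_lookahead sym:::, size:2
shift state:5
process version:0, version_count:1, state:5, row:0, col:5
lexed_lookahead sym:identifier, size:3
reduce sym:double_colon, child_count:2
shift state:4
process version:0, version_count:1, state:4, row:0, col:8
lexed_lookahead sym:end, size:0
reduce sym:_expression, child_count:1
reduce sym:program_repeat1, child_count:2
reduce sym:program, child_count:1
accept
`;

for (const p of firstActionPerLookahead(trace)) {
  console.log(`state ${p.state} on ${JSON.stringify(p.lookahead)} -> ${p.action}`);
}
// state 1 on "identifier" -> shift state:4
// state 4 on "::" -> shift state:5
// state 5 on "identifier" -> reduce sym:double_colon, child_count:2
// state 4 on "end" -> reduce sym:_expression, child_count:1
```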

That seemed odd to me. It appears to be caused by the combination of `optional()` and `prec.left()`: if I use `prec.right()` instead, or remove the `optional()`, I get what I expected:

```
(program [0, 0] - [0, 8]
  (double_colon [0, 0] - [0, 8]
    lhs: (identifier [0, 0] - [0, 3])
    rhs: (identifier [0, 5] - [0, 8])))
```
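One way to read this, based on my understanding of the tree-sitter grammar docs: when an LR(1) conflict arises between rules of equal numeric precedence, `prec.left` makes tree-sitter prefer the rule that ends earlier and `prec.right` the rule that ends later. After `foo ::` with `bar` as lookahead, "ends earlier" means reducing `double_colon` without an `rhs`, and "ends later" means shifting `bar` into it. The toy model below is a hypothetical sketch of just that one decision (`parseWithAssoc` and its `assoc` parameter are illustrative names, not tree-sitter APIs), assuming pre-lexed tokens.

```js
// Toy model of the single decision point (not tree-sitter's generated parser).
// After reading `identifier ::`, an identifier lookahead can either:
//   reduce: close double_colon(lhs ::) so the identifier starts a new _expression
//   shift:  consume the identifier as the rhs of the same double_colon
// With equal precedence, prec.left prefers the rule that ends earlier (reduce)
// and prec.right the rule that ends later (shift).
function parseWithAssoc(tokens, assoc) {
  const nodes = [];
  let i = 0;
  while (i < tokens.length) {
    const tok = tokens[i];
    if (tok.type !== "identifier") throw new Error(`unexpected ${tok.type} at token ${i}`);
    const next = tokens[i + 1];
    if (!next || next.type !== "::") {
      // plain identifier expression
      nodes.push(`(identifier ${tok.text})`);
      i += 1;
      continue;
    }
    const after = tokens[i + 2];
    const conflict = after !== undefined && after.type === "identifier";
    if (conflict && assoc === "right") {
      // shift: the identifier becomes the rhs
      nodes.push(`(double_colon lhs:${tok.text} rhs:${after.text})`);
      i += 3;
    } else {
      // reduce (or no rhs available): double_colon ends after `::`
      nodes.push(`(double_colon lhs:${tok.text})`);
      i += 2;
    }
  }
  return `(program ${nodes.join(" ")})`;
}

const fooBar = [
  { type: "identifier", text: "foo" },
  { type: "::", text: "::" },
  { type: "identifier", text: "bar" },
];

console.log(parseWithAssoc(fooBar, "left"));  // (program (double_colon lhs:foo) (identifier bar))
console.log(parseWithAssoc(fooBar, "right")); // (program (double_colon lhs:foo rhs:bar))
```

Removing `optional()` takes the reduce option away entirely: `double_colon` can no longer end at `::`, so there is no conflict and the only possible action on `bar` is the shift.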

Could someone please help me understand what's happening here? Thanks.

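#### Steps to reproduce

See the problem description above.

#### Expected behavior

See the problem description above: `foo::bar` should parse as a single `double_colon` node with both `lhs` and `rhs` fields.

#### Tree-sitter version (`tree-sitter --version`)

0.21.0

#### Operating system/version

macOS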

Originally posted by @DavisVaughan in https://github.com/tree-sitter/tree-sitter/issues/3110

Originally posted by @greyaxie in https://github.com/xtrime-ru/TelegramApiServer/issues/152

Originally posted by @greyaxie in https://github.com/microsoft/vscode-docs/issues/7101


why