fwcd / tree-sitter-kotlin

Kotlin grammar for Tree-sitter
https://fwcd.github.io/tree-sitter-kotlin
MIT License
126 stars 52 forks source link

Add field names #106

Closed nicolas-guichard closed 11 months ago

nicolas-guichard commented 11 months ago

Hi,

This is a continuation of /pull/74 given Vladimir Reshetnikov said they would not be pursuing it anymore.

In Searchfox, we leverage tree-sitter to add some information on where symbols are found. For instance if you search for the C++ symbol mozilla::dom::MediaControlKeySource::IsOpened, and you find 9 results in the file dom/mediacontrol/MediaControlKeyManager.cpp, it's super useful to know in which function each result was found (tree-sitter is what enables those // found in comments). We are adding Kotlin support to Searchfox, hence this PR.

For now we only use the name and body fields of class_definition and function_declaration, but since there was a pull request hanging around to add named fields everywhere I just rebased it on main. If you think it's too much (this adds almost 2M lines to parser.c!), I can reduce the number of named fields. They could be useful to someone else though.

VladimirMakaev commented 11 months ago

@nicolas-guichard tests are failing on CI

nicolas-guichard commented 11 months ago

Indeed, I found the reason for the test failure:

When _type is defined as:

_type: $ => seq(
  optional($.type_modifiers),
  choice(
    $.parenthesized_type,
    $.nullable_type,
    $._type_reference,
    $.function_type,
    $.not_nullable_type
  )
)

then @Foo T & F is parsed as:

_type                @Foo T & F
  type_modifiers     @Foo
    annotation       @
      user_type       Foo
  not_nullable_type       T & F
    user_type             T
    user_type                 F

But when _type uses a named field around the choice:

_type: $ => seq(
  optional($.type_modifiers),
  field('type', choice(
    $.parenthesized_type,
    $.nullable_type,
    $._type_reference,
    $.function_type,
    $.not_nullable_type
  ))
)

then @Foo T & F is parsed as:

_type                @Foo T & F
  not_nullable_type  @Foo T & F
    type_modifiers   @Foo
      annotation     @
        user_type     Foo
    user_type             T
    user_type                 F

I don't really understand why adding a named field changes the precedence here.

types.txt explicitly tests for the first one, so I assume that's what we want, although the Kotlin grammar is quite surprising here: not_nullable_type is only ever used by type, and type already has an optional type_modifiers, so why does not_nullable_type need one too if it doesn't even have precedence?

I've removed the named field as a workaround for now, but I'm wondering if it would make sense to remove the optional type_modifiers from not_nullable_type to make the grammar less ambiguous.

VladimirMakaev commented 11 months ago

@nicolas-guichard This is a bit concerning as we might potentially miss a similar test case. I guess we can't name fields of different types without creating a wrapping node.

I've looked at this issue where field names were introduced. https://github.com/tree-sitter/tree-sitter/pull/271 One thing to note is that tree-sitter parser <file> will return a tree with all field names.

What I would suggest doing is instead of adding all possible fields right now. Start smaller instead with only a couple of fields you need BUT additionally updating unit tests to verify fields are correctly named in every single place.

nicolas-guichard commented 11 months ago

I'm mainly interested with elements that can have an inner block, so I've done those (+ type_alias because every other child of _declaration had named fields so I thought it made sense for “symmetry”).

This is split into commits that should each pass the tests for easier reviewing, but, given that parser.c is committed into the repo I think it would be wise to squash everything before merging, or extracting the parser.c update into a separate commit, to avoid needlessly increasing the repo size.

VladimirMakaev commented 11 months ago

@nicolas-guichard I'm not concerned about splitting this in multiple commits since it'll get squashed anyway. Thanks for adding the tests! If you can rebase on master I'll re-run tests and merge

fwcd commented 6 months ago

We really need to get the parser size under control again... this PR bumped it from 22 MB to 70 MB. The Git CLI warns about it too:

remote: warning: File src/parser.c is 70.42 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB        
remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com./        
fwcd commented 6 months ago

I've reverted this PR for now. Feel free to rebase and open a new PR, but please make sure to keep the parser at a reasonable size, ideally < 30 MB, and also that the CI checks pass (specifically https://github.com/fwcd/tree-sitter-kotlin/issues/92).