emacs-tree-sitter / elisp-tree-sitter

Emacs Lisp bindings for tree-sitter
https://emacs-tree-sitter.github.io
MIT License
816 stars 73 forks source link

imenu: Add imenu functionality to tree-sitter #199

Open mohkale opened 2 years ago

mohkale commented 2 years ago

WIP imenu implementation for tree-sitter.

This works pretty much exactly like the highlighting functionality. We have a separate patterns file for querying imenu entries and a (not enabled by default) minor mode for forcing tree-sitter to be used for imenu.

mohkale commented 2 years ago

@ubolonton

What I have submitted now works, but depending on the query it allows duplicate imenu entries with separate capture-scopes. For example:

@dataclasses.dataclass
class URL(object):
    url: str
    protocol: str = ""
    target: str = ""
    mimetype: str = ""
    desktop_file: str = None
    foo, bar = [1, 2]

    def foo(self):
        def bar():
            print('bar')
            print('hello world')

def foo(self):
    print('hello world')

We'll end up with 3 imenu entries for foo using the patterns file here. One for the method foo and two function foo since they overlap. Is there a way to prevent this from the query directly? I could also just ignore captures which have the same point as an earlier capture (should be relatively simple) but I'd rather avoid that if possible.

Also I'd like to be able to build the imenu recursively by enumerating through the buffer tree. For example here's the imenu response built by eglot using python language server. Notice how Class > URL works, but also all the fields of that class are also grouped into that class (Field > URL > url, Field > URL > protocol for example). Is there a way where I can run a query, and then beginning from the root node, walk through the tree, record the value of intermittent nodes (such as class) while deriving further nodes such as fields within that class? That would make this much more powerful and more importantly resillient. It would also avoid the issue with functions also being methods because I can only focus on the first capture at that node and skip later captures (adding a sort of precedence to the queries). Thoughts?

(("Class"
  ("URL"
   [...]
   #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>))
 ("Field"
  ("URL"
   ("url"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("protocol"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("target"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("mimetype"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("desktop_file"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)
   ("bar"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)))
 ("Method"
  ("URL"
   ("foo"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)))
 ("Function"
  ("foo"
   ("bar"
    [...]
    #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>))
  ("foo"
   [...]
   #<subr F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_143>)))
mohkale commented 2 years ago

Okay, I tried implementing a recursive menu function using tsc-mapc-children but it won't work because the tree isn't consistent structured in a way that's easy to recurse. For example the python tree is:

module:
  decorated_definition:
    decorator:
      attribute:
        identifier:
        identifier:
    class_definition:
      identifier:
      argument_list:
        identifier:
      block:
        class_definition:
          identifier:
          argument_list:
            identifier:
          block:
            pass_statement:

The identifier, which we're building the entry from, is nested as a sibling of the body of the definition, not a parent. This means there's no smart way (that I can think of) to reliably associate a class X with a field in that class y, at least without assuming the structure of the tree.

mohkale commented 2 years ago

Okay, I've done something that's borderline horrific :fearful:, but it does give a much more powerful imenu index.

Screenshot_20211218_201759

Essentially we do 2 things:

  1. Delete duplicate nodes in the query response. If two different patterns match the same node, we delete the later one and prioritise the earlier one.
  2. If a pattern captures more than one node, we nest into that node. That's how, in this case, the field names are all prefixed by the class in which their defined.

Warnings related to point 2: I'm not sure how tree-sitter orders multiple captures in response to a query. Hopefully it just matches the order in which it is declared in which case we can deterministically format each capture. Worst case we might end up with a class within a field "foo Foo" which completely ruins this experiment :cry:.

For anyone wanting to try this out here's an updated patterns file for python.

(class_definition
 name: (identifier) @Class
 body: (block
        [(function_definition name: (identifier) @Method)
         (class_definition name: (identifier) @Class)
         (expression_statement
          (assignment
           left: [(identifier) @Field
                  (pattern_list (identifier) @Field)]))
         ]))

(class_definition name: (identifier) @Class)

(expression_statement
 (assignment
  left: [(identifier) @Variable
         (pattern_list (identifier) @Variable)]))

(function_definition
 name: (identifier) @Function)
ethan-leba commented 2 years ago

This looks cool! I'm not sure what the benefit of including this directly into the tree-sitter library vs providing your own package though. Do you have any reasons?

mohkale commented 2 years ago

@ethan-leba

The end goal of tree sitter, I believe, is to be integrated into emacs core and eventually replace the old regexp based syntax highlighting and completion implementations. That's why, since imenu is a core part of that, that I think it should be integrated into tree-sitter proper. Plus this followed from an open issue ticket for adding imenu support so I think it fits. Plus if I move it to a separate package I'd have to reimplement the lookup for imenu query files just like we have for highlighting.

All things considered, I think there's a strong case for having this in tree sitter itself.

ubolonton commented 2 years ago

Thanks for looking into this! I haven't had time to check it out yet. I'll try to do so later this week.

Dealing with recursion is definitely the hardest part of designing the imenu integration IMO.