emacs-tree-sitter / elisp-tree-sitter

Emacs Lisp bindings for tree-sitter
https://emacs-tree-sitter.github.io
MIT License

make-range! predicate #180

Open dvzubarev opened 2 years ago

dvzubarev commented 2 years ago

Hi, there is a make-range! predicate implemented in nvim-treesitter here. It is used for implementing some evil text objects: example. Those queries don't work in the Emacs implementation of evil text objects. I'd like to try to add it. Any tips where to start?

ubolonton commented 2 years ago

Hi, there is a make-range! predicate implemented in nvim-treesitter here. It is used for implementing some evil text objects: example. Those queries don't work in the Emacs implementation of evil text objects.

Can you describe what this predicate does? It's not immediately clear from looking at the implementation and the example.

I'd like to try to add it. Any tips where to start?

The currently supported predicates (eq?, not-eq?, match?, not-match?) are implemented at the layer of the Rust crate tree-sitter, by the function satisfies_text_predicates. Additional custom predicates should be implemented at the elisp-tree-sitter layer, in query.rs, by additional processing on top of the results returned by cursor.captures() and cursor.matches().

The instructions for local dev setup are here.

theHamsta commented 2 years ago

@ubolonton The directive takes two nodes, creates a data structure describing the range from the beginning of the first node to the end of the second node, and stores it as metadata next to the query result. It is currently not used outside of nvim-treesitter-textobjects. The data structure is API-compatible in Lua with regular tree-sitter nodes.
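A minimal Emacs Lisp sketch of the range computation described above. It assumes nodes are modelled as plain (START . END) conses rather than real tree-sitter nodes, and my/make-range is a hypothetical name used only for illustration:

```elisp
;; Sketch only: nodes are modelled as (START . END) positions,
;; not as real tree-sitter node objects.
(defun my/make-range (first-node second-node)
  "Return the range from the start of FIRST-NODE to the end of SECOND-NODE.
Both arguments and the result are (START . END) conses."
  (cons (car first-node) (cdr second-node)))

;; A "," at 10..11 followed by a parameter at 12..15:
(my/make-range '(10 . 11) '(12 . 15))   ; => (10 . 15)
```

With real nodes, the two positions would presumably come from tsc-node-start-position and tsc-node-end-position instead of car and cdr.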

You could allow users to inspect unknown patterns in the query results they are getting, or allow them to register their own predicates or directives to post-process the query results once they are exposed to Emacs Lisp. In Neovim, such user-defined directives or predicates can be registered via Lua functions and are applied directly to the query results (e.g. here https://github.com/theHamsta/neovim/blob/acacf5151bb9d6d4b8fc0f0ba6a1a6cccaaa0b4f/runtime/lua/vim/treesitter/query.lua#L414-L429)
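What such a registry could look like on the Emacs Lisp side, as a rough sketch. Everything here is an assumption: elisp-tree-sitter does not currently expose directives, so captures are modelled as an alist of capture name to (START . END), directives as (NAME . ARGS) lists, and all my/ names are hypothetical:

```elisp
(defvar my/directive-handlers nil
  "Alist mapping directive names (strings) to handler functions.
Hypothetical registry; not part of elisp-tree-sitter.")

(defun my/register-directive (name handler)
  "Register HANDLER as the function to call for directive NAME."
  (setf (alist-get name my/directive-handlers nil nil #'equal) handler))

(defun my/apply-directives (captures directives)
  "Run the handler of each directive in DIRECTIVES against CAPTURES.
CAPTURES is an alist of (CAPTURE-NAME . (START . END)).
DIRECTIVES is a list of (NAME . ARGS).  Directives without a
registered handler are skipped.  Return the non-nil handler results."
  (delq nil
        (mapcar (lambda (directive)
                  (let ((handler (alist-get (car directive)
                                            my/directive-handlers
                                            nil nil #'equal)))
                    (when handler
                      (apply handler captures (cdr directive)))))
                directives)))

;; A user-level stand-in for make-range!: returns (NAME . (START . END)).
(my/register-directive
 "make-range!"
 (lambda (captures name start-capture end-capture)
   (cons name
         (cons (car (alist-get start-capture captures))
               (cdr (alist-get end-capture captures))))))

(my/apply-directives
 '((_start . (10 . 11)) (parameter.inner . (12 . 15)))
 '(("make-range!" "parameter.outer" _start parameter.inner)))
;; => (("parameter.outer" 10 . 15))
```

The handler only attaches derived data to a match, which mirrors the "post-process the query results" idea rather than any existing API.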

@dvzubarev Really cool project!

meain commented 2 years ago

I spent some time today on this, but was not really able to fully figure out how I should approach this. I don't think my understanding of the tree-sitter lib is anywhere close to useful yet 🤷🏼‍♂️. @ubolonton, just wanted to check if you had some pointers on how I should approach this, or resources I can refer to.

theHamsta commented 2 years ago

@meain please note that make-range! was created just for the needs of nvim-treesitter-textobjects, since there is no obvious way to select multiple nodes using tree-sitter queries. It was not adopted in upstream Neovim.

meain commented 2 years ago

Thanks @theHamsta , I did take a look at the implementation in nvim-treesitter-textobjects. Unfortunately elisp-tree-sitter as of now does not expose anything that we can use to add "directive_handlers" nor does it expose the patterns in the query into elisp if I understand correctly.

ubolonton commented 2 years ago

I briefly looked more into nvim-treesitter's make-range!.

It seems to me that, for each pattern, the data extraction logic depends on the capture names, not the structure of the pattern. For example:

(#make-range! "parameter.outer" @_start @parameter.inner)
(#make-range! "parameter.outer" @parameter.inner @_end)

If that's the case, the predicates are boilerplate which can be eliminated by specifying that logic at the level of the text-object library. This is a REPL snippet that illustrates the idea:

(with-current-buffer "example.py"
  (seq-map (lambda (match)
             (pcase-let* ((`(,_ . ,captures) match)
                          (captures (seq-into captures 'list))
                          (start (map-elt captures '_start))
                          (inner (map-elt captures 'parameter.inner)))
               (cons (tsc-node-start-position start)
                     (tsc-node-end-position inner))))
           (tree-sitter-debug-query
            "(parameters
    \",\" @_start .
    [
      (identifier)
      (tuple)
      (typed_parameter)
      (default_parameter)
      (typed_default_parameter)
      (dictionary_splat_pattern)
      (list_splat_pattern)
    ] @parameter.inner)"
            :matches)))

Note: Making short-lived node objects to retrieve their properties puts a lot of stress on the GC, but that's an orthogonal discussion. tree-sitter-hl uses an internal API which avoids that. Such an approach can be generalized. The results will probably look similar to the new tree-traversal APIs.

(The snippet above uses tree-sitter-debug-query to quickly illustrate the idea; libraries should use tsc APIs.)

meain commented 2 years ago

Thanks @ubolonton, just had a few queries. I was not able to find out where/if we are exposing general_predicates, which I believe would help with figuring out these "unhandled" items.

This might be a dumb question, but should I be worrying about scoping the make-range queries? For example in something like below, only the first (_start + parameter.inner) should be picked up for parameter.outer if I understand correctly.

((parameters
    "," @_start .
    [
      (identifier)
      (tuple)
    ] @parameter.inner
  )
  (#make-range! "parameter.outer" @_start @parameter.inner))

((something-else  ; <-- not parameters
    "," @_start .
    [
      (identifier)
      (tuple)
    ] @parameter.inner
  )
  (#make-range! "parameter.middle" @_start @parameter.inner)) ; <-- this is not parameter.outer

ubolonton commented 2 years ago

I was not able to find out where/if we are exposing general_predicates, which I believe would help with figuring out these "unhandled" items.

They are not exposed currently. Can you explain how they would help?

should I be worrying about scoping the make-range queries? For example in something like below, only the first (_start + parameter.inner) should be picked up for parameter.outer if I understand correctly.

((parameters
    "," @_start .
    [
      (identifier)
      (tuple)
    ] @parameter.inner
  )
  (#make-range! "parameter.outer" @_start @parameter.inner))

((something-else  ; <-- not parameters
    "," @_start .
    [
      (identifier)
      (tuple)
    ] @parameter.inner
  )
  (#make-range! "parameter.middle" @_start @parameter.inner)) ; <-- this is not parameter.outer

Do you have a concrete example for this? AFAICT, text objects don't need that generality. They have only inner and outer.

meain commented 2 years ago

Do you have a concrete example for this? AFAICT, text objects don't need that generality. They have only inner and outer.

I don't really have a concrete example, was just wondering if this would be something that I would have to handle.

They are not exposed currently. Can you explain how they would help?

As of now I am just passing the queries file, which is pulled from the nvim-treesitter-textobjects package, directly to tsc-make-query, and was planning to keep this flow as is if possible. I am guessing I would otherwise have to parse this info (the make-range arguments) out in elisp.

ubolonton commented 2 years ago

Do you have a concrete example for this? AFAICT, text objects don't need that generality. They have only inner and outer.

I don't really have a concrete example, was just wondering if this would be something that I would have to handle.

Looks like no. Please let me know when you encounter an example otherwise.

They are not exposed currently. Can you explain how they would help?

As of now I am just passing the queries file, which is pulled from the nvim-treesitter-textobjects package, directly to tsc-make-query, and was planning to keep this flow as is if possible. I am guessing I would otherwise have to parse this info (the make-range arguments) out in elisp.

Exposing that function isn't difficult. I'm trying to understand how that information would help with the text-object use case.

meain commented 2 years ago

Exposing that function isn't difficult. I'm trying to understand how that information would help with the text-object use case.

This might be specific to my use case, but I pretty much load a queries file as is, with all the text objects, in meain/evil-textobj-tree-sitter. Without that function, is there some way I could parse out just the predicates, so that I can compute the start and end for them separately?

ubolonton commented 2 years ago

I understand that you want to extract the arguments to the make-range! predicate, for each pattern. My question is, why.

From the look of them, the predicates are redundant and don't add information, unless nvim-treesitter-textobjects doesn't have rules to constrain the capture names. (In which case that would be a potential improvement to that project.)

meain commented 2 years ago

Maybe I am missing something here but the definition of parameter.outer is just purely in make-range.

((parameters
    "," @_start .
    [
      (identifier)
      (tuple)
      (typed_parameter)
      (default_parameter)
      (typed_default_parameter)
      (dictionary_splat_pattern)
      (list_splat_pattern)
    ] @parameter.inner
  )
  (#make-range! "parameter.outer" @_start @parameter.inner))

Or are you saying that I don't really have to look into the make-range but rather just compute start and end from the first and last item in a match entry? If this is the case, I would still need to get make-range, as I will need to pull out the name of the object from that.

meain commented 2 years ago

On a related note, I tried grouping sibling nodes but I'm not sure if I am doing it right. @parameter.outer is just matching the comma. This is the same in tree-sitter playground (image) as well.

[Screenshot: 2022-02-28-11-22-55]

I was thinking that maybe I could rewrite the queries this way if this is a feasible option

meain commented 2 years ago

@theHamsta I remember you mentioning that tree-sitter does not have an obvious way to select multiple items. I just saw this issue on the nvim side and was wondering if grouping sibling nodes is not possible/intended to be used this way.

theHamsta commented 2 years ago

Grouping sibling nodes only allows you to specify an order for the nodes. It would let you set a capture on each of the mentioned subnodes, and you could then build a union of them as you iterate over all the subnodes. That is similar to the start/end pattern, as queries can only return nodes. The motivation for using custom directives here was that the plugin itself would not need to handle more and more special ways of describing ranges; instead, you would just register a new post-processing function that works on all queries and for all query users. Users can also define their own predicates and directives, which would only be used in their config.

make-range! is a bad example, as it's kind of deprecated: in the end, upstream Neovim uses a different way to represent query results. The result of make-range! is a range that can duck-type a normal node and thus be handled by applications without code changes. There's also a bug that currently prevents Neovim from returning multiple nodes with the same capture from (_)* @foo. So make-range! could be avoided by just using the capture you want to refer to multiple times, with the semantics of combining the node ranges. In a refactor of nvim-treesitter-textobjects we might use this to avoid make-range!.
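The "combining the node ranges" semantics can be sketched in Emacs Lisp as follows. This is only an illustration: nodes are modelled as plain (START . END) conses, and my/union-ranges is a hypothetical helper, not part of tsc or Neovim:

```elisp
(defun my/union-ranges (ranges)
  "Return the smallest (START . END) range covering every range in RANGES.
RANGES is a non-empty list of (START . END) conses, for example the
ranges of all nodes that matched the same capture name."
  (cons (apply #'min (mapcar #'car ranges))
        (apply #'max (mapcar #'cdr ranges))))

;; Three sibling nodes captured under the same name:
(my/union-ranges '((12 . 15) (17 . 20) (22 . 30)))   ; => (12 . 30)
```

Taking the minimum start and maximum end means the order in which the captures are returned does not matter.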

theHamsta commented 2 years ago

Another comment: the original issue comment links the implementation of make-range!. This is kind of misleading as to how predicates/directives work today in Neovim. This legacy predicate is still hard-coded in our plugin. Modern predicates/directives can be registered via a Neovim API function and don't require changes to a plugin's code.

ubolonton commented 2 years ago

Or are you saying that I don't really have to look into the make-range but rather just compute start and end from the first and last item in a match entry? If this is the case, I would still need to get make-range, as I will need to pull out the name of the object from that.

Not the first and last captures, but specifically-named captures. A text object x can be inner/outer, and can be located by either a node, or a pair of start/end nodes, so there are a maximum of 6 names:

If the text-object library imposes this constraint, the make-range! predicate becomes redundant. That's the improvement I was suggesting nvim-treesitter-textobjects can make.

ubolonton commented 2 years ago

tree-sitter does not have an obvious way to select multiple items

IIUC, this is (was?) an issue with the NeoVim binding. In this ELisp binding, we don't have that issue, since we have tsc-query-matches.

ubolonton commented 2 years ago

in the end upstream Neovim uses a different way to represent query results.

@theHamsta I'm curious, what does it use now?

meain commented 2 years ago

If the text-object library imposes this constraint, the make-range! predicate becomes redundant. That's the improvement I was suggesting nvim-treesitter-textobjects can make.

Ahh, this makes sense.

theHamsta commented 2 years ago

This is the PR which would fix that we can have multiple nodes per capture and match: https://github.com/neovim/neovim/pull/17099.

Right now you can register predicates and directives (https://github.com/theHamsta/neovim/blob/3b0a0c6ca6c5f4e719028be2dba11853ffec6b6b/runtime/lua/vim/treesitter/query.lua#L347-L371). They then work directly on the match objects (https://github.com/theHamsta/neovim/blob/3b0a0c6ca6c5f4e719028be2dba11853ffec6b6b/runtime/lua/vim/treesitter/query.lua#L569). All other predicates and directives are registered in our plugin (https://github.com/nvim-treesitter/nvim-treesitter/blob/f735498a645e1a2aca7a0cfdaa2d7f8cec543846/lua/nvim-treesitter/query_predicates.lua). make-range! works on a different data structure, which is why it has no implementation that works directly on the match (https://github.com/nvim-treesitter/nvim-treesitter/blob/f735498a645e1a2aca7a0cfdaa2d7f8cec543846/lua/nvim-treesitter/query_predicates.lua#L103-L102).

There are also directives implemented in core for language injection, which can trim characters; they mostly just calculate metadata (https://github.com/theHamsta/neovim/blob/3b0a0c6ca6c5f4e719028be2dba11853ffec6b6b/runtime/lua/vim/treesitter/query.lua#L307-L345). So our directives mostly just attach some Lua objects to the match that can be used by downstream applications.

meain commented 2 years ago

@theHamsta How exactly is @function.outer.start used here? Is it merged with @function.outer on the Lua side? I was not able to find anything specific from a quick scan of the codebase.

theHamsta commented 2 years ago

@function.outer.start captures are optional decorations for the text object, like docstrings or template declarations. They will be added to the text object's range if present.

meain commented 2 years ago

The original issue of make-range has been taken care of downstream (https://github.com/meain/evil-textobj-tree-sitter/pull/38). I ended up rewriting the queries to have <node>._start and <node>._end and merging them in elisp. We can close this issue if necessary, or maybe rename it to be a general discussion around how to implement custom predicates, as there is some useful discussion here.
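A rough sketch of what merging <node>._start and <node>._end in elisp can look like. This is not the code from the linked PR: it assumes captures arrive as an alist of capture-name symbols to (START . END) conses, and my/textobj-range is a hypothetical name:

```elisp
(defun my/textobj-range (captures name)
  "Return the (START . END) range of text object NAME in CAPTURES, or nil.
CAPTURES is an alist mapping capture-name symbols to (START . END) conses.
NAME is a string such as \"parameter.outer\".  A capture named NAME wins;
otherwise NAME._start and NAME._end are merged into a single range."
  (let ((whole (alist-get (intern name) captures))
        (start (alist-get (intern (concat name "._start")) captures))
        (end   (alist-get (intern (concat name "._end")) captures)))
    (cond
     (whole whole)
     ((and start end) (cons (car start) (cdr end))))))

;; Comma at 10..11 and parameter at 12..15, captured as a start/end pair:
(my/textobj-range '((parameter.outer._start . (10 . 11))
                    (parameter.outer._end . (12 . 15)))
                  "parameter.outer")   ; => (10 . 15)
```

Because the object name is derived from the capture names alone, no predicate arguments need to be parsed out of the query.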