add tags.scm queries - Githubissues

the-mikedavis commented 2 years ago

tags.scm powers GitHub code navigation for languages that use tree-sitters.

For example with ruby...

You can see class/method declarations: ![ruby](https://user-images.githubusercontent.com/21230295/146465798-557badd8-c89f-4f1c-99e3-06e558779cc0.png) And see the references (where those classes/methods end up being used): ![ruby](https://user-images.githubusercontent.com/21230295/146465889-61faa1b3-3bee-4e04-be30-194dae11cdc5.png)

These queries ended up taking inspiration from other tags.scm in the @tree-sitter org repos, mostly python and ruby.

~I don't think there's a framework in place for testing these queries, but I think it wouldn't be too much trouble to write a small shell script that diffs the output of tree-sitter tags fixture.ex against a known solution for some fixtures. There is the ability to test queries like so: tree-sitter query --test query/tags.scm test/query/fixture.ex (added in tree-sitter#775) but the precedence rules seem to be backward between querying and determining tags (e.g. module/function definitions end up as @reference.calls in the query output but are correctly identified by tree-sitter tags). I can look at making a PR to tree-sitter that adds a flag for testing tree-sitter tags or maybe somehow reverses precedence in tree-sitter query.~ We should be able to use tree-sitter#1547 to test these queries if/when merged.

A sample output of tree-sitter tags path/to/mint/mix.exs with this Mint mix.exs:

Mint.MixProject | module    def (0, 10) - (0, 25) `defmodule Mint.MixProject do`
Mix.Project | call      ref (1, 6) - (1, 17) `use Mix.Project`
project     | function  def (6, 6) - (6, 13) `def project do`
Mix         | call      ref (11, 23) - (11, 26) `start_permanent: Mix.env() == :prod,`
env         | call      ref (11, 27) - (11, 30) `start_permanent: Mix.env() == :prod,`
elixirc_paths   | call      ref (12, 21) - (12, 34) `elixirc_paths: elixirc_paths(Mix.env()),`
Mix         | call      ref (12, 35) - (12, 38) `elixirc_paths: elixirc_paths(Mix.env()),`
env         | call      ref (12, 39) - (12, 42) `elixirc_paths: elixirc_paths(Mix.env()),`
deps        | call      ref (13, 12) - (13, 16) `deps: deps(),`
package     | call      ref (29, 15) - (29, 22) `package: package(),`
application | function  def (46, 6) - (46, 17) `def application do`
Mint.Application    | call      ref (49, 12) - (49, 28) `mod: {Mint.Application, []}`
package     | function  def (53, 7) - (53, 14) `defp package do`
elixirc_paths   | function  def (60, 7) - (60, 20) `defp elixirc_paths(:test), do: ["lib", "test/support"]`
elixirc_paths   | function  def (61, 7) - (61, 20) `defp elixirc_paths(_env), do: ["lib"]`
deps        | function  def (64, 7) - (64, 11) `defp deps do`

So the queries tag function/module definitions and usages of functions/modules. It's notable that modules are tagged as @reference.call when not being defmoduled: this lines up with the behavior of the ruby queries and allows you to jump to definitions of modules, it's not saying that those modules are function invocations. These queries aren't smart enough to figure out some language semantics like resolving aliased modules. For "precise navigation" which respects language semantics, we'd need to write some stack-graph construction queries (see this blog post), but stack-graphs are very new and it may be wise to wait until they're matured. Also the python queries aren't open-sourced afaict so I haven't really figured out the tricks yet 😅. Once I figure all that out, I'll try sending in some stack-graphs queries :)

@patrickt is this something I can trouble you with setting up if/once it's merged in (and maybe lend some review eyes)? I appreciate all your help with the integration stuff :)

the-mikedavis commented 2 years ago

oh whoops I didn't see #17, I'll try these queries against that branch and see if it changes anything

edit: looks good, these queries are producing the same results on both branches

josevalim commented 2 years ago

This is awesome @the-mikedavis! Quick question: do arities play a role in those queries? Should they?

If they do, then we need to consider foo |> bar() is calling bar() with one argument. If they don't, then we are good!

the-mikedavis commented 2 years ago

I believe arities are not considered, I think that wouldn't play well with some languages like javascript where arities can be flexible. It seems to me like it's very similar to ctags: the tokens on the left-hand side of the table (albeit parsed out by tree-sitter instead of ctags) become the search candidates.

I haven't dug far enough into stack-graphs and precise navigation to be sure, but that's probably something we'll have to keep in mind for writing those stack-graph queries :+1:

josevalim commented 2 years ago

@the-mikedavis perhaps one option is to include the arity in the token name:foo/3. But I am not sure if this is a good or bad idea.

the-mikedavis commented 2 years ago

Ah I like that idea. Afaik you don't have any control over modifying that token with how tree-sitter tags curently works, but that would make navigation a breeze if it did work, especially in big projects where you have a bunch of functions with the same names but different arities

jonatanklosko commented 2 years ago

Hey @the-mikedavis, fantastic!

the precedence rules seem to be backward between querying and determining tags

This is definitely a tree-sitter bug, it's easy to verify using the highlight queries.

# Running highlighting test passes
npx tree-sitter test --filter test/highlight

# Running the query tests with the corresponding queries doesn't work
npx tree-sitter query --test queries/highlights.scm test/highlight/**/*

cc @patrickt @maxbrunsfeld

Do you know how github/semantic fits into the picture? It seems necessary to add Elixir bindings there, but having another look I see that tag matching does the same as the corresponding tag query, so I'm wondering how these two relate.

the-mikedavis commented 2 years ago

Based on this documentation, I think semantic is not in the navigation picture, it seems to all be driven by tree-sitter and tags.scm, but we'll need a GitHubber to weigh in to be sure

jonatanklosko commented 2 years ago

Oh wow, maybe the "Used for code navigation on github.com" in github/semantic is no longer accurate then. I believe the GitHub docs were updated since I last checked too :D

Let's wait for some insights then :)

patrickt commented 2 years ago

@the-mikedavis @jonatanklosko Correct: semantic is no longer part of the tags pipeline (we switched to tree-sitter native tags support so as to avoid the overhead of pulling in a whole program analysis framework to do a simple analysis).

I’ll take this patch for a spin locally with our tagging backend. I can’t make guarantees about if or when it’ll land—we have to figure out, product/documentation-wise, how to pull in this functionality without making it part of our core supported languages. But I’m super excited about this work; thank you all!

patrickt commented 2 years ago

I can confirm that this patch works for our internal code-nav systems. Again, I can’t make promises as to when it will land—but this patch should be merged, in my opinion!

jonatanklosko commented 2 years ago

@patrickt fantastic news, thanks a lot! Fingers crossed for the integration to happen :D

@the-mikedavis awesome job! :shipit:

patrickt commented 2 years ago

Elixir code navigation is now active on GitHub!

jonatanklosko commented 2 years ago

@patrickt fantastic, thanks again for all the integration work!

@the-mikedavis beautiful job with the tags :cat:

elixir-lang / tree-sitter-elixir

add tags.scm queries #18