elixir-lang / tree-sitter-elixir

Elixir grammar for tree-sitter
https://elixir-lang.org/tree-sitter-elixir
Apache License 2.0
245 stars, 24 forks

Parsing Documentation Strings Line By Line #46

Open natdm opened 1 year ago

natdm commented 1 year ago

Forgive me if this is a Neovim-specific issue, though I don't think it is.

The comment parsing in Elixir seems to be done by block instead of by line. This makes it harder to parse the contents of comments line by line. I understand this is functionality I'm using in Neovim, but it could just as easily be used in another client, like a treesitter-js client.

Here's an example in Elixir: https://ibb.co/G7QK5Hr

Comments show up as a single block of quoted_content. This makes it difficult to use tree-sitter to find where things like the examples are.

Here's an example in Rust, where comments are parsed line by line and you can use tree-sitter to check each one for code: https://ibb.co/XtKFjFL

the-mikedavis commented 1 year ago

Parsing quoted_content by line would make it easier to inject these, but you would be parsing markdown by regex, which is a bit brittle. It would be more robust to inject markdown into the quoted_content node when it's a doc string. That wouldn't need any changes to tree-sitter-elixir, and you would get full markdown highlighting in doc strings. (Also see https://github.com/tree-sitter/tree-sitter-rust/pull/128, which attempts to make this possible for Rust.)

This is how we do it in Helix using tree-sitter:

[Screenshot, 2022-12-04]

With this injection query (although neovim uses different queries for injections): https://github.com/helix-editor/helix/blob/59cfe95776238f047131e497124c97d69d838c2b/runtime/queries/elixir/injections.scm#L14-L21

You could then make an Elixir-specific markdown and assume that code blocks are Elixir, or check whether they're iex and use tree-sitter-iex. With these changes to Helix (I'm not sure how that would look in Neovim): https://github.com/the-mikedavis/helix/commit/204a2dadb84e921b29d91dc05289eba80fd3c302

[Screenshot, 2022-12-04]

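
The "assume Elixir, or check for iex" choice described above can be sketched in a few lines of Python. This is only an illustration of the decision, not the Helix change linked above: the function name `pick_injection_language`, the language names `"elixir"` and `"iex"`, and the prompt pattern are assumptions, and in an editor the markdown grammar would locate the code blocks before this step runs.

```python
import re

# Assumed prompt shapes: "iex>" or a numbered prompt such as "iex(1)>".
IEX_PROMPT = re.compile(r"^\s*iex(\(\d+\))?>")


def pick_injection_language(info_string: str, code: str) -> str:
    """Choose a language for a code block found inside an Elixir doc string.

    info_string is the text after the opening fence (empty when absent);
    code is the body of the block.
    """
    info = info_string.strip().lower()
    if info:
        # An explicit info string wins, e.g. a fence tagged "json".
        return info.split()[0]
    if any(IEX_PROMPT.match(line) for line in code.splitlines()):
        # At least one line looks like an IEx prompt: treat as a session.
        return "iex"
    # Untagged blocks in Elixir docs are assumed to be Elixir.
    return "elixir"


print(pick_injection_language("", "iex> Enum.sum([1, 2])\n3"))
print(pick_injection_language("", "def add(a, b), do: a + b"))
```

Checking for a prompt on any line, rather than only the first, is a design choice in this sketch so that blocks with a leading blank line are still treated as sessions.
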
jonatanklosko commented 1 year ago

Regardless of how docstrings are highlighted, at the end of the day they are just strings, not comments, and I don't think we should break them into multiple lines in the AST. In Rust it's natural that each line is separate in the AST, because each line is denoted as a docstring individually with ///.
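
A client that still wants per-line positions could, as one option, derive them from the single node's text and start position without any grammar change. A minimal Python sketch under that assumption: `LineSpan`, `line_spans`, and the sample start position are hypothetical, and the text and start point would come from whichever tree-sitter binding the client uses.

```python
from typing import List, NamedTuple


class LineSpan(NamedTuple):
    row: int        # absolute row of this line in the source file
    start_col: int  # column where this line's text begins
    text: str       # the line's content, without the newline


def line_spans(content: str, start_row: int, start_col: int) -> List[LineSpan]:
    """Split one multi-line node's text into per-line spans."""
    spans = []
    for offset, text in enumerate(content.split("\n")):
        # Only the first line starts mid-row; later lines start at column 0.
        col = start_col if offset == 0 else 0
        spans.append(LineSpan(start_row + offset, col, text))
    return spans


# Hypothetical doc string content starting at row 10, column 2.
content = "Adds two numbers.\n\n    iex> add(1, 2)\n    3\n"
spans = line_spans(content, start_row=10, start_col=2)
rows = [s.row for s in spans if s.text.lstrip().startswith("iex>")]
print(rows)  # [12]
```
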