emacs-ess / ESS

Emacs Speaks Statistics: ESS
https://ess.r-project.org/
GNU General Public License v3.0

mode for treesitter in 29.1? #1239

Open mguzmann opened 1 year ago

mguzmann commented 1 year ago

Currently there is no r-ts-mode or similar. Is there a plan to support the native treesitter?

Thanks!

lionel- commented 1 year ago

Yes, vaguely. Since the mode will be started from scratch, it's a good opportunity to make ESS a minor mode instead of a major mode that is inherited from. So that's a rather large project altogether.

lionel- commented 1 year ago

WIP at https://github.com/emacs-ess/ESS/tree/tree-sitter

This work will move slowly but tree-sitter-r is still unstable and undergoing a major rewrite, so it's still too early to depend on it anyway.

milanglacier commented 1 year ago

For anyone who is interested in the progress with additional details:

The treesitter grammar of R (r-lib/tree-sitter-r) is undergoing a major refactoring in the branch next.

Neovim has been using the main branch of that grammar for a long time and it seems that it works okay on neovim's side.

nverno commented 6 months ago

I wrote an r-ts-mode to test out the grammar.

The most noticeable problem is missing grammar for else statements in the tree-sitter parser. Aside from that, the indentation and font-locking work pretty well.

lionel- commented 6 months ago

Our main issue is that we have a complex indenter that supports many styles. I think we could simplify that to 2 or 3 styles though (RRR, which is the default, plus the RStudio style and its RStudio- variant). Ideally we'd rewrite the indenter, perhaps in a simplified form, based on the tree-sitter tree.

Worth noting that TS nodes are immutable and reusable across different versions of the files, which implies they don't store parents, only children. Finding the parent requires a full tree traversal. So the current approach of inspecting parents to decide the indentation might not scale that well. We could either create a red tree on top of the TS tree (see green and red trees from Roslyn or Swift's libsyntax) or use another approach (Wadler IR without the line-splitting parts?) to solve that.
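For readers unfamiliar with the red/green idea, here is a minimal sketch in Emacs Lisp. It is an illustration only: the `sketch-` names are hypothetical and exist neither in ESS nor in treesit.el, and the node widths are made up. Green nodes are immutable and only know their children; red nodes are thin wrappers created while descending, which is where the parent pointer and absolute offset come from.

```elisp
;;; Sketch only: every `sketch-' name below is hypothetical.
(require 'cl-lib)

;; "Green" node: immutable and position-independent, stores children only.
(cl-defstruct (sketch-green
               (:constructor sketch-green-create (type width &optional children)))
  type width children)

;; "Red" node: wraps a green node with its parent and absolute offset.
(cl-defstruct (sketch-red
               (:constructor sketch-red-create (green parent offset)))
  green parent offset)

(defun sketch-red-root (green)
  "Wrap GREEN as the root red node at offset 0."
  (sketch-red-create green nil 0))

(defun sketch-red-children (red)
  "Return red wrappers for the children of RED, in source order.
Each child's offset is the parent's offset plus the widths of the
siblings that precede it."
  (let ((offset (sketch-red-offset red))
        (result nil))
    (dolist (child (sketch-green-children (sketch-red-green red)))
      (push (sketch-red-create child red offset) result)
      (setq offset (+ offset (sketch-green-width child))))
    (nreverse result)))

;; Example tree for the text "x + y" (widths include trailing spaces).
(let* ((green (sketch-green-create
               "binary_operator" 5
               (list (sketch-green-create "identifier" 2)
                     (sketch-green-create "+" 2)
                     (sketch-green-create "identifier" 1))))
       (rhs (car (last (sketch-red-children (sketch-red-root green))))))
  ;; The red wrapper reaches its parent in O(1), without searching the tree.
  (list (sketch-red-offset rhs)
        (sketch-green-type (sketch-red-green (sketch-red-parent rhs)))))
;; => (4 "binary_operator")
```

The green tree can be shared across file versions because it holds no positions or parents; only the cheap red wrappers are rebuilt.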

OR we just use the default Emacs TS indenter and rely on format-on-save with a third party tool to fix incorrect indentation...

Also as we make this transition to TS it would be nice to decouple the R mode from ESS. That would involve turning ESS into a minor mode instead of a major mode that's inherited from. The minor mode would manage the keybindings for interactions with the ESS REPL buffer.
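A rough sketch of what that split could look like, for illustration only. The mode name `my-ess-repl-minor-mode` and its keymap are hypothetical, and the two bound commands are assumed to be the ones ESS currently binds to these keys; they are referenced by symbol, so the form loads even without ESS installed.

```elisp
;; Sketch only: `my-ess-repl-minor-mode' and its keymap are hypothetical.
;; The bound commands are assumed to match ESS's existing key bindings.
(defvar my-ess-repl-minor-mode-map
  (let ((map (make-sparse-keymap)))
    (define-key map (kbd "C-c C-b") #'ess-eval-buffer)
    (define-key map (kbd "C-c C-z") #'ess-switch-to-inferior-or-script-buffer)
    map)
  "Keymap holding only the REPL-interaction bindings.")

(define-minor-mode my-ess-repl-minor-mode
  "Hypothetical minor mode that adds ESS REPL bindings to any R major mode."
  :lighter " ESS"
  :keymap my-ess-repl-minor-mode-map)

;; A standalone R major mode would then opt in through its hook, e.g.
;; (assuming the mode is called `r-ts-mode'):
;; (add-hook 'r-ts-mode-hook #'my-ess-repl-minor-mode)
```

With this shape the major mode owns syntax, font-lock and indentation, while the minor mode owns nothing but process interaction.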

nverno commented 5 months ago

I updated https://github.com/nverno/r-ts-mode/tree/master to use the latest grammar from the 'next' branch.

These simple indentation rules were good enough to be pretty close to RStudio-.

(defvar r-ts-mode--indent-rules
  `((r
     ((parent-is "program") column-0 0)
     ((node-is "}") standalone-parent 0)
     ((node-is ")") parent-bol 0)
     ((node-is "]") parent-bol 0)
     ((node-is "else") parent-bol 0)
     ((node-is "braced_expression") standalone-parent 0)
     ((parent-is "braced_expression") standalone-parent r-ts-mode-indent-offset)
     ((parent-is ,(rx bow (or "if" "while" "repeat" "for") eow))
      parent-bol r-ts-mode-indent-offset)
     ((parent-is "binary_operator") parent-bol r-ts-mode-indent-offset)
     ((parent-is "function_definition") parent-bol r-ts-mode-indent-offset)
     ((node-is "arguments") parent-bol r-ts-mode-indent-offset)
     ((parent-is "arguments") standalone-parent r-ts-mode-indent-offset)
     ((parent-is "string") no-indent)
     (no-node parent-bol 0)))
  "Tree-sitter indent rules for `r-ts-mode'.")

> We could either create a red tree on top of the TS tree

I don't think efficiency will be a concern - the tree-sitter indentation should be more efficient - and the parent node is already available to treesit-simple-indent.

lionel- commented 5 months ago

Great!

> I don't think efficiency will be a concern - the tree-sitter indentation should be more efficient - and the parent node is already available to treesit-simple-indent.

You're right. Note that your current rules will produce staircase indentation with right-associative operators. It doesn't really matter since these are rare, but to fix that you could use an anchor that recursively looks for parents, which will cause repeated searches through the tree. That said, now that I think of it, TS is probably using the node extents to do something like a binary search (the tree is ordered by position in the code). Also, the AST of source code can be wide in extreme cases but is typically shallow.
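To make the staircase concrete: in `a <- b <- c` the right operand of the outer `<-` is itself a `binary_operator`, so anchoring each continuation line on its direct parent's line adds one more offset per nesting level. Below is a hedged sketch of a custom anchor that climbs to the outermost operator first. The function name is hypothetical, it relies on treesit-simple-indent calling function anchors with the node, its parent and the line start, and it has not been tested against the R grammar. Note that it flattens every nested operator chain, not only right-associative ones.

```elisp
;; Sketch only: `r-ts-sketch--operator-chain-anchor' is a hypothetical name.
;; Needs Emacs 29+ built with tree-sitter and an R grammar at indent time.
(defun r-ts-sketch--operator-chain-anchor (_node parent &rest _)
  "Return the indentation position of the line that starts the operator chain.
Climb from PARENT through enclosing \"binary_operator\" nodes and
anchor on the first non-blank character of the outermost one's line."
  (let ((top parent)
        (up (treesit-node-parent parent)))
    ;; Keep climbing while the enclosing node is still a binary operator.
    (while (and up (equal (treesit-node-type up) "binary_operator"))
      (setq top up
            up (treesit-node-parent up)))
    (save-excursion
      (goto-char (treesit-node-start top))
      (back-to-indentation)
      (point))))

;; Hypothetical replacement for the existing "binary_operator" rule:
;; ((parent-is "binary_operator")
;;  r-ts-sketch--operator-chain-anchor r-ts-mode-indent-offset)
```

Each climb costs one parent lookup per nesting level, which is the repeated search mentioned above; for typical shallow R code that should be negligible.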

If you have time can you also make rules to approach the RRR style? It's the default in ESS and used in the R core sources. Probably worth looking at RStudio too since it's the default style in that IDE (even though RStudio- is the one that conforms to the tidyverse style guide).

I would be in favour of integrating these TS rules even if they don't fully reproduce the current indentation behaviour.

lionel- commented 5 months ago

Relevant thread regarding performance of TS parents and deep trees: https://github.com/neovim/neovim/issues/24965 (with suggested fix)

nverno commented 5 months ago

For normal code, I've found r-ts-mode to be roughly 10x faster for indentation. I just compared ess-r-mode to r-ts-mode on the deep else-if example from that thread, and there ess-r-mode was roughly 3x faster (c-ts-mode is quite a bit slower than R in this case).

nverno commented 5 months ago

It seems like the tree-sitter-r parser could flatten the else if consequence branches into a single level - but my R knowledge is rusty.