mguzmann opened this issue 1 year ago
Yes, vaguely. Since the mode will be written from scratch, it's a good opportunity to make ESS a minor mode instead of a major mode that other modes inherit from. So that's a rather large project altogether.
WIP at https://github.com/emacs-ess/ESS/tree/tree-sitter
This work will move slowly, but tree-sitter-r is still unstable and undergoing a major rewrite, so it's too early to depend on it anyway.
For anyone interested in the progress, here are some additional details:
The tree-sitter grammar for R (r-lib/tree-sitter-r) is undergoing a major refactoring in the `next` branch.
Neovim has been using the `main` branch of that grammar for a long time, and it seems to work okay on Neovim's side.
I wrote an r-ts-mode to test out the grammar.
The most noticeable problem is the missing grammar for `else` statements in the tree-sitter parser. Aside from that, indentation and font-locking work pretty well.
Our main issue is that we have a complex indenter that supports many styles. I think we could simplify it to 2 or 3 styles, though (ESSR, which is the default, plus the RStudio style and its minus variant, RStudio-). Ideally we'd rewrite the indenter, perhaps in a simplified form, on top of the tree-sitter tree.
It's worth noting that TS nodes are immutable and reused across different versions of a file, which implies they store only their children, not their parents. Finding a parent requires traversing the tree from the root. So the current approach of inspecting parents to decide the indentation might not scale that well. We could either create a red tree on top of the TS tree (see the green and red trees of Roslyn or Swift's libsyntax) or use another approach (a Wadler-style IR without the line-splitting parts?) to solve that.
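For readers unfamiliar with the green/red split, here is a minimal sketch in plain Emacs Lisp, independent of the treesit API. Immutable "green" nodes store only a type, a width in characters, and their children, so identical subtrees can be shared; lightweight "red" wrappers are built on demand while descending and add the parent pointer and absolute start position. All `rg-` names are hypothetical, and Roslyn additionally caches red children, which this sketch omits for brevity.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Green node: immutable and shareable; stores only its type, its width
;; in characters, and its children.  No parent, no absolute position.
(cl-defstruct (rg-green (:constructor rg-green-make (type width children)))
  type width children)

;; Red node: a throwaway wrapper around a green node that adds the parent
;; pointer and absolute start position, computed while descending.
(cl-defstruct (rg-red (:constructor rg-red-make (green parent start)))
  green parent start)

(defun rg-red-root (green)
  "Wrap the root GREEN node; it has no parent and starts at 0."
  (rg-red-make green nil 0))

(defun rg-red-end (red)
  "Return the absolute end position of RED."
  (+ (rg-red-start red) (rg-green-width (rg-red-green red))))

(defun rg-red-children (red)
  "Return fresh red wrappers for the children of RED, each pointing back to RED."
  (let ((pos (rg-red-start red))
        result)
    (dolist (child (rg-green-children (rg-red-green red)))
      (push (rg-red-make child red pos) result)
      ;; Absolute positions come from summing the widths of earlier siblings.
      (setq pos (+ pos (rg-green-width child))))
    (nreverse result)))

(defun rg-red-descend (red pos)
  "Return the deepest red node under RED whose span contains POS."
  (let ((child (cl-find-if (lambda (c)
                             (and (<= (rg-red-start c) pos)
                                  (< pos (rg-red-end c))))
                           (rg-red-children red))))
    (if child (rg-red-descend child pos) red)))

(defun rg-red-ancestor-types (red)
  "Return the types of RED and its ancestors, innermost first."
  (let (types)
    (while red
      (push (rg-green-type (rg-red-green red)) types)
      (setq red (rg-red-parent red)))
    (nreverse types)))

;; Usage: `1+1', where both operands share the same green leaf.
(let* ((one (rg-green-make "float" 1 nil))
       (sum (rg-green-make "binary_operator" 3
                           (list one (rg-green-make "+" 1 nil) one)))
       (root (rg-red-root (rg-green-make "program" 3 (list sum)))))
  (rg-red-ancestor-types (rg-red-descend root 2)))
;; => ("float" "binary_operator" "program")
```

Because the red path is built on the way down, walking back up through `rg-red-parent` is plain pointer chasing rather than a fresh search from the root.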
Or we just use the default Emacs TS indenter (treesit-simple-indent) and rely on format-on-save with a third-party tool to fix incorrect indentation.
Also, as we make this transition to TS, it would be nice to decouple the R mode from ESS. That would involve turning ESS into a minor mode instead of a major mode that other modes inherit from. The minor mode would manage the keybindings for interacting with the ESS REPL buffer.
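A minimal sketch of that split, under stated assumptions: the mode and keymap names (`ess-r-interaction-mode`, `ess-r-interaction-mode-map`) are hypothetical, the two bound commands are assumed to be existing ESS interactive commands (check the current ESS names before relying on them), and `r-ts-mode-hook` assumes a major mode called `r-ts-mode`.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical keymap owned by the minor mode, not by the R major mode.
(defvar ess-r-interaction-mode-map
  (let ((map (make-sparse-keymap)))
    ;; Assumed existing ESS commands for sending code and switching to the REPL.
    (define-key map (kbd "C-<return>") #'ess-eval-region-or-line-visibly-and-step)
    (define-key map (kbd "C-c C-z") #'ess-switch-to-inferior-or-script-buffer)
    map)
  "Keymap for `ess-r-interaction-mode'.")

(define-minor-mode ess-r-interaction-mode
  "Hypothetical minor mode adding ESS REPL interaction to any R major mode."
  :lighter " ESS"
  :keymap ess-r-interaction-mode-map)

;; The major mode stays ESS-agnostic; interaction is layered on via a hook.
(add-hook 'r-ts-mode-hook #'ess-r-interaction-mode)
```

With this layout, the R major mode (tree-sitter based or not) only handles syntax, and anything REPL-related can be toggled per buffer.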
I updated https://github.com/nverno/r-ts-mode/tree/master to use the latest grammar from the 'next' branch.
These simple indentation rules come pretty close to the RStudio- style:
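(defvar r-ts-mode--indent-rules
  `((r
     ;; Top-level expressions start at column 0.
     ((parent-is "program") column-0 0)
     ;; Closing delimiters and `else' get no extra offset.
     ((node-is "}") standalone-parent 0)
     ((node-is ")") parent-bol 0)
     ((node-is "]") parent-bol 0)
     ((node-is "else") parent-bol 0)
     ;; Braces stay with their owner; their contents are indented one step.
     ((node-is "braced_expression") standalone-parent 0)
     ((parent-is "braced_expression") standalone-parent r-ts-mode-indent-offset)
     ;; Other children of if/while/repeat/for nodes (e.g. an unbraced body).
     ((parent-is ,(rx bow (or "if" "while" "repeat" "for") eow))
      parent-bol r-ts-mode-indent-offset)
     ;; Continuation lines of operators, function definitions, and calls.
     ((parent-is "binary_operator") parent-bol r-ts-mode-indent-offset)
     ((parent-is "function_definition") parent-bol r-ts-mode-indent-offset)
     ((node-is "arguments") parent-bol r-ts-mode-indent-offset)
     ((parent-is "arguments") standalone-parent r-ts-mode-indent-offset)
     ;; Don't reindent lines inside multi-line strings.
     ((parent-is "string") no-indent)
     ;; Fallback when no node starts on the line (e.g. a blank line).
     (no-node parent-bol 0)))
  "Tree-sitter indent rules for `r-ts-mode'.")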
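For context, here is roughly how such rules are wired into a tree-sitter major mode in Emacs 29+. This is a minimal sketch, not the code of nverno's r-ts-mode: the default offset of 2 is an assumption, and the R grammar must be installed for `treesit-ready-p` to succeed.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'treesit)

;; Assumed default of 2 spaces per indentation step.
(defcustom r-ts-mode-indent-offset 2
  "Number of spaces for each indentation step in `r-ts-mode'."
  :type 'integer
  :group 'languages)

;; Sketch only; relies on `r-ts-mode--indent-rules' being defined.
(define-derived-mode r-ts-mode prog-mode "R"
  "Minimal tree-sitter major mode for R (sketch)."
  (when (treesit-ready-p 'r)
    (treesit-parser-create 'r)
    ;; The rules are an alist keyed by language, here `r'.
    (setq-local treesit-simple-indent-rules r-ts-mode--indent-rules)
    ;; Installs `treesit-indent' as the indentation function, among other things.
    (treesit-major-mode-setup)))
```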
> We could either create a red tree on top of the TS tree
I don't think efficiency will be a concern - the tree-sitter indentation should be more efficient - and the parent node is already available to treesit-simple-indent.
Great!
> I don't think efficiency will be a concern - the tree-sitter indentation should be more efficient - and the parent node is already available to treesit-simple-indent.
You're right. Note that your current rules will produce staircase indentation with right-associative operators. It doesn't really matter since these are rare, but to fix it you could use an anchor that recursively looks up parents, which causes repeated searches through the tree. That said, now that I think of it, TS probably uses the node extents to do something like a binary search (the tree is ordered by position in the code). Also, the AST of source code can be wide in extreme cases but is typically shallow.
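One way such an anchor could look, as a sketch under assumptions: the function name is hypothetical, and the `binary_operator` node type follows the grammar used in the rules above. treesit-simple-indent accepts a function as the anchor and calls it with NODE, PARENT, and BOL; this one climbs through consecutive `binary_operator` ancestors and returns the indentation point of the line where the outermost one starts. For `a <-` / `b <-` / `c` on three lines, `c` then anchors on the line of `a` instead of the line of `b`.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'treesit)

(defun r-ts-mode--binary-operator-chain-bol (_node parent _bol)
  "Hypothetical anchor for lines whose PARENT is a `binary_operator'.
Return the first non-blank position on the line where the outermost
enclosing chain of `binary_operator' nodes starts."
  (let ((top parent)
        up)
    ;; One parent lookup per nesting level of the operator chain.
    (while (and (setq up (treesit-node-parent top))
                (equal (treesit-node-type up) "binary_operator"))
      (setq top up))
    (save-excursion
      (goto-char (treesit-node-start top))
      (back-to-indentation)
      (point))))
```

It would replace the anchor in the existing operator rule: `((parent-is "binary_operator") r-ts-mode--binary-operator-chain-bol r-ts-mode-indent-offset)`. Note that it flattens every nested operator chain, not only right-associative ones, which may differ from the current ESS behaviour for mixed operators.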
If you have time, could you also write rules that approximate the RRR style? It's the default in ESS and is used in the R core sources. It's probably worth looking at RStudio too, since it's the default style in that IDE (even though RStudio- is the one that conforms to the tidyverse style guide).
I would be in favour of integrating these TS rules even if they don't fully reproduce the current indentation behaviour.
Relevant thread regarding performance of TS parents and deep trees: https://github.com/neovim/neovim/issues/24965 (with suggested fix)
For normal code, I've found r-ts-mode to be roughly 10x faster for indentation.
I just compared ess-r-mode to r-ts-mode on the deep else-if example from that thread, and ess-r-mode was roughly 3x faster (c-ts-mode is quite a bit slower than r-ts-mode in this case).
It seems like the tree-sitter-r parser could flatten the `else if` consequence branches into a single level, but my R knowledge is rusty.
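A small timing harness for this kind of comparison might look like the sketch below. The generated if/else-if chain is an assumed stand-in for the example in the Neovim thread, and the `r-bench--` names are hypothetical. Since each `else if` presumably becomes an `if_statement` nested as the alternative of the previous one, the tree depth grows with the chain length, which is the nesting that flattening would remove.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'benchmark)

(defun r-bench--deep-else-if (depth)
  "Return R source for an if chain with DEPTH `else if' branches."
  (with-temp-buffer
    (insert "if (x == 0) {\n  0\n}")
    (dotimes (i depth)
      (insert (format " else if (x == %d) {\n  %d\n}" (1+ i) (1+ i))))
    (insert " else {\n  -1\n}\n")
    (buffer-string)))

(defun r-bench--indent-time (mode source)
  "Return the seconds taken to reindent SOURCE once under major MODE."
  (with-temp-buffer
    (insert source)
    (funcall mode)
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1 (indent-region (point-min) (point-max))))))
```

For example, `(r-bench--indent-time #'ess-r-mode (r-bench--deep-else-if 500))` can be compared with the same call using `#'r-ts-mode`, assuming both modes are installed.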
Currently there is no r-ts-mode or similar. Is there a plan to support Emacs's native tree-sitter integration?
Thanks!