tpapp opened this issue 2 years ago
@FelipeLema: since you wrote an indentation solution using tree-sitter, I would appreciate your comments. I recall you mentioning it a while ago.
I agree that maintaining the regex parser is currently taking a toll on code readability and maintainability, even for people who are well versed in (Emacs) Lisp. I agree that tree-sitter is a tool with many eyes on it ("it is production quality") and that end users have only good things to say about it.
However, I'd say it gets complicated for Julia. There are two big aspects to consider: how well the parser is maintained, and how well it can be maintained (by the Julia community).
First, not only is the Julia tree-sitter parser incomplete, but its pace of development may not fit the requirements of the Julia community (including tooling people and Julia devs). It didn't fit mine: I ran into this problem rather quickly. When I started using the tree-sitter indent tool, I found a bug. I reported it, debugged it, and proposed a fix, and I'm still waiting for the problem to be addressed. Can the Julia community take over the tree-sitter parser so we can improve it at a quicker pace? I honestly don't know.
After that, I ended up writing a Julia code formatter, because I thought it would be easier than pushing for changes upstream in the Julia tree-sitter parser. The name of the tool is actually a misnomer, because it parses the code in a Julia buffer and operates on its AST, just as tree-sitter does.
So this brings up the second concern about tree-sitter: the parser is written in another language (right now, fixes to the Julia parser have to be written in JS). Correct me if I'm wrong, but I believe most of the Julia community understands the long-term problems of maintaining several languages in their workflows. I would personally rather avoid writing JS if possible (which is what I ended up doing with the formatter described above).
My recommendation is to use CSTParser (or even the parser that ships with Julia) to parse the AST, and to copy code from tree-sitter.el to handle the items mentioned in the top entry of this discussion. From my experience with the Julia code formatter, I'd recommend using DaemonMode for Emacs-Julia communication (to have little-to-no response delay), since using JSON-RPC, as LSP does, may cause problems on Windows.
Using CSTParser may also lower the barrier for Julia end users to participate in maintaining this package (kind of what the Racket community was betting on when they switched to Chez Scheme).
All this being said, I want to note that I use tree-sitter in neovim on an everyday basis and that everything (except for the Julia tree-sitter parser) works A-OK.
@FelipeLema: thanks for the detailed explanation (incidentally, would you consider giving the PR you mentioned a friendly ping? Perhaps it was just overlooked --- that happens).
Conceptually, I can think of the following components for the features we need:
Currently in julia-mode we pretty much do everything above with hacks using regexps.
CSTParser.jl does 1+2. We would have to maintain part 3 ourselves, most likely using the daemon-based approach you outlined. The advantage is indeed doing a lot in Julia. The disadvantage is all the machinery associated with maintaining a running Julia instance: doable, but somewhat heavyweight.
Tree-sitter would allow us to combine effort for 1, 2, and 3 with other projects (editors other than Emacs, languages other than Julia). That said, I realize that a lot of layers can introduce problems, too.
Interestingly, the LSP spec includes semantic tokens since 3.16. I wonder if that's supported in practice for Julia with Emacs; @gdkrmr and @non-Jedi, it would be great if you could share your thoughts on this. If we could make that work, it would cover pretty much everything for us.
Note that julia-snail already includes an interface to CSTParser.jl. It still relies on julia-mode for syntax highlighting and formatting, though.
> Interestingly, the LSP spec includes semantic tokens since 3.16. I wonder if that's supported in practice for Julia with Emacs; @gdkrmr and @non-Jedi, it would be great if you could share your thoughts on this. If we could make that work, it would cover pretty much everything for us.
LanguageServer.jl doesn't support the semantic tokens capabilities at this time, unfortunately. I'm also really not sure the architecture of LSP would give you sufficient responsiveness for indentation and syntax highlighting. Any time you press enter, Emacs would have to make a round trip to the language server (plain text over pipes) before deciding on the indentation level.
Worth noting: there is active (very active? somewhat active?) support for parsing Julia in Scintilla.
Using Scintilla would have the benefit of active support, but we would have to integrate it into Emacs ourselves.
I did not dig into the details, but I am still under the impression that tree-sitter would be the path of least resistance, because of:
I'm generally on board with integrating with tree-sitter. Even if support for julia syntax isn't perfect, it's probably better than what we have now especially wrt indentation.
We would need to update our lowest supported emacs version to 25 for dynamic module support.
That's fine with me; Emacs 25 was released almost 6 years ago.
I don't really have any experience with syntax highlighting in Emacs, so please correct me if I am wrong:
If lsp-julia fails, julia-mode is still very much usable (and users can switch to eglot). Including CSTParser.jl would probably mean that we have to put everything into a large monolith (why spin up two separate Julia processes if CSTParser and LanguageServer could run in a single process?).
Questions:
Is there any way to enable tree-sitter in julia-mode? I installed and enabled it, but it does not seem to have any effect.
Can you paste or point to the code you're dealing with?
Nothing in particular. I am just trying to see if we can get better performance. For example, scrolling this file is somewhat slow for me:
https://github.com/ronisbr/PrettyTables.jl/blob/master/src/backends/text/print.jl
Emacs 29 will add native tree-sitter support! Does anyone know how it works? Will there be an extra process, or is it going to be a dynamic module? tree-sitter-julia also seems to be reasonably active, though I'm not sure whether it is ready yet (https://github.com/tree-sitter/tree-sitter-julia).
To use tree-sitter, you need to rewrite the major mode. I am doing some experiments with a julia-ts-mode, with very good results. There are rough edges, and I do not know yet whether they are caused by Emacs or by the Julia grammar. I will publish this file tomorrow so people can test.
Here is the major mode:
https://github.com/ronisbr/julia-ts-mode
You need to add the file julia-ts-mode.el to your load path and add (require 'julia-ts-mode) to your configuration. Note that you also need to install the Julia tree-sitter grammar.
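A minimal init-file sketch of these steps. It assumes Emacs 29 built with tree-sitter support, git and a C compiler on the PATH, and a clone of julia-ts-mode at ~/src/julia-ts-mode (a hypothetical path). It relies on Emacs 29's built-in treesit-language-source-alist and treesit-install-language-grammar to fetch and compile the grammar from the tree-sitter-julia repository linked above. This is one way to install the grammar, not necessarily the one used here.

```elisp
(require 'treesit)

;; Tell Emacs 29 where to fetch the Julia grammar from.
(add-to-list 'treesit-language-source-alist
             '(julia "https://github.com/tree-sitter/tree-sitter-julia"))

;; Clone, compile, and install the grammar once; skipped when a
;; loadable Julia grammar is already available.
(unless (treesit-language-available-p 'julia)
  (treesit-install-language-grammar 'julia))

;; Hypothetical location of the cloned julia-ts-mode repository.
(add-to-list 'load-path (expand-file-name "~/src/julia-ts-mode"))
(require 'julia-ts-mode)
```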
I have to say that I am really amazed at how easy it was to set everything up, and the speed is definitely much better than with the current mode. Now I need to work on navigation and imenu support.
This is awesome. Thanks @ronisbr. I'll need to compile Emacs 29 myself to try this. Maybe you could instead define julia-ts-mode as a derived mode of julia-mode and only override indent-line-function and font-lock-defaults (that way we don't lose the pieces of julia-mode not related to indentation and font-locking)?
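A rough sketch of that suggestion, assuming Emacs 29's built-in treesit library. The mode name my-julia-ts-mode, the rule variable, and the grammar node names (function_definition, for_statement, source_file) are hypothetical and may not match a given tree-sitter-julia version. Because treesit-font-lock-settings is left unset, treesit-major-mode-setup should only install treesit-indent as the indentation function. That leaves julia-mode's regexp font-lock and the rest of julia-mode in place.

```elisp
(require 'treesit)
(require 'julia-mode)

;; Hypothetical indentation rules; node names are assumptions about the
;; tree-sitter-julia grammar.
(defvar my-julia-ts-indent-rules
  '((julia
     ;; `end' lines up with the line that opened the block.
     ((node-is "end") parent-bol 0)
     ;; Function and loop bodies are indented by `julia-indent-offset'.
     ((parent-is "function_definition") parent-bol julia-indent-offset)
     ((parent-is "for_statement") parent-bol julia-indent-offset)
     ;; Top-level forms start at column 0.
     ((parent-is "source_file") column-0 0)))
  "Hypothetical `treesit-simple-indent-rules' for Julia.")

(define-derived-mode my-julia-ts-mode julia-mode "Julia[ts]"
  "Hypothetical julia-mode variant that only indents with tree-sitter."
  (when (treesit-ready-p 'julia)
    (treesit-parser-create 'julia)
    (setq-local treesit-simple-indent-rules my-julia-ts-indent-rules)
    ;; `treesit-font-lock-settings' is nil here, so this sets up
    ;; indentation only and julia-mode's font-lock stays active.
    (treesit-major-mode-setup)))
```

To try it, open a Julia buffer and run M-x my-julia-ts-mode. Overriding font-lock as well would also mean setting treesit-font-lock-settings before calling treesit-major-mode-setup.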
@tpapp would we be willing to make future releases of julia-mode only compatible with Emacs 29+? This seems like a feature which would make doing so worthwhile.
Hi @non-Jedi!
> This is awesome. Thanks @ronisbr. I'll need to compile emacs 29 for myself to try this. Maybe you could instead define julia-ts-mode as a derived mode of julia-mode and only override indent-line-function and font-lock-defaults (that way we don't lose the pieces of julia-mode not related to indentation and font-locking)?
The idea is to test the tree-sitter integration and then commit it to this repository. I will not register julia-ts-mode. Many major modes are doing something just like you said, so the user can select whether they want the old behavior or tree-sitter, if available. I think this is the best scenario.
> @tpapp would we be willing to make future releases of julia-mode only compatible with emacs 29+? This seems like a feature which would make doing so worthwhile.
Yes, we will probably need to require Emacs 29 to make this integration work.
@non-Jedi: yes, definitely. @ronisbr: thanks for doing this. I believe that this is the best way to solve a long list of problems.
Perfect! I will ping this thread when I finish the initial version so that you can help me to integrate everything :)
Just an update:
I have been using the Julia tree-sitter grammar for almost 3 weeks now. My Doom configuration with this mode is here: https://github.com/ronisbr/doom.d/tree/emacs-29
Everything is working wonderfully! I found just two minor issues reported here:
https://github.com/tree-sitter/tree-sitter-julia/issues/88
https://github.com/tree-sitter/tree-sitter-julia/issues/73
The experience so far has been amazing.
eglot-jl currently explicitly lists julia-mode as a dependency, but it seems like it'd be straightforward to give julia-ts-mode a try. I'm not sure what I'd be looking for -- just see if I don't encounter problems, and maybe if it feels snappier?
I'm currently using the built-in c++-ts-mode, and maybe it adds more features, but I haven't tried them yet. mark-paragraph still seems dumb, i.e. based on spacing rather than syntax:
function foo()
#mark begins
end
function bar()
# mark ends
end
Or maybe I need to look into the documentation, and I'm supposed to replace functions like mark-paragraph with tree-sitter-powered versions.
> just see if I don't encounter problems, and maybe if it feels snappier?
Yes!
> Or maybe I need to look into documentation, and I'm supposed to replace functions like mark-paragraph with tree-sitter-powered versions.
There is some support for navigation, but I did not change anything related to mark-paragraph.
@chriselrod Looking at eglot-jl again, I don't think julia-mode is needed as a dependency. All that would be required to use it with julia-ts-mode would be to modify eglot-server-programs to include '(julia-ts-mode . eglot-jl--ls-invocation) instead.
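In init-file form, that change might look like the following sketch. It assumes eglot and eglot-jl are installed, and it relies on eglot-jl's internal function eglot-jl--ls-invocation, as mentioned above, which may change between eglot-jl versions.

```elisp
;; Register the eglot-jl language-server command for julia-ts-mode too.
(with-eval-after-load 'eglot
  (require 'eglot-jl)                   ; defines `eglot-jl--ls-invocation'
  (add-to-list 'eglot-server-programs
               '(julia-ts-mode . eglot-jl--ls-invocation)))
```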
@ronisbr I haven't had a chance to build Emacs 29 and test this, but would you mind skimming through the open issues when you get a chance and seeing which ones would be solved by your julia-ts-mode implementation? I would like to test at least #118, #111, #56, #12, #11 (especially #11!!), #3, and #2. If you don't have time, that's understandable, and I'll take a look when I get a chance.
With this, eglot-jl should be compatible with both julia-mode and julia-ts-mode: https://github.com/non-Jedi/eglot-jl/pull/36 (I just swapped julia-mode for (julia-mode julia-ts-mode)).
Building Emacs is fairly straightforward. On Fedora, you can install all the dependencies via:
sudo dnf install -y dnf-utils libgccjit-devel libtree-sitter-devel stow
sudo yum-builddep emacs # install a million deps
Then to build
# cd somewhere/so/you/do/not/clutter
git clone git://git.savannah.gnu.org/emacs.git
cd emacs
./autogen.sh
mkdir build
cd build
CFLAGS="-O3 -march=native -fno-semantic-interposition" CXXFLAGS="-O3 -march=native -fno-semantic-interposition" ../configure --with-native-compilation --with-wide-int --with-json --with-tree-sitter
time make NATIVE_FULL_AOT=1 -j(nproc) # if using fish
# time make NATIVE_FULL_AOT=1 -j$(nproc) # if not using fish
sudo make install prefix=/usr/local/stow/emacs
cd /usr/local/stow
sudo stow emacs
stow is nice because you can easily clear out something you've installed (sudo stow -D emacs will remove all the symlinks it creates).
You may want to change flags, configuration options, etc. I obviously added native compilation and tree-sitter above. You may want pure GTK if you're using Wayland, but I need to build with X because I'm using EXWM.
https://github.com/JuliaEditorSupport/julia-emacs/issues/118
julia-ts-mode only highlights quote and end.
https://github.com/JuliaEditorSupport/julia-emacs/issues/111
New line after quote does not indent; typing without tab:
map(1:3) do x
x
end
f(map(1:3) do x
x
end)
mark and tab:
map(1:3) do x
x
end
f(map(1:3) do x
x
end)
https://github.com/JuliaEditorSupport/julia-emacs/issues/56
x .|>
f
It initially didn't have the indent, but as soon as I made another line, it auto-indented.
https://github.com/JuliaEditorSupport/julia-emacs/issues/12
module A
import Base: *
a = 1
b = *
c = 2
end
is what I get typing it out (no spurious indent).
Mark and tab preserves the correct relative indent, except that it indents everything inside the module.
The module keyword is not itself highlighted.
module A
import Base: *
a = 1
b = *
c = 2
end
https://github.com/JuliaEditorSupport/julia-emacs/issues/11 This behavior is customizable https://github.com/ronisbr/julia-ts-mode/blob/197e6e81a8d3d519df81fd21931a1e156ec1fc10/julia-ts-mode.el#L44-L84
Typing it out:
function1(a, b, c
d, e, f)
function2(
a, b, c
d, e, f)
for i in Float64[1, 2, 3, 4
5, 6, 7, 8]
end
for i in Float64[
1, 2, 3, 4
5, 6, 7, 8]
end
a = function3(function()
return 1
end)
a = function4(
function ()
return 1
end)
Neither leading 5 is highlighted, but all the other numbers are.
Mark and tab:
function1(a, b, c
d, e, f)
function2(
a, b, c
d, e, f)
for i in Float64[1, 2, 3, 4
5, 6, 7, 8]
end
for i in Float64[
1, 2, 3, 4
5, 6, 7, 8]
end
a = function3(function()
return 1
end)
a = function4(
function ()
return 1
end)
So, still some problems.
- [ ] Leading numbers on a new line are not highlighted
- [ ] Light on different faces/highlighting in general
- [ ] Not very eager about indenting new lines, but seems pretty good in terms of consistent indentation levels when you ask for them.

I'd like in in the loops to be highlighted, as well as, of course, the 5s.
As for speed, I'm not sure. I had a file with a large literal matrix defined, and it seemed slower than I remember my experience being yesterday with julia-mode.
https://github.com/JuliaEditorSupport/julia-emacs/issues/3
No special highlighting for user
println("hello $user")
https://github.com/JuliaEditorSupport/julia-emacs/issues/2 No highlighting for any of these variables.
Just to confirm, with this file:
https://gist.github.com/chriselrod/5d09f5156ee49f1d2822df1638093b76#file-highssimplex-jl
julia-mode seems quite fast, while julia-ts-mode lags for seconds or more while scrolling up and down.
julia-ts-mode highlights the giant matrix, but julia-mode does not.
Perhaps that is the cause of the performance difference?
I'll have to try more normal files.
Thankfully, we have 5k- and 6k-line, otherwise more typical .jl files at work to test on. =)
EDIT: it seems fast at navigating those.
Thanks for the amazing investigation @chriselrod !
> https://github.com/JuliaEditorSupport/julia-emacs/issues/118 julia-ts-mode only highlights quote and end.
Oops :D I forgot to add anything related to interpolation expressions. Now we have:
> - [ ] Not very eager about indenting new lines, but seems pretty good in terms of consistent indentation levels when you ask for them.
I have seen problems like this in other languages too. It seems to be a limitation of either the Emacs implementation or tree-sitter itself. For example, when you type:
if a == 2
end
and press enter after 2, the new line is not indented. The reason is that there is no node inside the if statement yet, so Emacs does not know that we want to shift the indentation. I have no idea how to fix it. Perhaps https://github.com/tree-sitter/tree-sitter-julia/issues/73 will improve it.
> Neither leading 5 is highlighted, but all the other numbers are.
This happens because it is a syntax error. We can highlight errors; however, the grammar is not 100% complete and some errors are false positives.
> https://github.com/JuliaEditorSupport/julia-emacs/issues/3 No special highlighting for user
> println("hello $user")
Done! I added support for string interpolation. The font face is the same as for constants, but bold.
(The underlines are LSP errors)
> https://github.com/JuliaEditorSupport/julia-emacs/issues/2 No highlighting for any of these variables.
I think I did not fully understand what is the desired behavior. Can you please explain to me?
> julia-ts-mode highlights the giant matrix, but julia-mode does not. Perhaps that is the cause of the performance difference? I'll have to try more normal files.
Yes! If you set treesit-font-lock-level to 2, where literals are not highlighted, it is much faster.
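A sketch of lowering the level globally in an init file, assuming Emacs 29, where setopt was introduced. For a single mode, a hook that sets the variable buffer-locally and then calls treesit-font-lock-recompute-features also works.

```elisp
(require 'treesit)

;; Use font-lock level 2 (no level-3/4 features such as literals) in
;; every tree-sitter major mode.  `setopt' (Emacs 29+) assigns the user
;; option through its customization setter instead of a plain `setq'.
(setopt treesit-font-lock-level 2)
```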
> I'd like in in the loops to be highlighted, as well as of course the 5s.
The in highlighting is done! Highlighting the 5 in your example is impossible, because tree-sitter reports it as an error.
By the way, I will add error highlighting to the last font-lock level, together with the operators, so the user can decide.
Now, if you set treesit-font-lock-level to 4, you will see:
> #2 No highlighting for any of these variables.

> I think I did not fully understand what is the desired behavior. Can you please explain to me?
Any time a variable is assigned to, the variable name should be highlighted with font-lock-variable-name-face, e.g.:
- x in x = 5
- a and b in let a = 1, b = 2
- x and y in x, y = 4, 5
- a and b in a = 5 + (b = 3)
- a in global a
- b in local b
- i in for i=1:10 (this one is arguable)
- x and y and the T in the where clause in function f(x::T, y) where T (this one is arguable)

But there are similar forms which should not be highlighted, which makes this difficult to do without the full parser we get with tree-sitter: for example, calling a function with a keyword argument, named tuples, and setindex! sugar (not sure if we should consider setproperty! as variable assignment for this purpose...).
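A hypothetical sketch of how the simplest case (x in x = 5) might be expressed as a tree-sitter font-lock rule in Emacs 29. The node names assignment and identifier are assumptions about the tree-sitter-julia grammar. Whether keyword arguments, named tuples, and setindex! sugar get different node types, which is what would let a query skip them, also depends on that grammar. The "." anchor limits the match to the first named child (the left-hand side), so y in x = y is not captured. Tuple targets such as x, y = 4, 5 would need an additional pattern.

```elisp
(require 'treesit)

;; Hypothetical rule set: capture the identifier that is the first named
;; child of an `assignment' node (assumed to be the left-hand side).
(defvar my-julia-ts-assignment-font-lock
  (treesit-font-lock-rules
   :language 'julia
   :feature 'assignment
   "(assignment . (identifier) @font-lock-variable-name-face)")
  "Hypothetical font-lock settings for simple Julia assignments.")

;; A tree-sitter Julia mode would append these settings and list the
;; `assignment' feature in `treesit-font-lock-feature-list', e.g.:
;; (setq-local treesit-font-lock-settings
;;             (append treesit-font-lock-settings
;;                     my-julia-ts-assignment-font-lock))
```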
Hi @non-Jedi ,
Thanks!
Everything that is not arguable was implemented. However, we might have corner cases.
I added the variable highlighting to level 3 (the default). This is the specification for each level:
Level 1 usually contains only comments and definitions. Level 2 usually adds keywords, strings, constants, types, etc. Level 3 usually represents a full-blown fontification, including assignment, constants, numbers, properties, etc. Level 4 adds everything else that can be fontified: delimiters, operators, brackets, all functions and variables, etc.
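In Emacs 29 these levels come from a mode's treesit-font-lock-feature-list: level N enables the features in the first N sublists. Below is a sketch with a hypothetical helper name and hypothetical feature names; the names must match the :feature keywords in the mode's treesit-font-lock-rules to have any effect. It could be run from a mode hook.

```elisp
(require 'treesit)

(defun my-julia-ts-setup-features ()    ; hypothetical helper
  "Group font-lock features into four levels (names are hypothetical)."
  (setq-local treesit-font-lock-feature-list
              '((comment definition)                         ; level 1
                (keyword string constant type)               ; level 2
                (assignment number property)                 ; level 3
                (delimiter operator bracket function variable error))) ; level 4
  ;; Re-enable features according to the current `treesit-font-lock-level'.
  (treesit-font-lock-recompute-features))
```

Placing error in the last sublist mirrors the plan above to offer error highlighting only at level 4.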
Now we have:
Thanks for all the great work! (Of course, thanks to everyone developing the packages I use.)
> Yes! If you set treesit-font-lock-level to 2, where the literals are not highlighted, it is much faster.

For now I went in the other direction, and am trying 4.
(defun treesit-font-lock-level-4 ()
(setq-local treesit-font-lock-level 4)
(treesit-font-lock-recompute-features))
(add-hook 'julia-ts-mode-hook #'treesit-font-lock-level-4)
Something else I noticed: macros aren't highlighted.
> Thanks for all the great work!
You're welcome! I also want to point out the AMAZING work of the Julia tree-sitter grammar developers (@maxbrunsfeld, @savq, and others). In all this time, I only found very minor issues! It is amazing!
> For now I went in the other direction, and am trying 4.
Me too! However, it can slow down. I noticed that the problem arises when there is a lot of highlighting on screen. It seems to handle big files pretty well (I did not test deeply).
> Something else I noticed: macros aren't highlighted.
I did not understand, it seems to be working here:
By the way, until julia-ts-mode is merged here, I needed to replicate some functionality, so I copied all the code related to LaTeX symbol substitution. Now I think it is working to the point where I can start using it daily.
> I did not understand, it seems to be working here:
Hmm -- I'm on a different computer with the same config as before, and I now see the same thing you showed. Perhaps I was on outdated versions. I'll let you know if I see anything different.
> By the way, until julia-ts-mode is merged here, I needed to replicate some functionality. Hence, I copied all the code related with LaTeX symbol substitution. Now, I think it is working to the point I can start using it daily.
I undid this, given the amazing advice of @non-Jedi to make julia-ts-mode a derived mode of julia-mode.
Update!
After a lot of problems, I managed to add an option to select which kind of indentation after an assignment the user wants. Hence, we can now select:
var = a + b + c +
    d + e +
    f
or
var = a + b + c +
      d + e +
      f
tree-sitter is a framework for incremental parsing of source code. The Julia implementation is now, in my opinion, fairly mature. I am asking for comments about replacing our ad-hoc regexp-based parsing mechanisms with it. Specifically, it would help with resolving a host of issues.
I am aware that it isn't perfect, as nothing is, but at least improvements would go to a repository that helps all Julia users, not just those who use Emacs.
EDIT Some links: