JuliaEditorSupport / zed-julia

Julia support for Zed.
MIT License
36 stars 6 forks source link

Docstring highlighting for annotated definitions (`@kwdef`, `@cxxdereference`, etc.) #15

Closed steffenhaug closed 1 day ago

steffenhaug commented 1 week ago

I found another missed opportunity for applying docstring syntax highlighting and Markdown language injection. Currently, definitions with macro annotations, such as

"""
_Should_ have `markdown` applied!
"""
@cxxdereference function f()
end

and

"""
_Should_ have `markdown` applied!
"""
@kwdef struct T end

are not formatted as docstrings.

I'll make some queries for it and create a PR, but I'm making an issue to get some opinions on the following:

  1. Should we restrict the queries to a known set of well-known macros, such as @kwdef, or should we apply the highlighting to all strings before macro call expressions that contain struct- and function definitions?
  2. I think it might be valid to nest macro call expressions, like @foobar @boofar function f end, but I'm not sure if it is possible to make a recursive query that identifies this for arbitrary levels of nesting. Should we ignore this (probably rare) edge case, or should we hand-write up to a reasonable level of nested macro calls?
piechologist commented 1 week ago

I'm currently figuring out how to merge upstream/tree-sitter-julia. Their draft reference file added (macro_definition) to our well-known list:

((string_literal) @comment.doc
  .
  [
    (abstract_definition)
    (assignment)
    (const_statement)
    (function_definition)
    (macro_definition)      ; NEW
    (module_definition)
    (struct_definition)
  ])

We'll need to add that in highlights.scm and injections.scm. You can test it locally if you like, otherwise I'll come back to this issue in the next few days.

piechologist commented 1 week ago

Oh, your examples are macro calls so, the addition I mentioned won't probably work…

steffenhaug commented 1 week ago

Yea the addition of (macro_definition) doesn't fix this.

I'm guessing the Vim, Emacs, and VS Code plugin devs already have queries for most of these edge cases, so I'm all for not reinventing the wheel if that's possible. Although I guess a centralized set of queries hinges on the text editors naming the different highlights the same thing, and I don't know if that's the case.

piechologist commented 1 week ago

Most of our queries come straight from nvim-tree-sitter, I made some additions. Your examples here and your merged PR haven't be covered yet.

There's also helix which is mostly aligned with nvim. Not sure about emacs. VSC uses regex for highlighting with its own issues (I had used it for a few years because of the julia extension but I'm a happy Zed convert now).

piechologist commented 1 week ago

Although I guess a centralized set of queries hinges on the text editors naming the different highlights the same thing

There are only two capture names that we need to change. Nvim has a few more predicates for injections.

steffenhaug commented 1 week ago

I have hit a little bit of a roadblock here, because (of all things) the Julia tree-sitter grammar appears to parse macro calls incorrectly.

image

It is my understanding that only the @doc macro is handled as a special case in the parser to allow newlines between macro arguments, and that newlines are only allowed between the first and second argument. The advanced usage section of the documentation manual is pretty clear about this. Yet, it looks like the tree-sitter grammar allows newlines between arguments to any macro, and not even just between the first and second argument.

I can work around this, by adding a rule to highlight strings before macrocall expressions inside macrocall expressions, but that's quite gnarly. I guess I can implement it that way, document it as a workaround, and bug the tree-sitter-julia people about it.

steffenhaug commented 1 week ago

This behavior also breaks the queries in my last PR if there is a top-level macro before it.

image

piechologist commented 1 week ago

bug the tree-sitter-julia people about it

That's the sane way to solve the issue. I bet the more complex our queries become, the more new issues slip our testing. And queries can't match whitespace which makes the workaround more challenging. Then one day with a new grammar, we have to clean up the workaround mess and practically start from scratch. I know because I went through this when contributing to the fish extension 😅.

One more thing (documenting here so I don't forget about it): we haven't covered @doc raw"""...""" yet. That example is from the Advanced Usage you mentioned above and it's handy for docstrings containing LATEX or many $ signs. We'd need to match (prefixed_command_literal)s beginning with raw""", similar to the queries in injections.scm.

steffenhaug commented 1 week ago

bug the tree-sitter-julia people about it

That's the sane way to solve the issue.

Agreed. I'll file an issue if there isn't one already and reference ours.

I bet the more complex our queries become, the more new issues slip our testing. And queries can't match whitespace which makes the workaround more challenging. Then one day with a new grammar, we have to clean up the workaround mess and practically start from scratch. I know because I went through this when contributing to the fish extension 😅.

Yeah if we were to merge a workaround like this, I would hope we could delete it pretty quickly and not have crap like this hanging around for ages :-p In terms of cleaning up the mess, I don't really see the problem, because we just delete these queries when the grammar is fixed. To me, the question boils down to whether we want ugly code temporarily, or whether we want broken highlighting temporarily, and the answer might very well be that we would rather have broken highlighting. I'm not sure where we want to strike that balance.

As a general point I agree, that more complex queries are likely more sensitive to changes in the grammar, and more difficult to fix. I'm not sure how volatile the Julia grammar is? Maybe avoiding complex queries is warranted, but I think given how easy they are to write, I would personally lean more towards trying to mostly do correct highlighting, even if we need to do complex queries. I don't have as much experience with this as you probably do though, so I'm open to being wrong.

One more thing (documenting here so I don't forget about it): we haven't covered @doc raw"""...""" yet. That example is from the Advanced Usage you mentioned above and it's handy for docstrings containing LATEX or many $ signs. We'd need to match (prefixed_command_literal)s beginning with raw""", similar to the queries in injections.scm.

That sounds pretty easy to do, I can make an issue for it now and whip up some queries for it when I have time. Apropos LaTeX, I would also like to see language injection in LaTeXStrings, but I couldn't make this work as well as I wanted becasue the TeX parser isn't robust enough to handle the surrounding L""" and """. This would also require support in the grammar. I can make an issue for that as well.

piechologist commented 1 week ago

I'm not sure where we want to strike that balance.

I'm not opinionated about it, a temporary fix is fine. A few things though:

  1. I fear that we break other, seemingly unrelated queries. The queries are easy to write but difficult to test.
  2. I'd like to use the upstream/tree-sitter-julia reference highlights.scm. That's WIP and I don't know when it will be ready. I hope they'll cover the essential stuff and we append a few more queries that we want. In the future, we would try to make enhancements upstream and just copy their updated reference and patch it.
  3. As my time is limited, and yours certainly as well, we should be aware that our efforts may be superseded by the next updated grammar.
  4. My first Tree-sitter sighting was in January when I switched to Zed and I don't consider myself an expert. Since January, they made some great internal improvements to the grammar but that didn't break the queries. So, it's not very volatile.

I appreciate the time you spend on it and we can add more queries. Let's just test them thoroughly. We have that test-cases directory now and we can add more files to it.