Docstring indentation is too aggressive when semantic indentation is enabled.

clojure-emacs / clojure-ts-mode

The next generation Clojure major mode for Emacs, powered by TreeSitter

GNU General Public License v3.0

140 stars 13 forks source link

Docstring indentation is too aggressive when semantic indentation is enabled. #18

Open dannyfreeman opened 1 year ago

dannyfreeman commented 1 year ago

See commit 177ac05434fa2aed037be05707ca57f63df59ef5

Bascically, as you type and hit treesit indent functionality tries to unindent everything back to 2 spaces, so it's hard to add nicer indentation inside docstrings.

(defn foo 
  "My docstring
  I like to be indented like this
    If I come out to hear, treesit indent stuff pulls me back
  to this position"
  ...)

sogaiu commented 1 year ago

I tried out your example and I think I see what you're getting at.

A random idea is to use another grammar for text or docstrings - somewhat like what was discussed here.

No idea if there is a "plain text" grammar nor whether performance could be an issue though (^^;

Not sure if the following will be at all useful, but FWIW...

I've wondered about how to handle things in docstrings because sometimes there are snippets of code which inevitably one may feel would be nice to not have to manipulate as plain text.

Sort of related to that, in janet-ts-mode, to get rainbow-delimiters to work well, I ended up tweaking syntax properties of strings (there are two types in janet) like this. Those kinds of tweaks cause a variety of behaviors in Emacs to be different within strings.

I think I learned how to do this from ruby-ts-mode.el. These lines seem relevant.

dannyfreeman commented 1 year ago

I tried out your example and I think I see what you're getting at.

A random idea is to use another grammar for text or docstrings - somewhat like what was discussed here.

No idea if there is a "plain text" grammar nor whether performance could be an issue though (^^;

A markdown grammar would probably be the most appropriate. Some lsp clients will parse docstrings as markdown. While markdown is not standard, it could at least be used for things like `code snippets`

Sort of related to that, in janet-ts-mode, to get rainbow-delimiters to work well, I ended up tweaking syntax properties of strings (there are two types in janet) like this. Those kinds of tweaks cause a variety of behaviors in Emacs to be different within strings.

I think I learned how to do this from ruby-ts-mode.el. These lines seem relevant.

I thought those things were exclusively for font-locking, but I'm honestly not quite sure what I'm even looking at without spending some time studying that code. I will give it a more in depth look as I find the time.

sogaiu commented 1 year ago

I thought those things were exclusively for font-locking,

It's true that font-locking can be affected, however:

A syntax table specifies the syntactic role of each character in a buffer. It can be used to determine where words, symbols, and other syntactic constructs begin and end. This information is used by many Emacs facilities, including Font Lock mode (see Font Lock Mode) and the various complex movement commands (see Motion).

via: https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-Tables.html

So these lines (specifically the put-text-property call):

           ;; tweaking syntax class of some chars for better behavior
           (let ((pos (1+ bt1-start))
                 (stop (1- n-end)))
             (while (< pos stop)
               (when (not (memq (char-after pos)
                                ;; XXX: might need to tweak this further
                                (list ?\s ?\n ?\r ?\t ?\f ?\v
                                      ?\( ?\) ?\[ ?\] ?{ ?})))
                 (put-text-property pos (1+ pos)
                                    'syntax-table
                                    (string-to-syntax "w")))

fiddle with stuff within strings in such a way that a variety of other commands (including motion ones) behave differently than otherwise.

but I'm honestly not quite sure what I'm even looking at without spending some time studying that code. I will give it a more in depth look as I find the time.

Yeah, sorry for the lack of explanation -- it took me quite some time to wrap my head around this stuff (well, to the degree that I succeeded anyway).

Anyway, may be this approach isn't needed.

dannyfreeman commented 1 year ago

Anyway, may be this approach isn't needed.

It may be. The first path I'm going to pursue is using a nested markdown grammar. I have the initial workings here https://github.com/clojure-emacs/clojure-ts-mode/tree/bug/18/docstring-formatting

That solves another outstanding todo I had for font-locking in docstrings. I might actually push that to main because it is working quite nicely. I haven't touched indentation with it yet.

sogaiu commented 1 year ago

I gave that a try:

clojure-ts-mode-with-markdown-grammar-for-docstrings

The auto-installation of the grammar seemed to work fine :)

dannyfreeman commented 1 year ago

The auto-installation of the grammar seemed to work fine :)

The auto-installation stuff makes me nervous. I'm glad it's been working so far.

I've merged the nested markdown grammar to main. I'm leaving my bug branch for this commit open though. I still need to tackle indentation with it. Haven't worked with nested parser indentation yet.

sogaiu commented 1 year ago

The auto-installation stuff makes me nervous.

Yeah, that along with depending on multiple grammars that might change...perhaps motivation to consider alternative approaches?

Regarding change in grammars, I wonder about the semver-ish idea mentioned here:

point updates are grammar fixes that don't affect queries at all;

minor updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise;

major updates remove/change grammar features (nodes) that break current queries.

I wonder how much of the change in tree-sitter-clojure could be described in terms of 1 and 2. My impression (quite possibly mistaken) is that most of the change was somewhat in the neighborhood of 3. May be it depends a lot on "what stage" a grammar's development is at.

IIUC, the author of the idea has extensive experience looking after the integration of a very large number of grammars in the context of an editor so it seems reasonable to suppose their observations informed the 3-point idea above.

dannyfreeman commented 1 year ago

sogaiu @.***> writes:

Yeah, that along with depending on multiple grammars that might change...perhaps motivation to consider alternative approaches?

There is some discussion in the mailing list now on how to handle this, that's actually what prompted me to open the issue you linked below. I'm not quite sure what the solution is other than to always install the exact grammar version we want in a unique location, and always load that. I'm not so sure people will like that though. Personally I enjoy having all the grammars pre-installed from my OS packages.

Regarding change in grammars, I wonder about the semver-ish idea mentioned here:

point updates are grammar fixes that don't affect queries at all;

minor updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise;

major updates remove/change grammar features (nodes) that break current queries.

I wonder how much of the change in tree-sitter-clojure could be described in terms of 1 and 2. My impression (quite possibly mistaken) is that most of the change was somewhat in the neighborhood of

May be it depends a lot on "what stage" a grammar's development is at.

In my experience it is very hard to make a significant change that falls into categories 1 and 2. Even adding nodes can be breaking in their own way. They may not break queries (I think) but they can break things like "node under point" and "parent of current node". That is kind of something we saw with the transpose-sexp issue #17 that was opened earlier. While that was caused by the transpose function changes, I think it's easy to see how adding a new node could cause the it to break.

IIUC, the author has extensive experience looking after the integration of a very large number of grammars in the context of an editor so it seems reasonable to suppose their observations informed the idea stated above.

Of course, I would defer to someone with more experience with tree-sitter grammars in this regard. My own is fairly limited. I hope that issue results in something a little more concrete than "every grammar should use git tags with semvar".

-- Danny Freeman

sogaiu commented 1 year ago

There is some discussion in the mailing list now on how to handle this, that's actually what prompted me to open the issue you linked below.

I think I read some of it [1] -- thanks for mentioning it.

Personally I enjoy having all the grammars pre-installed from my OS packages.

I like relying on my OS packages for things that don't change much, but for things that end up as dependencies of things I'm working on, I find using OS packages to sometimes be a hindrance. (That's when not using NixOS or Guix though.)

As tree-sitter itself hasn't reached a 1.0 status, as I understand it, there is some feeling among the maintainers that certain things can be changed. Since the grammars depend on tree-sitter in a way, I view them as more in flux.

So yeah, if I were not doing tree-sitter-related work, surely I'd prefer to not be fiddling with different versions of grammars, but I wonder whether at this time how stable of an experience one can expect. Perhaps this also depends on the language in question.

When I use Neovim, I notice rebuilding of grammars from time to time and I do occasionally run into issues where I need to intervene manually. I have not found the error messages to be enough on their own to figure out what to do, but often enough someone else has already asked so a relevant query can lead to a solution (^^; Not great for sure, but this may have something to do with my lack of experience and skill with things vim-ish...

In my experience it is very hard to make a significant change that falls into categories 1 and 2. Even adding nodes can be breaking in their own way. They may not break queries (I think) but they can break things like "node under point" and "parent of current node". That is kind of something we saw with the transpose-sexp issue #17 that was opened earlier. While that was caused by the transpose function changes, I think it's easy to see how adding a new node could cause the it to break.

I wonder if there are any issues / PRs at nvim-treesitter that demonstrate the feasibility of the 3-point idea...My imagination is not succeeding in seeing the idea working well.

[1] Thread subject -> "Re: Update on tree-sitter structure navigation"

dannyfreeman commented 1 year ago

sogaiu @.***> writes:

As tree-sitter itself hasn't reached a 1.0 status, as I understand it, there is some feeling among the maintainers that certain things can be changed. Since the grammars depend on tree-sitter in a way, I view them as more in flux.

So yeah, if I were not doing tree-sitter-related work, surely I'd prefer to not be fiddling with different versions of grammars, but I wonder whether at this time how stable of an experience one can expect. Perhaps this also depends on the language in question.

I don't think people in our positions, writing grammars and tooling based on them, can expect much stability. Users may not have the same expectations though. I think with Emacs at least there is a general understanding that the various *-ts-modes are still in an "experimental" stage.

When I use Neovim, I notice rebuilding of grammars from time to time and I do occasionally run into issues where I need to intervene manually. I have not found the error messages to be enough on their own to figure out what to do, but often enough someone else has already asked so a relevant query can lead to a solution (^^; Not great for sure, but this may have something to do with my lack of experience and skill with things vim-ish...

I think it's going to be pretty general advice when people report issues, to say "make sure you can reproduce your issue with this version of the grammar first". In fact, that is problem something to include in the repo's github issue template.

I also have this new ticket open https://github.com/clojure-emacs/clojure-ts-mode/issues/19 to write some kind of interactive command to help diagnose grammar issues.

In my experience it is very hard to make a significant change that falls into categories 1 and 2. Even adding nodes can be breaking in their own way. They may not break queries (I think) but they can break things like "node under point" and "parent of current node". That is kind of something we saw with the transpose-sexp issue #17 that was opened earlier. While that was caused by the transpose function changes, I think it's easy to see how adding a new node could cause the it to break.

I wonder if there are any issues / PRs at nvim-treesitter that demonstrate the feasibility of the 3-point idea...My imagination is not succeeding in seeing the idea working well.

Demonstrating the feasibility of the semantic versioning guidelines? I think my imagination is at a loss too. What might we expect to see? A grammar successfully releasing a 0.0.X patch update and not breaking anything? Then a 0.X.0 minor version update, adding nodes, and everything keeps working? I'm afraid I'm at a loss as well.

-- Danny Freeman

sogaiu commented 1 year ago

Demonstrating the feasibility of the semantic versioning guidelines? I think my imagination is at a loss too. What might we expect to see? A grammar successfully releasing a 0.0.X patch update and not breaking anything? Then a 0.X.0 minor version update, adding nodes, and everything keeps working?

Something along those lines would be neat evidence, but I was thinking more along the lines of just seeing that changes in particular grammars that happened to fall into categories 1 and 2 (without the explicit version labelling).

I have been wondering how often 1 or 2 has happened (slightly edited):

updates are grammar fixes that don't affect queries at all;

updates add grammar features (nodes) that need query updates to make use of, but queries work fine with otherwise;

I suppose without some effort to arrange for those kinds of changes they are not likely to occur by chance. Was curious whether there was already some kind of "track record".

I have my doubts about any scheme like this for at least the following reasons:

Even if one wanted to follow such a scheme, it seems like it would be extra work to do so and quite possibly difficult / complicated to get right
I don't know what sensible incentive there would be for maintainers to put forth the extra effort
Limited resources suggest this might not be a high priority

I found just getting the grammars I've worked on [1] to behave appropriately to be challenging enough...

[1] ...and these were probably some of the simplest grammars even.

sogaiu commented 1 year ago

I think it's going to be pretty general advice when people report issues, to say "make sure you can reproduce your issue with this version of the grammar first".

Sure. What was tricky about Neovim was that I was seeing error messages in contexts where I wasn't expecting to see them at all. I would start Neovim and then suddenly see errors which looked vaguely like they would have something to do with tree-sitter but I wasn't sure (again, that may have a lot to do with my relative vim-ish ignorance).

I haven't hit anything like that with Emacs + tree-sitter yet, but then my ignorance seems to be much less here.

In fact, that is problem something to include in the repo's github issue template.

Sounds like a plan :)

I also have this new ticket open #19 to write some kind of interactive command to help diagnose grammar issues.

Do you know if any other *-ts-modes are doing anything similar?

dannyfreeman commented 1 year ago

sogaiu @.***> writes:

Sure. What was tricky about Neovim was that I was seeing error messages in contexts where I wasn't expecting to see them at all. I would start Neovim and then suddenly see errors which looked vaguely like they would have something to do with tree-sitter but I wasn't sure (again, that may have a lot to do with my relative vim-ish ignorance).

I haven't hit anything like that with Emacs + tree-sitter yet, but then my ignorance seems to be much less here.

I have with other grammars (javascript for sure). Most of the time indentation just doesn't work, or the entire buffer does not get font-locked (because a query is now invalid).

I also have this new ticket open #19 to write some kind of interactive command to help diagnose grammar issues.

Do you know if any other *-ts-modes are doing anything similar?

I have not seen anything like it yet. I've also not seen any bug reports about grammar issues yet so it's not super high priority right now. As far as I know, any distros packaging tree-sitter-clojure currently use a compatible version with clojure-ts-mode. Great for now, but we will probably end up pushing a breaking change to the grammar at some point. When that happens and the distro grammars inevitably lag behind I think compatibility issues are likely to occur. I think the "clojure-ts-grammar-doctor" will come in handy for users.

-- Danny Freeman

sogaiu commented 1 year ago

I have with other grammars (javascript for sure). Most of the time indentation just doesn't work, or the entire buffer does not get font-locked (because a query is now invalid).

Oh I meant that in Emacs it was easier to investigate and potentially fix things (as compared with Neovim) -- possibly because of the existence of edebug but also it seems much easier to look up the source for various functions.

Once I see an error message in Neovim, I might get an idea of what file and line to look at, but from that point, I'm in code I have no idea about with no practically economic and reliable way of observing execution like I can with edebug. Anyway, this is perhaps kind of tangential. Sorry about that.

but we will probably end up pushing a breaking change to the grammar at some point. When that happens and the distro grammars inevitably lag behind I think compatibility issues are likely to occur. I think the"clojure-ts-grammar-doctor" will come in handy for users.

Yeah, it might be a good way to collect some relevant info for reproducing an issue too? May be that's along the lines of what you already stated here:

If the doctor does find issues, perhaps a special buffer clojure-ts-doctor can be shown with details about the issues found.