Rainbow blocks fix for Lua (+ other languages with a little work)

Danielkonge commented 12 months ago

This is a draft of a fix to the problem where rainbow-delimiters doesn't catch the same group multiple times. E.g., it couldn't catch elseif and else in Lua before.

I have time to edit and add improvements if you have any suggestions, or we could add it as another strategy if you don't want it to replace the old global?

I have only tested it in Lua and Rust for now (and before getting rainbow blocks to work better than before in other languages we need to edit the corresponding .scm files). But it works exactly as I would expect so far.

To use this it is important that each @container only has at most one @closing, but you can have as many @opening or @intermediate as you want.

HiPhish commented 12 months ago

Thank you for wanting to help out. Can you please explain how you worked around the issue of matching the same capture multiple times? My understanding is that if you have a query like (foo (bar) @xxx (baz) @xxx) Neovim's Tree-sitter API cannot handle the two @xxx captures and will only give me one. If I recall correctly that is why I have separate @opening, @intermediate and @closing captures.

Danielkonge commented 12 months ago

The problem is not really Neovim from what I can tell. If you play around with the treesitter playground (https://github.com/nvim-treesitter/playground) all the queries captures multiple captures fine.

(Look at the query and lines 13 to 21.)

The problem is really with iter_matches (which is the obvious choice for this algorithm). (This is the problem discussed here: https://github.com/neovim/neovim/pull/17099 )

So I tried rewriting basically the same algorithm but using iter_captures instead of iter_matches, since iter_captures does have the behavior I would expect (unlike iter_matches).

Unlike iter_matches, which captures the @container as a whole and then only can give you one of each capture inside the @container (e.g. only one @intermediate above), iter_captures gets all the captures in the following order:

@container first and then the order the other captures show up in in the code. If there is another @container inside the first @container, then we start going through that. I.e., we go through all captures in the order they show up in the code, but each "large" capture starts with @container whereafter we go through the captures inside the @container.

The algorithm tries to mirror the previous algorithm (depth first search).

(1) We make an empty matches stack and an empty match_record and start going through all the captured nodes based on our queries (on the changed part as before).

(2) If the captures name is @container that means we are starting going through a new container.

If match_record.container is not yet set, then that means that we are not currently inside another @container, so we set match_record.container = node.
If match_record.container is set, then that means that our current node is inside match_record.container and we starting on a new container. So we (temporarily) push match_record to matches so we can pop it back out later when we are done going through the current node. And then we start a new match_record with container equal to the current node.

(3) If the node's capture name is @closing that means that we are done with the current container (this is a new assumption I added!). So we set match_record.closing to be the current node and pop out the previous match_record from matches (if there is one).

- If the previous match_record exists and is an ancestor of the current match_record, then we add the current match_record to the children of the previous match_record, which we are still inside (the one we temporarily pushed earlier). And we continue working with the previous match_record which is now the new current match_record since we are still working inside it. [NOTE: SEE COMMENT FURTHER DOWN]
- Otherwise, if the previous match_record exists but is not an ancestor of the current match_record, then we are done with the current match_record (since we saw its @closing), so after popping out the previous match_record, we push both the current match_record and the previous match_record to matches (since it was not an ancestor, we are out of it too). [The order we push these doesn't matter.] Finally we create a new empty match_record in this case.
If there is no previous match_record, we cannot be inside another @container, so we just push the current match_record to matches and create a new empty match_record.

(4) If the node's capture name is not @container or @closing, then we add to the corresponding field in match_record (@opening or @intermediate) if such a field exists. Since we haven't seen a @closing in the current @container yet, this will be assigned correctly simply by the order iter_captures goes through the captures.

All of this is really just trying to implement the old algorithm with the restrictions we get from using iter_captures instead of iter_matches.

If we want this to replace the current global (and maybe local?) strategy, I think it would be a good idea to actually still keep the old code around and just comment it out (this pull request doesn't do this right now). If they ever fix iter_matches so it has the expected behavior upstream, I think the current implementation is nicer than what you see in this pull request. [It doesn't add the extra assumption that all @containers end with a single @closing.]

Another solution could be to keep a better version of iter_matches as part of rainbow-delimiters if you prefer that instead, or just adapt this as an alternative strategy (e.g. global_alt).

Danielkonge commented 12 months ago

This might be an even better example showing that Neovim can catch all the test in the pattern although they all have the same name here.

(Line 13-21 again. There are 3 test on the first line because the one caught for the whole "container" is listed there.)

HiPhish commented 12 months ago

Ah yes, the cursed issue where everyone who takes it over vanishes. I am familiar with iter_matches, but I did not consider using iter_captures.

I will have to go over your algorithm in detail when I have the time one of these days. But I don't like that @closing has to be a single node, this just kicks the can further down the road. How about this:

there are only two captures @container and @delimiter
As you have stated a new @capture starts a new context
For every @delimiter node check whether it is a child of the current topmost @container on the stack
- If it is then keep going
- Otherwise we are done with the current @container

This avoids the need for a @closing node altogether, but in return we have to perform this check for every single node. I don't know if this would make much of a performance impact or if it is just a matter of comparing a few numbers. What do you think of my proposal?

If we want this to replace the current global (and maybe local?) strategy, I think it would be a good idea to actually still keep the old code around and just comment it out (this pull request doesn't do this right now).

They would not work if we rename the captures in the queries. We could keep the old names and just treat them the same in the new strategies. But honestly I don't see much of a point in keeping the old strategies around. The algorithm is limited to one function and described in the HACKING file if we need it for posterity.

On a side note, how hard did you find it to get into the code? I rarely get non-trivial contributions, so I have no idea how readable and approachable my code really is.

Danielkonge commented 12 months ago

First a note to the algorithm I wrote above: You can actually avoid doing the check for whether something is an ancestor (where I wrote NOTE above) if you assume that there is only one @closing, since you can just check instead if the previous match record has already been closed or not. I don't know if this will actually speed up things significantly or not, but I thought I would note it for now.

I will have to go over your algorithm in detail when I have the time one of these days. But I don't like that @closing has to be a single node, this just kicks the can further down the road. How about this:

there are only two captures @container and @delimiter

As you have stated a new @capture starts a new context

For every @delimiter node check whether it is a child of the current topmost @container on the stack

If it is then keep going

Otherwise we are done with the current @container

This avoids the need for a @closing node altogether, but in return we have to perform this check for every single node. I don't know if this would make much of a performance impact or if it is just a matter of comparing a few numbers. What do you think of my proposal?

Your suggestion is definitely possible and it shouldn't be that hard to change what I currently have. I can try to have code for both algorithms written so we can test them a bit more (this sounds like something where we might want to actually benchmark the performance of these things, since I also don't really know the implications of checking if something is a parent of a treesitter node or not). [I think the parent check should be quick though.]

If we want this to replace the current global (and maybe local?) strategy, I think it would be a good idea to actually still keep the old code around and just comment it out (this pull request doesn't do this right now).

They would not work if we rename the captures in the queries. We could keep the old names and just treat them the same in the new strategies. But honestly I don't see much of a point in keeping the old strategies around. The algorithm is limited to one function and described in the HACKING file if we need it for posterity.

I will start out by keeping the old capture names around for now, just because we would need to edit all the queries otherwise, and it will also make it easier to compare the two implementations. If we decide on changing the names to @delimiter this will be easy to change later.

On a side note, how hard did you find it to get into the code? I rarely get non-trivial contributions, so I have no idea how readable and approachable my code really is.

I think it was pretty readable, but I mostly looked at the queries and strategies (+ a little at the general implementation). I am pretty new to Lua, so some of the meta tables stuff was a bit confusing, but that probably has more to do with me not being used to Lua meta tables yet than your actual code. Also I have never used Vader, so for testing I have just been manually checking whether it works as expected in my day to day programming + a few of the test files available.

To be a bit more precise, I think your Stack implementation and the strategy code is very easy to follow, but I was a bit confused about how the namespaces worked in lib.lua. After looking at it more, I guess part of it is because vim.api.nvim_create_namespace() doesn't behave as I would have liked. (I would have liked it to be easier to have per-language namespaces that I could search for when looking at Extmarks.)

Danielkonge commented 12 months ago

I have added an alternative update_range function (currently called update_range2) that is easy to change so it works with just @delimiter and @container (all that is needed is changing the names of capture groups in new_match_record and highlight_matches). I have only tested functionality and not performance so far, but it seems to work fine and I can't tell any difference in performance on my computer. Though we should probably do some actual performance tests comparing update_range1 with update_range2.

Danielkonge commented 12 months ago

I have a large local Rust file (5.5MB) that I tried to use to compare the two algorithms.

On my computer with this file, both algorithms seem to highlight ok, and the main part of update_range1 takes about 0.6secs on average while the main part of update_range2 takes about 1.2secs. This doesn't seem like a huge difference, but update_range1 was consistently about twice as fast as update_range2.

Also, it should be noted that this file has almost no nested highlights levels (it is basically all level 2 in one long list), so ts.is_ancestor only needs to do very little work.

In general, it seems that keeping track of @closing tags and ensuring that there is exactly one @closing for every @container will be faster, but for more common size files there is no difference. (On smaller normal size files they both take about 0.01-0.05secs for me.)

Also note that this was timed with os.clock() in Lua, which I can read is not the best for real world timing of algorithms.

HiPhish commented 12 months ago

There is also a C++ file under test/stress/markdown/lorem-ipsum.md which you can try. It has a number of languages embedded in Markdown, but it might be too short to deliver any meaningful results.

If we want to go with the @closing capture then how about we treat it purely as a sentinel node? By that I mean the content which we want to highlight is captured with @delimiter, but in addition to that we also capture the last delimiter with @closing. Here is what it would look like for Lua function definitions where the sentinel is also the last delimiter:

(function_declaration
  "function" @delimiter
  "end" @delimiter @closing) @container

Here is a contrived HTML example in which for whatever reason I only want to highlight the name of the closing tag, so the delimiter is tag_name and the sentinel is end_tag:

(element
  (start_tag
    "<" @delimiter
    (tag_name) @delimiter
    ">" @delimiter)
  (end_tag
    (tag_name) @delimiter) @closing) @container

This will require adjusting queries, but the change is straight-forward. I can do that myself, no need to waste your time.

Danielkonge commented 12 months ago

There is also a C++ file under test/stress/markdown/lorem-ipsum.md which you can try. It has a number of languages embedded in Markdown, but it might be too short to deliver any meaningful results.

Both algorithms take 0.00secs in my timing, so it is definitely too short to see anything. (Though the timing is not that accurate since right now it times separately for each language in the file.)

If we want to go with the @closing capture then how about we treat it purely as a sentinel node? By that I mean the content which we want to highlight is captured with @delimiter, but in addition to that we also capture the last delimiter with @closing. Here is what it would look like for Lua function definitions where the sentinel is also the last delimiter:
(function_declaration
  "function" @delimiter
  "end" @delimiter @closing) @container

That would definitely be a possibility too.

Here is a contrived HTML example in which for whatever reason I only want to highlight the name of the closing tag, so the delimiter is tag_name and the sentinel is end_tag:
(element
  (start_tag
    "<" @delimiter
    (tag_name) @delimiter
    ">" @delimiter)
  (end_tag
    (tag_name) @delimiter) @closing) @container
This will require adjusting queries, but the change is straight-forward. I can do that myself, no need to waste your time.

I can easily edit the code to work like that very quickly if this is what we want to go with. I think this is a good idea since the code with @closing is also simpler (so hopefully easier to maintain), and it makes sense that the capture starts with @container and ends with @closing. And based on my (admittedly very simple) testing it does seem to be faster than calling ts.is_ancestor().

HiPhish commented 12 months ago

BTW, am not particularly attached to the capture names, so if you have a better suggestions don't hesitate. For example, maybe @sentinel would be better than @closing, because the name @closing makes me think there needs to be a corresponding @opening somewhere.

Danielkonge commented 12 months ago

I think @sentinel does better describe the idea, but it is a pretty uncommon word. Then again, it might not be bad to use unique and descriptive words, so for now let's go with @sentinel.

Danielkonge commented 12 months ago

I have updated the algorithm so it should work with @delimiter and @sentinel as described now. I have checked that it works as I would expect with Lua and Rust so far (and I even managed to speed it up a little [maybe ~10%] by pushing and popping less).

Danielkonge commented 12 months ago

I have tried to update the local strategy in a similar way now too. I think it works as expected, but I only really use the global strategy normally, so I am not entirely sure if it used to behave slightly different in some cases.

Danielkonge commented 11 months ago

I have been testing injected languages in test/stress/markdown/lorem-ipsum.md, but the following lines in the original code seems to cause problems:

https://github.com/HiPhish/rainbow-delimiters.nvim/blob/652345bd1aa333f60c9cbb1259f77155786e5514/lua/rainbow-delimiters/strategy/global.lua#L139C1-L149C8

For now I have commented out those lines and don't see anything going wrong, but I assume that was put there for a reason. @HiPhish do you remember an example where this code is clearly needed?

Also note: I updated the queries for the corresponding languages I needed to test with that markdown file. It seems to work as expected after commenting out the above lines.

HiPhish commented 11 months ago

Yes, I remember why this exists. Let's say you have a Markdown file with embedded Lua:

Here is some Lua code:

~~~lua
-- This is Lua
print({{{}}})
~~~

Back to Markdown

Now if we move (not delete and put) the Lua code one line down it will carry with it the extmarks it had and we will have Lua delimiter highlighting within Markdown. This is because extmarks move with the text. My solution is to have per-language namespaces and delete all extmarks which do not belong to the current language. In the above example the line has Lua extmarks, so when I move it to the Markdown region all non-Markdown extmarks are removed.

Danielkonge commented 11 months ago

Yes, I remember why this exists. Let's say you have a Markdown file with embedded Lua:
Here is some Lua code:

~~~lua
-- This is Lua
print({{{}}})
~~~

Back to Markdown
Now if we move (not delete and put) the Lua code one line down it will carry with it the extmarks it had and we will have Lua delimiter highlighting within Markdown. This is because extmarks move with the text. My solution is to have per-language namespaces and delete all extmarks which do not belong to the current language. In the above example the line has Lua extmarks, so when I move it to the Markdown region all non-Markdown extmarks are removed.

I see. I am able to reproduce this sometimes (in some languages). But the problems I have while enabling that code is worse.

Without the code, I can get the problem like this:

https://github.com/HiPhish/rainbow-delimiters.nvim/assets/7075380/d7e646cc-55dc-44e6-8015-4d895d1da93f

With the code, I can't update the highlighting as usual (and sometimes the highlighting starts out wrong). If I try to do :e<CR>, then highlighting will go away while visible, and I need to scroll away and do :e<CR> and then scroll back to get it to work.

https://github.com/HiPhish/rainbow-delimiters.nvim/assets/7075380/95a7c4a5-0918-48d2-9b31-35fd8ed8bb4e

Note: This doesn't seem to be a problem introduced by the other changes in the code. I already had this kind of problem before.

I will try to see if I can find a better way to do this that solves both issues, but for now it works much better for me without this code (when I don't move stuff like in the example I don't really have any problems without that code, but with the code I have quite a few problems).

Part of the problem seems to be that changes is really not a very good list of changes. E.g., sometimes I get stuff like this from vim.print(lang, changes) before the for loop in question:

markdown
{ { 0, 0, 4294967295, 4294967295 } }
c
{ { 28, 0, 29, 0 }, { 29, 0, 30, 0 }, { 30, 0, 31, 0 }, { 31, 0, 32, 0 }, { 32, 0, 33, 0 }, { 33, 0, 34, 0 }, { 34, 0, 35, 0 }, { 35, 0, 36, 0 }, { 36, 0, 37, 0 }, { 37, 0, 38, 0 }, { 38, 0, 39, 0 }, { 39, 0, 40, 0 } }
vim
{ { 19, 0, 20, 0 }, { 20, 0, 21, 0 } }
lua
{ { 5, 0, 6, 0 }, { 6, 0, 7, 0 } }
python
{ { 48, 0, 49, 0 }, { 49, 0, 50, 0 } }
c
{ { 28, 0, 29, 0 }, { 29, 0, 30, 0 }, { 30, 0, 31, 0 }, { 31, 0, 32, 0 }, { 32, 0, 33, 0 }, { 33, 0, 34, 0 }, { 34, 0, 35, 0 }, { 35, 0, 36, 0 }, { 36, 0, 37, 0 }, { 37, 0, 38, 0 }, { 38, 0, 39, 0 }, { 39, 0, 40, 0 } }
vim
{ { 19, 0, 20, 0 }, { 20, 0, 21, 0 } }
markdown
{ { 0, 0, 154, 0 } }

HiPhish commented 11 months ago

Yes, I have noticed that the changes can be quite weird at time. The number 4294967295 is the maximum possible 32-bit unsigned integer value. Neovim uses the range (0, 4294967295) to mean "the entire buffer".

Can you please mark your PR as WIP or something like that until it is ready for review? I really appreciate your effort and I would prefer not to waste either of our time by reviewing code that isn't done yet.

Danielkonge commented 11 months ago

Yes, I have noticed that the changes can be quite weird at time. The number 4294967295 is the maximum possible 32-bit unsigned integer value. Neovim uses the range (0, 4294967295) to mean "the entire buffer".

Other than that number, I think the weirder thing is that it sometimes sends the same language range multiple times. E.g. in my list above, both markdown lists cover the entire buffer and the vim lists are identical.

Can you please mark your PR as WIP or something like that until it is ready for review? I really appreciate your effort and I would prefer not to waste either of our time by reviewing code that isn't done yet.

I have changed the PR to "Draft" in GitHub for now while I work on this last thing. I don't plan to make more changes to update_range in the global strategy though, so if you want to start looking at the main part of this pull request, that part should be final now.

Danielkonge commented 11 months ago

After looking at this more, it seems that the problematic lines for moving stuff is actually:

if #changes == 0 then
  -- Hack: an empty change set results from an undo.  Force a
  -- full update by setting the maximum possible range
  table.insert(changes, {tree:root():range()})
end

Removing these lines everything behaves as expected for me except when I use undo, but I haven't figured out how to circumvent the undo problem yet. The above doesn't fix it properly. E.g., here is an example where I move one line of c up into markdown:

markdown
{ { 28, 0, 29, 0 } }
c
{}                                        <-- this is where the above code would change something (i.e., before the undo)
2 changes; before #172  1 second ago      <-- undo
markdown
{ { 29, 0, 41, 0 } }                      <-- markdown change when doing undo
c
{ { 29, 0, 30, 0 } }                      <-- c change when doing undo

The problem is that the markdown captures the change in the c code block, so it clears the c namespace, while the c code block itself just notices that only the first line changed, so it doesn't update the highlighting on the other lines that were just cleared. (But changes is not empty, so the above code would not fix this.)

Also, if we change {} to the roots range, then we will mistakenly reset the non C highlighting of some stuff that might not get highlighted afterwards.

I will continue trying to fix or get around this somehow.

Danielkonge commented 11 months ago

@HiPhish: I managed to fix all issues I had with moving code, using undo and any other problems I saw now. If you notice any other problems I can try to fix those too, but otherwise I think this code is ready for review now. (I don't plan to change anything unless someone notices a problem.)

HiPhish commented 11 months ago

Just a quick heads-up, I am in the process of reviewing this PR. Can you please tell me in which order captures are delivered by iter_capture?

Danielkonge commented 11 months ago

Just a quick heads-up, I am in the process of reviewing this PR. Can you please tell me in which order captures are delivered by iter_capture?

iter_captures() delivers the captures in the order they show up in the tree if you go down through the tree from the root (from outside moving inwards in each node).

E.g. this query for html

(element
  (start_tag
    (tag_name) @delimiter
  )
  (end_tag
    (tag_name) @delimiter
  ) @sentinel
) @container

wouldn't work correctly since the @sentinel captures end_tag, which shows up before the tag_name inside it. Thus the correct query for something similar in html is:

(element
  (start_tag
    (tag_name) @delimiter
  )
  (end_tag
    (tag_name) @delimiter @sentinel
  )
) @container

Now the @sentinel is on the tag_name inside end_tag which is indeed the last element we see in this whole capture.

In the first case iter_captures() would give you:

@container from element
@delimiter from tag_name inside start_tag
@sentinel from end_tag [since this shows up in the tree before the inside tag_name]
@delimiter from tag_name inside end_tag

In the second case iter_captures() would give you:

@container from element
@delimiter from tag_name inside start_tag
@delimiter from tag_name inside end_tag
@sentinel from tag_name inside end_tag [after @delimiter because it is listed after it and they both cover the same node]

Danielkonge commented 11 months ago

I have performed a preliminary code review of some surface level stuff. I still need to understand the algorithm inside update_range of the global strategy. From what I gather there is the stack (matches) the current match record (match_record) that is being built up. We iterate through all the captures and differentiate based on the name of the capture:

delimiter is obvious

container has two cases:

If the current record has no container node, then this is its node

If the current match has been closed it belongs to a sibling match. Push the current match to the stack and start a new current match

Otherwise the container belongs to a new child match. Stash the current record on the stack and start a new record which has the previous one as its parent

sentinel closes the current record

If the current record has no parent push it to the stack and start a new record

Otherwise pop the parent off the stack and set it as the current record

The matches stack serves two purposes: eventually it will be a "list" of top-level records, but until then it serves as a stash for records while we collect their children. Is my understanding so far correct?

Yes, it sounds like you have the right idea.

Looking back at the code, we can change the check for whether there is a current container to whether there is a current match_record. [See my comments above.] This complicates the stylua types (since we need to consider nil match_records then), but it might be more clear?

Regarding the order of matches, my understanding is that all captures are returned in a depth-first order. So given this HTML snippet:
<div id="d1">
  <div id="d2">
      <div id="d3">
      </div>
  </div>  
  <div id="d4">
  </div>  
</div>
We would get the captures in this order:

container d1

start d1

container d2

start d2

container d3

start d3

end d3

sentinel d3

end tag d2

sentinel d2

container d4

start d4

end d4

sentineld4

end d1

sentinel d1

Is this the correct order?

Yes, that is the correct order from my understanding of iter_captures().

HiPhish commented 11 months ago

I like the variant with match_record = nil better. Then we don't even need the container field in the match record table. I feel there is a really elegant recursive mathematical definition hidden somewhere under that imperative code, but let's get this out of the door as is first or else it will never get done :)

Can you please check the "allow edits from maintainers" checkbox in your PR? I have a few improvements to discuss that I want to push to this PR as a commit. If I can commit it will be more efficient than if I can only tell you what to do.

On a related note, I have played around with the HTML query and I absolutely love the result. Finally I can have the tag name and the angle brackets highlighted, but not the attributes. I have literally wanted this for years, ever since I first learned about rainbow parentheses. Thank you so much for your work!

Screenshot_20231025_213157

Danielkonge commented 11 months ago

I like the variant with match_record = nil better. Then we don't even need the container field in the match record table. I feel there is a really elegant recursive mathematical definition hidden somewhere under that imperative code, but let's get this out of the door as is first or else it will never get done :)

I have updated the code so it uses match_record = nil and removed container now.

Can you please check the "allow edits from maintainers" checkbox in your PR? I have a few improvements to discuss that I want to push to this PR as a commit. If I can commit it will be more efficient than if I can only tell you what to do.

From my side it looks like this is already checked. Can you not edit now? I will try to see if I need to click "Yes" to something in some settings somewhere.

On a related note, I have played around with the HTML query and I absolutely love the result. Finally I can have the tag name and the angle brackets highlighted, but not the attributes. I have literally wanted this for years, ever since I first learned about rainbow parentheses. Thank you so much for your work!

I have also liked rainbow brackets for a long time and got used to it from emacs. This is the best plugin for it that I have seen in Neovim, so I am happy that we are able to improve on it too! (I was annoyed by Lua blocks highlighting in stead of html, but I also think that got much better with this update.)

HiPhish commented 11 months ago

I get ! [remote rejected] Danielkonge/master -> Danielkonge/master (permission denied) when I try to push to your remote (github.com:Danielkonge/rainbow-delimiters.nvim.git) on the master branch. Maybe you should have created a topic branch first?

HiPhish commented 11 months ago

OK, looks like I figured it out. I had to push the commit as git push Danielkonge-contrib HEAD:master where Danielkonge-contrib is the remote name I have chose for your repo. This will make collaboration much easier.

HiPhish commented 11 months ago

SQL parentheses look fine to be as long as there are no empty parentheses or parentheses containing only one element.

Screenshot_20231028_003155

In a an expression line SELECT (1 + (2)); all the characters inside the outer parentheses are on the same level in the tree.

      term [0, 7] - [0, 16]
        value: "(" [0, 7] - [0, 8]
        value: binary_expression [0, 8] - [0, 15]
          left: literal [0, 8] - [0, 9]
          operator: "+" [0, 10] - [0, 11]
          right: "(" [0, 12] - [0, 13]
          right: literal [0, 13] - [0, 14]
          right: ")" [0, 14] - [0, 15]
        value: ")" [0, 15] - [0, 16]

I don't think there is anything we can do without a container node.

Danielkonge commented 11 months ago

SQL parentheses look fine to be as long as there are no empty parentheses or parentheses containing only one element.

You mean after my update right? Because I was definitely having problems other than with nesting before that.

Also, since writing something like ((())) is very uncommon outside tests like this, I think it might actually make sense to simplify the code a bit and just let it look bad on something like that. (As you said this is mostly because of a problem with SQL's tree structure.)

[Update: I have changed this to be simpler for now, but I left a comment for what can be done to make it slightly better if you need ((())).]

HiPhish commented 11 months ago

Yes, after the update.

Also, since writing something like ((())) is very uncommon outside tests like this, I think it might actually make sense to simplify the code a bit and just let it look bad on something like that.

Agreed. In SQLite at least SELECT (((1))); is legal query, but no one is ever going to write code like this.

Danielkonge commented 11 months ago

I updated the local and global strategy so that any capture name that starts with 'ignore' will be ignored (we don't want to log those). So we can use ignore, ignore1, ignore_self_closing, etc. for any captures we want to use for predicates like #eq?, #any-eq?, etc.

HiPhish commented 11 months ago

I updated the local and global strategy so that any capture name that starts with 'ignore' will be ignored (we don't want to log those). So we can use ignore, ignore1, ignore_self_closing, etc. for any captures we want to use for predicates like #eq?, #any-eq?, etc.

Why not just ignore any capture that has a name other than container, delimiter and sentinel? Currently unknown capture names are logged, but the function will run on every change and spam the logs. If it is about validating queries for query authors, we can do so only once when the query is loaded first like this:

local known_captures = {delimiter = true, container = true, sentinel = true}

local q = vim.treesitter.query.get('lua', 'rainbow-delimiters')
for _, capture in ipairs(q.captures) do
    if not known_captures[capture] or not starts_with_ignore(capture) then
        log.error('Unknown capture name: ' .. capture)
    end
end

This validation can be added somewhere in the lib module, for example when registering a new namespace. Then the validation will only run once over the entire lifetime of the Neovim process.

I would not add validation to the scope of this PR, it is big enough as it is. We can open a new ticket for that.

Danielkonge commented 11 months ago

Why not just ignore any capture that has a name other than container, delimiter and sentinel? Currently unknown capture names are logged, but the function will run on every change and spam the logs. If it is about validating queries for query authors, we can do so only once when the query is loaded first like this:
local known_captures = {delimiter = true, container = true, sentinel = true}

local q = vim.treesitter.query.get('lua', 'rainbow-delimiters')
for _, capture in ipairs(q.captures) do
    if not known_captures[capture] or not starts_with_ignore(capture) then
        log.error('Unknown capture name: ' .. capture)
    end
end
This validation can be added somewhere in the lib module, for example when registering a new namespace. Then the validation will only run once over the entire lifetime of the Neovim process.

I would not add validation to the scope of this PR, it is big enough as it is. We can open a new ticket for that.

I think you are right, so I have removed the unknown capture name cases now. I still have the case where we get a @delimiter or a @sentinel without having a match_record. As long as the queries (and code) are correct we should never end up in this case, so it shouldn't be spamming the logs. Do you want to change this case too?

HiPhish commented 11 months ago

Do you want to change this case too?

No, please leave it in. This is an error case which cannot be detected until the code actually runs, so having the error in the logs is valuable.

HiPhish commented 11 months ago

I have noticed a problem with nested languages: when I move a line up out of a nested block it will retain its rainbow highlighting. Here is a Markdown example

Some markdown.
~~~lua
print({{{{}}}})
print({{{{}}}})
~~~
More markdown

I place the cursor on line 3 and execute :move 1. The line will lose its Lua highlighting, but it will retain its rainbow highlighting. However, if I place the cursor on line 4 and execute :move 5 (i.e. downwards) it will lose the rainbow highlighting.

Screenshot_20231029_233710

Danielkonge commented 11 months ago

I have noticed a problem with nested languages: when I move a line up out of a nested block it will retain its rainbow highlighting. Here is a Markdown example
Some markdown.
~~~lua
print({{{{}}}})
print({{{{}}}})
~~~
More markdown
I place the cursor on line 3 and execute :move 1. The line will lose its Lua highlighting, but it will retain its rainbow highlighting. However, if I place the cursor on line 4 and execute :move 5 (i.e. downwards) it will lose the rainbow highlighting.

I can reproduce this if I copy exactly your short markdown file there, but it is not a problem in the stress markdown file, so I am not sure what causes this. I will try to look at the changes returned some more.

Danielkonge commented 11 months ago

I think I figured out the problem. It seems to be an off by one error from what I can tell.

Changing

vim.api.nvim_buf_clear_namespace(bufnr, nsid, change[1], change[3])

to

vim.api.nvim_buf_clear_namespace(bufnr, nsid, change[1], change[3] + 1)

fixes the problem.

https://github.com/HiPhish/rainbow-delimiters.nvim/assets/7075380/bfd1df5e-2ba9-4559-ace4-a6e1cb9d2136

But this then breaks html highlighting. I originally had the lower one, but it gives this highlighting:

instead of:

It seems to be a problem with clearing out the old highlights caused by different behavior in the different parsers. I am not yet sure though, so I will try to look into this a bit more.

Danielkonge commented 11 months ago

https://github.com/HiPhish/rainbow-delimiters.nvim/pull/43/commits/a8ff2a79ec95c5894d98135dd2a2bc68f14f25c6

This is a temporary(?) fix that seems to work in almost all cases I have checked. The only way I have managed to break this is by having nested language injections (i.e., a language injection in a language injection), but that seems like a rare need. In those rare cases, I think it makes sense to refresh highlighting (that will fix it) if you move code.

Danielkonge commented 11 months ago

I cleaned up the code a bit and noticed another problem with injections in Rust. When you use macros in Rust, it treats that as a language injection with the language being Rust, but when we inject the language itself, it will already be taken care of by the part of the code without a parent language, so in that case we should do nothing. This has been messing with Rust highlighting of (and around) macros for me for a long time, and the current fix I made finally gets around it.

I have tested out the code some more and haven't noticed any problems now except when you have nested language injections, which I think is pretty rare. In those cases updating highlighting or moving stuff can break the highlighting with the current code. E.g.:

Markdown text.
~~~lua
print 'This is injected Lua in Markdown (one level deep)'
vim.cmd[[
echo str([])
echo 'This is injected Vim in Lua in Markdown (two levels deep)'
]]
~~~

HiPhish commented 11 months ago

You have the patience of a saint. It will be a day or two before I get around to looking at the code, so I need to ask for a bit more patience.

Danielkonge commented 11 months ago

Instead of ignore<anything> for the captures that we ignore for both debug and highlighting, do you think it would make more sense to use _<anything>? I.e., just ignore stuff starting with underscore. I feel like that might be a bit more expected behavior.

HiPhish commented 11 months ago

Yes, underscore is better. It is more in line with ignored variables in programming languages and harder to mistype.

Danielkonge commented 11 months ago

In https://github.com/HiPhish/rainbow-delimiters.nvim/pull/43/commits/d4cc189f80dedf327aa2308033823cb515a830d1 I included some comments that I will delete soon, but I made a commit to look back on if needed.

I played around with using vim.api.nvim_get_extmarks and vim.api.nvim_del_extmark instead of vim.api.nvim_buf_clear_namespace since those allow us to work with exact rows and columns instead of just the rows, but I didn't manage to improve the behavior over what we have now (there was still a problem with nested injections), so I didn't switch to those functions in the code.

Also: I think on_changedtree is supposed to work as we want without the need for the changes = {{tree:root():range()}} hack, so we might want to report that as a bug upstream. (The video example with an injected C block above is where you can see the bug.)

HiPhish commented 11 months ago

I just reordered some code to remove one level of if-nesting. Can you please double-check that the conditions are still the same?

Danielkonge commented 11 months ago

I just reordered some code to remove one level of if-nesting. Can you please double-check that the conditions are still the same?

The logic looks correct to me. (Also, it behaves as expected in quick testing of what that code is supposed to solve.)

HiPhish commented 11 months ago

Where did you get the _parent_lang field of the parser from? It is not documented, and when I inspect a parser object there is no such field either. I used the following Markdown text:

Hello

```c
puts((((("This is an injected language")))));
{
    {
        {
            {
                {
                    return ((((((((((((2)))))))))))) + ((((((3))))));
                }
            }
        }
    }
}
```

Then I set p = vim.treesitter.get_parser() and executed lua =p to view the fields of the parser object. Each parser has a children field which maps contained languages to their respective parsers. But there is no _parent_lang field.

EDIT: I don't see any way of finding out whether the parser has a parent language. We can descend deeper into the tree, but not back up.

Danielkonge commented 11 months ago

Where did you get the _parent_lang field of the parser from? It is not documented, and when I inspect a parser object there is no such field either. I used the following Markdown text:
Hello

```c
puts((((("This is an injected language")))));
{
    {
        {
            {
                {
                    return ((((((((((((2)))))))))))) + ((((((3))))));
                }
            }
        }
    }
}
```
Then I set p = vim.treesitter.get_parser() and executed lua =p to view the fields of the parser object. Each parser has a children field which maps contained languages to their respective parsers. But there is no _parent_lang field.

EDIT: I don't see any way of finding out whether the parser has a parent language. We can descend deeper into the tree, but not back up.

I found _parent_lang by printing p in on_changedtree and then going through :messages. It seems to be nil (so not shown) when there is no parent language, but it should show up for all injections.

If we want to avoid using _parent_lang and still have movement of text work with injections, I think we need to figure out what causes the wrong changes to be reported in certain cases (like the above c code). If that is fixed, then we could simplify the code a bit and even have it work with nested injections.

UPDATE: I just checked, and this seems to be only available on HEAD and not in 0.9.4.

Danielkonge commented 11 months ago

Since p._parent_lang seems to be a nightly feature, I have updated util.for_each_child so that we can keep track of the parent language ourselves instead. Everything seems to work as expected in the global strategy now (I haven't tested the corresponding changes in the local strategy yet).

Please let me know if there are any problems with this update.

Danielkonge commented 11 months ago

I think I have fixed the problem with nested injections now, but the solution is a bit hacky. I don't think it should affect the performance of the highlighting though, since the main part of the updated code will only be run when we are at least two injections deep.

Please let me know if you notice any problems.

Note: If we don't want to include this last change, the previous code before this change worked for everything other than nested injections in my testing.

HiPhish commented 11 months ago

I don't think we need the function for_each_child_dif_lang if we move the test into the thunk. I have done so in the previous commit, please check if the logic is the same.

HiPhish / rainbow-delimiters.nvim

Rainbow blocks fix for Lua (+ other languages with a little work) #43