Closed Danielkonge closed 10 months ago
Thank you for wanting to help out. Can you please explain how you worked around the issue of matching the same capture multiple times? My understanding is that if you have a query like (foo (bar) @xxx (baz) @xxx)
Neovim's Tree-sitter API cannot handle the two @xxx
captures and will only give me one. If I recall correctly that is why I have separate @opening
, @intermediate
and @closing
captures.
The problem is not really Neovim from what I can tell. If you play around with the treesitter playground (https://github.com/nvim-treesitter/playground) all the queries captures multiple captures fine.
(Look at the query and lines 13 to 21.)
The problem is really with iter_matches
(which is the obvious choice for this algorithm). (This is the problem discussed here: https://github.com/neovim/neovim/pull/17099 )
So I tried rewriting basically the same algorithm but using iter_captures
instead of iter_matches
, since iter_captures
does have the behavior I would expect (unlike iter_matches
).
Unlike iter_matches
, which captures the @container
as a whole and then only can give you one of each capture inside the @container
(e.g. only one @intermediate
above), iter_captures
gets all the captures in the following order:
@container
first and then the order the other captures show up in in the code. If there is another @container
inside the first @container
, then we start going through that. I.e., we go through all captures in the order they show up in the code, but each "large" capture starts with @container
whereafter we go through the captures inside the @container
.
The algorithm tries to mirror the previous algorithm (depth first search).
(1) We make an empty matches
stack and an empty match_record
and start going through all the captured nodes based on our queries (on the changed part as before).
(2) If the captures name is @container
that means we are starting going through a new container.
If match_record.container
is not yet set, then that means that we are not currently inside another @container
, so we set match_record.container = node
.
If match_record.container
is set, then that means that our current node is inside match_record.container
and we starting on a new container. So we (temporarily) push match_record
to matches
so we can pop it back out later when we are done going through the current node. And then we start a new match_record
with container
equal to the current node.
(3) If the node's capture name is @closing
that means that we are done with the current container (this is a new assumption I added!). So we set match_record.closing
to be the current node and pop out the previous match_record from matches
(if there is one).
match_record
, then we add the current match_record
to the children
of the previous match_record, which we are still inside (the one we temporarily pushed earlier). And we continue working with the previous match_record which is now the new current match_record
since we are still working inside it. [NOTE: SEE COMMENT FURTHER DOWN]match_record
, then we are done with the current match_record
(since we saw its @closing
), so after popping out the previous match_record, we push both the current match_record
and the previous match_record to matches
(since it was not an ancestor, we are out of it too). [The order we push these doesn't matter.] Finally we create a new empty match_record
in this case.If there is no previous match_record, we cannot be inside another @container
, so we just push the current match_record
to matches
and create a new empty match_record
.
(4) If the node's capture name is not @container
or @closing
, then we add to the corresponding field in match_record
(@opening
or @intermediate
) if such a field exists. Since we haven't seen a @closing
in the current @container
yet, this will be assigned correctly simply by the order iter_captures
goes through the captures.
All of this is really just trying to implement the old algorithm with the restrictions we get from using iter_captures
instead of iter_matches
.
If we want this to replace the current global
(and maybe local
?) strategy, I think it would be a good idea to actually still keep the old code around and just comment it out (this pull request doesn't do this right now). If they ever fix iter_matches
so it has the expected behavior upstream, I think the current implementation is nicer than what you see in this pull request. [It doesn't add the extra assumption that all @container
s end with a single @closing
.]
Another solution could be to keep a better version of iter_matches
as part of rainbow-delimiters if you prefer that instead, or just adapt this as an alternative strategy (e.g. global_alt
).
This might be an even better example showing that Neovim can catch all the test
in the pattern although they all have the same name here.
(Line 13-21 again. There are 3 test
on the first line because the one caught for the whole "container" is listed there.)
Ah yes, the cursed issue where everyone who takes it over vanishes. I am familiar with iter_matches
, but I did not consider using iter_captures
.
I will have to go over your algorithm in detail when I have the time one of these days. But I don't like that @closing
has to be a single node, this just kicks the can further down the road. How about this:
@container
and @delimiter
@capture
starts a new context@delimiter
node check whether it is a child of the current topmost @container
on the stack
@container
This avoids the need for a @closing
node altogether, but in return we have to perform this check for every single node. I don't know if this would make much of a performance impact or if it is just a matter of comparing a few numbers. What do you think of my proposal?
If we want this to replace the current
global
(and maybelocal
?) strategy, I think it would be a good idea to actually still keep the old code around and just comment it out (this pull request doesn't do this right now).
They would not work if we rename the captures in the queries. We could keep the old names and just treat them the same in the new strategies. But honestly I don't see much of a point in keeping the old strategies around. The algorithm is limited to one function and described in the HACKING file if we need it for posterity.
On a side note, how hard did you find it to get into the code? I rarely get non-trivial contributions, so I have no idea how readable and approachable my code really is.
First a note to the algorithm I wrote above: You can actually avoid doing the check for whether something is an ancestor (where I wrote NOTE above) if you assume that there is only one @closing
, since you can just check instead if the previous match record has already been closed or not. I don't know if this will actually speed up things significantly or not, but I thought I would note it for now.
I will have to go over your algorithm in detail when I have the time one of these days. But I don't like that
@closing
has to be a single node, this just kicks the can further down the road. How about this:
- there are only two captures
@container
and@delimiter
- As you have stated a new
@capture
starts a new contextFor every
@delimiter
node check whether it is a child of the current topmost@container
on the stack
- If it is then keep going
- Otherwise we are done with the current
@container
This avoids the need for a
@closing
node altogether, but in return we have to perform this check for every single node. I don't know if this would make much of a performance impact or if it is just a matter of comparing a few numbers. What do you think of my proposal?
Your suggestion is definitely possible and it shouldn't be that hard to change what I currently have. I can try to have code for both algorithms written so we can test them a bit more (this sounds like something where we might want to actually benchmark the performance of these things, since I also don't really know the implications of checking if something is a parent of a treesitter node or not). [I think the parent check should be quick though.]
If we want this to replace the current
global
(and maybelocal
?) strategy, I think it would be a good idea to actually still keep the old code around and just comment it out (this pull request doesn't do this right now).They would not work if we rename the captures in the queries. We could keep the old names and just treat them the same in the new strategies. But honestly I don't see much of a point in keeping the old strategies around. The algorithm is limited to one function and described in the HACKING file if we need it for posterity.
I will start out by keeping the old capture names around for now, just because we would need to edit all the queries otherwise, and it will also make it easier to compare the two implementations. If we decide on changing the names to @delimiter
this will be easy to change later.
On a side note, how hard did you find it to get into the code? I rarely get non-trivial contributions, so I have no idea how readable and approachable my code really is.
I think it was pretty readable, but I mostly looked at the queries and strategies (+ a little at the general implementation). I am pretty new to Lua, so some of the meta tables stuff was a bit confusing, but that probably has more to do with me not being used to Lua meta tables yet than your actual code. Also I have never used Vader, so for testing I have just been manually checking whether it works as expected in my day to day programming + a few of the test files available.
To be a bit more precise, I think your Stack implementation and the strategy code is very easy to follow, but I was a bit confused about how the namespaces worked in lib.lua
. After looking at it more, I guess part of it is because vim.api.nvim_create_namespace()
doesn't behave as I would have liked. (I would have liked it to be easier to have per-language namespaces that I could search for when looking at Extmarks.)
I have added an alternative update_range
function (currently called update_range2
) that is easy to change so it works with just @delimiter
and @container
(all that is needed is changing the names of capture groups in new_match_record
and highlight_matches
). I have only tested functionality and not performance so far, but it seems to work fine and I can't tell any difference in performance on my computer. Though we should probably do some actual performance tests comparing update_range1
with update_range2
.
I have a large local Rust file (5.5MB) that I tried to use to compare the two algorithms.
On my computer with this file, both algorithms seem to highlight ok, and the main part of update_range1
takes about 0.6secs on average while the main part of update_range2
takes about 1.2secs. This doesn't seem like a huge difference, but update_range1
was consistently about twice as fast as update_range2
.
Also, it should be noted that this file has almost no nested highlights levels (it is basically all level 2 in one long list), so ts.is_ancestor
only needs to do very little work.
In general, it seems that keeping track of @closing
tags and ensuring that there is exactly one @closing
for every @container
will be faster, but for more common size files there is no difference. (On smaller normal size files they both take about 0.01-0.05secs for me.)
Also note that this was timed with os.clock()
in Lua, which I can read is not the best for real world timing of algorithms.
There is also a C++ file under test/stress/markdown/lorem-ipsum.md
which you can try. It has a number of languages embedded in Markdown, but it might be too short to deliver any meaningful results.
If we want to go with the @closing
capture then how about we treat it purely as a sentinel node? By that I mean the content which we want to highlight is captured with @delimiter
, but in addition to that we also capture the last delimiter with @closing
. Here is what it would look like for Lua function definitions where the sentinel is also the last delimiter:
(function_declaration
"function" @delimiter
"end" @delimiter @closing) @container
Here is a contrived HTML example in which for whatever reason I only want to highlight the name of the closing tag, so the delimiter is tag_name
and the sentinel is end_tag
:
(element
(start_tag
"<" @delimiter
(tag_name) @delimiter
">" @delimiter)
(end_tag
(tag_name) @delimiter) @closing) @container
This will require adjusting queries, but the change is straight-forward. I can do that myself, no need to waste your time.
There is also a C++ file under
test/stress/markdown/lorem-ipsum.md
which you can try. It has a number of languages embedded in Markdown, but it might be too short to deliver any meaningful results.
Both algorithms take 0.00secs in my timing, so it is definitely too short to see anything. (Though the timing is not that accurate since right now it times separately for each language in the file.)
If we want to go with the
@closing
capture then how about we treat it purely as a sentinel node? By that I mean the content which we want to highlight is captured with@delimiter
, but in addition to that we also capture the last delimiter with@closing
. Here is what it would look like for Lua function definitions where the sentinel is also the last delimiter:(function_declaration "function" @delimiter "end" @delimiter @closing) @container
That would definitely be a possibility too.
Here is a contrived HTML example in which for whatever reason I only want to highlight the name of the closing tag, so the delimiter is
tag_name
and the sentinel isend_tag
:(element (start_tag "<" @delimiter (tag_name) @delimiter ">" @delimiter) (end_tag (tag_name) @delimiter) @closing) @container
This will require adjusting queries, but the change is straight-forward. I can do that myself, no need to waste your time.
I can easily edit the code to work like that very quickly if this is what we want to go with. I think this is a good idea since the code with @closing
is also simpler (so hopefully easier to maintain), and it makes sense that the capture starts with @container
and ends with @closing
. And based on my (admittedly very simple) testing it does seem to be faster than calling ts.is_ancestor()
.
BTW, am not particularly attached to the capture names, so if you have a better suggestions don't hesitate. For example, maybe @sentinel
would be better than @closing
, because the name @closing
makes me think there needs to be a corresponding @opening
somewhere.
I think @sentinel
does better describe the idea, but it is a pretty uncommon word. Then again, it might not be bad to use unique and descriptive words, so for now let's go with @sentinel
.
I have updated the algorithm so it should work with @delimiter
and @sentinel
as described now. I have checked that it works as I would expect with Lua and Rust so far (and I even managed to speed it up a little [maybe ~10%] by pushing and popping less).
I have tried to update the local strategy in a similar way now too. I think it works as expected, but I only really use the global strategy normally, so I am not entirely sure if it used to behave slightly different in some cases.
I have been testing injected languages in test/stress/markdown/lorem-ipsum.md
, but the following lines in the original code seems to cause problems:
For now I have commented out those lines and don't see anything going wrong, but I assume that was put there for a reason. @HiPhish do you remember an example where this code is clearly needed?
Also note: I updated the queries for the corresponding languages I needed to test with that markdown file. It seems to work as expected after commenting out the above lines.
Yes, I remember why this exists. Let's say you have a Markdown file with embedded Lua:
Here is some Lua code:
~~~lua
-- This is Lua
print({{{}}})
~~~
Back to Markdown
Now if we move (not delete and put) the Lua code one line down it will carry with it the extmarks it had and we will have Lua delimiter highlighting within Markdown. This is because extmarks move with the text. My solution is to have per-language namespaces and delete all extmarks which do not belong to the current language. In the above example the line has Lua extmarks, so when I move it to the Markdown region all non-Markdown extmarks are removed.
Yes, I remember why this exists. Let's say you have a Markdown file with embedded Lua:
Here is some Lua code: ~~~lua -- This is Lua print({{{}}}) ~~~ Back to Markdown
Now if we move (not delete and put) the Lua code one line down it will carry with it the extmarks it had and we will have Lua delimiter highlighting within Markdown. This is because extmarks move with the text. My solution is to have per-language namespaces and delete all extmarks which do not belong to the current language. In the above example the line has Lua extmarks, so when I move it to the Markdown region all non-Markdown extmarks are removed.
I see. I am able to reproduce this sometimes (in some languages). But the problems I have while enabling that code is worse.
Without the code, I can get the problem like this:
With the code, I can't update the highlighting as usual (and sometimes the highlighting starts out wrong). If I try to do :e<CR>
, then highlighting will go away while visible, and I need to scroll away and do :e<CR>
and then scroll back to get it to work.
Note: This doesn't seem to be a problem introduced by the other changes in the code. I already had this kind of problem before.
I will try to see if I can find a better way to do this that solves both issues, but for now it works much better for me without this code (when I don't move stuff like in the example I don't really have any problems without that code, but with the code I have quite a few problems).
Part of the problem seems to be that changes
is really not a very good list of changes. E.g., sometimes I get stuff like this from vim.print(lang, changes)
before the for loop in question:
markdown
{ { 0, 0, 4294967295, 4294967295 } }
c
{ { 28, 0, 29, 0 }, { 29, 0, 30, 0 }, { 30, 0, 31, 0 }, { 31, 0, 32, 0 }, { 32, 0, 33, 0 }, { 33, 0, 34, 0 }, { 34, 0, 35, 0 }, { 35, 0, 36, 0 }, { 36, 0, 37, 0 }, { 37, 0, 38, 0 }, { 38, 0, 39, 0 }, { 39, 0, 40, 0 } }
vim
{ { 19, 0, 20, 0 }, { 20, 0, 21, 0 } }
lua
{ { 5, 0, 6, 0 }, { 6, 0, 7, 0 } }
python
{ { 48, 0, 49, 0 }, { 49, 0, 50, 0 } }
c
{ { 28, 0, 29, 0 }, { 29, 0, 30, 0 }, { 30, 0, 31, 0 }, { 31, 0, 32, 0 }, { 32, 0, 33, 0 }, { 33, 0, 34, 0 }, { 34, 0, 35, 0 }, { 35, 0, 36, 0 }, { 36, 0, 37, 0 }, { 37, 0, 38, 0 }, { 38, 0, 39, 0 }, { 39, 0, 40, 0 } }
vim
{ { 19, 0, 20, 0 }, { 20, 0, 21, 0 } }
markdown
{ { 0, 0, 154, 0 } }
Yes, I have noticed that the changes
can be quite weird at time. The number 4294967295
is the maximum possible 32-bit unsigned integer value. Neovim uses the range (0, 4294967295)
to mean "the entire buffer".
Can you please mark your PR as WIP or something like that until it is ready for review? I really appreciate your effort and I would prefer not to waste either of our time by reviewing code that isn't done yet.
Yes, I have noticed that the
changes
can be quite weird at time. The number4294967295
is the maximum possible 32-bit unsigned integer value. Neovim uses the range(0, 4294967295)
to mean "the entire buffer".
Other than that number, I think the weirder thing is that it sometimes sends the same language range multiple times. E.g. in my list above, both markdown lists cover the entire buffer and the vim lists are identical.
Can you please mark your PR as WIP or something like that until it is ready for review? I really appreciate your effort and I would prefer not to waste either of our time by reviewing code that isn't done yet.
I have changed the PR to "Draft" in GitHub for now while I work on this last thing. I don't plan to make more changes to update_range
in the global strategy though, so if you want to start looking at the main part of this pull request, that part should be final now.
After looking at this more, it seems that the problematic lines for moving stuff is actually:
if #changes == 0 then
-- Hack: an empty change set results from an undo. Force a
-- full update by setting the maximum possible range
table.insert(changes, {tree:root():range()})
end
Removing these lines everything behaves as expected for me except when I use undo, but I haven't figured out how to circumvent the undo problem yet. The above doesn't fix it properly. E.g., here is an example where I move one line of c up into markdown:
markdown
{ { 28, 0, 29, 0 } }
c
{} <-- this is where the above code would change something (i.e., before the undo)
2 changes; before #172 1 second ago <-- undo
markdown
{ { 29, 0, 41, 0 } } <-- markdown change when doing undo
c
{ { 29, 0, 30, 0 } } <-- c change when doing undo
The problem is that the markdown captures the change in the c code block, so it clears the c namespace, while the c code block itself just notices that only the first line changed, so it doesn't update the highlighting on the other lines that were just cleared. (But changes is not empty, so the above code would not fix this.)
Also, if we change {}
to the roots range, then we will mistakenly reset the non C highlighting of some stuff that might not get highlighted afterwards.
I will continue trying to fix or get around this somehow.
@HiPhish: I managed to fix all issues I had with moving code, using undo and any other problems I saw now. If you notice any other problems I can try to fix those too, but otherwise I think this code is ready for review now. (I don't plan to change anything unless someone notices a problem.)
Just a quick heads-up, I am in the process of reviewing this PR. Can you please tell me in which order captures are delivered by iter_capture?
Just a quick heads-up, I am in the process of reviewing this PR. Can you please tell me in which order captures are delivered by iter_capture?
iter_captures()
delivers the captures in the order they show up in the tree if you go down through the tree from the root (from outside moving inwards in each node).
E.g. this query for html
(element
(start_tag
(tag_name) @delimiter
)
(end_tag
(tag_name) @delimiter
) @sentinel
) @container
wouldn't work correctly since the @sentinel
captures end_tag
, which shows up before the tag_name
inside it. Thus the correct query for something similar in html is:
(element
(start_tag
(tag_name) @delimiter
)
(end_tag
(tag_name) @delimiter @sentinel
)
) @container
Now the @sentinel
is on the tag_name
inside end_tag
which is indeed the last element we see in this whole capture.
In the first case iter_captures()
would give you:
@container
from element
@delimiter
from tag_name
inside start_tag
@sentinel
from end_tag
[since this shows up in the tree before the inside tag_name
]@delimiter
from tag_name
inside end_tag
In the second case iter_captures()
would give you:
@container
from element
@delimiter
from tag_name
inside start_tag
@delimiter
from tag_name
inside end_tag
@sentinel
from tag_name
inside end_tag
[after @delimiter
because it is listed after it and they both cover the same node]I have performed a preliminary code review of some surface level stuff. I still need to understand the algorithm inside
update_range
of the global strategy. From what I gather there is the stack (matches
) the current match record (match_record
) that is being built up. We iterate through all the captures and differentiate based on the name of the capture:
delimiter
is obvious
container
has two cases:
- If the current record has no container node, then this is its node
- If the current match has been closed it belongs to a sibling match. Push the current match to the stack and start a new current match
- Otherwise the container belongs to a new child match. Stash the current record on the stack and start a new record which has the previous one as its parent
sentinel
closes the current record
- If the current record has no parent push it to the stack and start a new record
- Otherwise pop the parent off the stack and set it as the current record
The
matches
stack serves two purposes: eventually it will be a "list" of top-level records, but until then it serves as a stash for records while we collect their children. Is my understanding so far correct?
Yes, it sounds like you have the right idea.
Looking back at the code, we can change the check for whether there is a current container
to whether there is a current match_record
. [See my comments above.] This complicates the stylua types (since we need to consider nil match_record
s then), but it might be more clear?
Regarding the order of matches, my understanding is that all captures are returned in a depth-first order. So given this HTML snippet:
<div id="d1"> <div id="d2"> <div id="d3"> </div> </div> <div id="d4"> </div> </div>
We would get the captures in this order:
- container
d1
- start
d1
- container
d2
- start
d2
- container
d3
- start
d3
- end
d3
- sentinel
d3
- end tag
d2
- sentinel
d2
- container
d4
- start
d4
- end
d4
- sentinel
d4
- end
d1
- sentinel
d1
Is this the correct order?
Yes, that is the correct order from my understanding of iter_captures()
.
I like the variant with match_record = nil
better. Then we don't even need the container
field in the match record table. I feel there is a really elegant recursive mathematical definition hidden somewhere under that imperative code, but let's get this out of the door as is first or else it will never get done :)
Can you please check the "allow edits from maintainers" checkbox in your PR? I have a few improvements to discuss that I want to push to this PR as a commit. If I can commit it will be more efficient than if I can only tell you what to do.
On a related note, I have played around with the HTML query and I absolutely love the result. Finally I can have the tag name and the angle brackets highlighted, but not the attributes. I have literally wanted this for years, ever since I first learned about rainbow parentheses. Thank you so much for your work!
I like the variant with
match_record = nil
better. Then we don't even need thecontainer
field in the match record table. I feel there is a really elegant recursive mathematical definition hidden somewhere under that imperative code, but let's get this out of the door as is first or else it will never get done :)
I have updated the code so it uses match_record = nil
and removed container
now.
Can you please check the "allow edits from maintainers" checkbox in your PR? I have a few improvements to discuss that I want to push to this PR as a commit. If I can commit it will be more efficient than if I can only tell you what to do.
From my side it looks like this is already checked. Can you not edit now? I will try to see if I need to click "Yes" to something in some settings somewhere.
On a related note, I have played around with the HTML query and I absolutely love the result. Finally I can have the tag name and the angle brackets highlighted, but not the attributes. I have literally wanted this for years, ever since I first learned about rainbow parentheses. Thank you so much for your work!
I have also liked rainbow brackets for a long time and got used to it from emacs. This is the best plugin for it that I have seen in Neovim, so I am happy that we are able to improve on it too! (I was annoyed by Lua blocks highlighting in stead of html, but I also think that got much better with this update.)
I get ! [remote rejected] Danielkonge/master -> Danielkonge/master (permission denied)
when I try to push to your remote (github.com:Danielkonge/rainbow-delimiters.nvim.git
) on the master
branch. Maybe you should have created a topic branch first?
OK, looks like I figured it out. I had to push the commit as git push Danielkonge-contrib HEAD:master
where Danielkonge-contrib
is the remote name I have chose for your repo. This will make collaboration much easier.
SQL parentheses look fine to be as long as there are no empty parentheses or parentheses containing only one element.
In a an expression line SELECT (1 + (2));
all the characters inside the outer parentheses are on the same level in the tree.
term [0, 7] - [0, 16]
value: "(" [0, 7] - [0, 8]
value: binary_expression [0, 8] - [0, 15]
left: literal [0, 8] - [0, 9]
operator: "+" [0, 10] - [0, 11]
right: "(" [0, 12] - [0, 13]
right: literal [0, 13] - [0, 14]
right: ")" [0, 14] - [0, 15]
value: ")" [0, 15] - [0, 16]
I don't think there is anything we can do without a container node.
SQL parentheses look fine to be as long as there are no empty parentheses or parentheses containing only one element.
You mean after my update right? Because I was definitely having problems other than with nesting before that.
Also, since writing something like ((()))
is very uncommon outside tests like this, I think it might actually make sense to simplify the code a bit and just let it look bad on something like that. (As you said this is mostly because of a problem with SQL's tree structure.)
[Update: I have changed this to be simpler for now, but I left a comment for what can be done to make it slightly better if you need ((()))
.]
Yes, after the update.
Also, since writing something like
((()))
is very uncommon outside tests like this, I think it might actually make sense to simplify the code a bit and just let it look bad on something like that.
Agreed. In SQLite at least SELECT (((1)));
is legal query, but no one is ever going to write code like this.
I updated the local and global strategy so that any capture name that starts with 'ignore' will be ignored (we don't want to log those). So we can use ignore
, ignore1
, ignore_self_closing
, etc. for any captures we want to use for predicates like #eq?
, #any-eq?
, etc.
I updated the local and global strategy so that any capture name that starts with 'ignore' will be ignored (we don't want to log those). So we can use
ignore
,ignore1
,ignore_self_closing
, etc. for any captures we want to use for predicates like#eq?
,#any-eq?
, etc.
Why not just ignore any capture that has a name other than container
, delimiter
and sentinel
? Currently unknown capture names are logged, but the function will run on every change and spam the logs. If it is about validating queries for query authors, we can do so only once when the query is loaded first like this:
local known_captures = {delimiter = true, container = true, sentinel = true}
local q = vim.treesitter.query.get('lua', 'rainbow-delimiters')
for _, capture in ipairs(q.captures) do
if not known_captures[capture] or not starts_with_ignore(capture) then
log.error('Unknown capture name: ' .. capture)
end
end
This validation can be added somewhere in the lib
module, for example when registering a new namespace. Then the validation will only run once over the entire lifetime of the Neovim process.
I would not add validation to the scope of this PR, it is big enough as it is. We can open a new ticket for that.
Why not just ignore any capture that has a name other than
container
,delimiter
andsentinel
? Currently unknown capture names are logged, but the function will run on every change and spam the logs. If it is about validating queries for query authors, we can do so only once when the query is loaded first like this:local known_captures = {delimiter = true, container = true, sentinel = true} local q = vim.treesitter.query.get('lua', 'rainbow-delimiters') for _, capture in ipairs(q.captures) do if not known_captures[capture] or not starts_with_ignore(capture) then log.error('Unknown capture name: ' .. capture) end end
This validation can be added somewhere in the
lib
module, for example when registering a new namespace. Then the validation will only run once over the entire lifetime of the Neovim process.I would not add validation to the scope of this PR, it is big enough as it is. We can open a new ticket for that.
I think you are right, so I have removed the unknown capture name cases now. I still have the case where we get a @delimiter
or a @sentinel
without having a match_record
. As long as the queries (and code) are correct we should never end up in this case, so it shouldn't be spamming the logs. Do you want to change this case too?
Do you want to change this case too?
No, please leave it in. This is an error case which cannot be detected until the code actually runs, so having the error in the logs is valuable.
I have noticed a problem with nested languages: when I move a line up out of a nested block it will retain its rainbow highlighting. Here is a Markdown example
Some markdown.
~~~lua
print({{{{}}}})
print({{{{}}}})
~~~
More markdown
I place the cursor on line 3 and execute :move 1
. The line will lose its Lua highlighting, but it will retain its rainbow highlighting. However, if I place the cursor on line 4 and execute :move 5
(i.e. downwards) it will lose the rainbow highlighting.
I have noticed a problem with nested languages: when I move a line up out of a nested block it will retain its rainbow highlighting. Here is a Markdown example
Some markdown. ~~~lua print({{{{}}}}) print({{{{}}}}) ~~~ More markdown
I place the cursor on line 3 and execute
:move 1
. The line will lose its Lua highlighting, but it will retain its rainbow highlighting. However, if I place the cursor on line 4 and execute:move 5
(i.e. downwards) it will lose the rainbow highlighting.
I can reproduce this if I copy exactly your short markdown file there, but it is not a problem in the stress markdown file, so I am not sure what causes this. I will try to look at the changes returned some more.
I think I figured out the problem. It seems to be an off by one error from what I can tell.
Changing
vim.api.nvim_buf_clear_namespace(bufnr, nsid, change[1], change[3])
to
vim.api.nvim_buf_clear_namespace(bufnr, nsid, change[1], change[3] + 1)
fixes the problem.
But this then breaks html
highlighting. I originally had the lower one, but it gives this highlighting:
instead of:
It seems to be a problem with clearing out the old highlights caused by different behavior in the different parsers. I am not yet sure though, so I will try to look into this a bit more.
This is a temporary(?) fix that seems to work in almost all cases I have checked. The only way I have managed to break this is by having nested language injections (i.e., a language injection in a language injection), but that seems like a rare need. In those rare cases, I think it makes sense to refresh highlighting (that will fix it) if you move code.
I cleaned up the code a bit and noticed another problem with injections in Rust. When you use macros in Rust, it treats that as a language injection with the language being Rust, but when we inject the language itself, it will already be taken care of by the part of the code without a parent language, so in that case we should do nothing. This has been messing with Rust highlighting of (and around) macros for me for a long time, and the current fix I made finally gets around it.
I have tested out the code some more and haven't noticed any problems now except when you have nested language injections, which I think is pretty rare. In those cases updating highlighting or moving stuff can break the highlighting with the current code. E.g.:
Markdown text.
~~~lua
print 'This is injected Lua in Markdown (one level deep)'
vim.cmd[[
echo str([])
echo 'This is injected Vim in Lua in Markdown (two levels deep)'
]]
~~~
You have the patience of a saint. It will be a day or two before I get around to looking at the code, so I need to ask for a bit more patience.
Instead of ignore<anything>
for the captures that we ignore for both debug and highlighting, do you think it would make more sense to use _<anything>
? I.e., just ignore stuff starting with underscore. I feel like that might be a bit more expected behavior.
Yes, underscore is better. It is more in line with ignored variables in programming languages and harder to mistype.
In https://github.com/HiPhish/rainbow-delimiters.nvim/pull/43/commits/d4cc189f80dedf327aa2308033823cb515a830d1 I included some comments that I will delete soon, but I made a commit to look back on if needed.
I played around with using vim.api.nvim_get_extmarks
and vim.api.nvim_del_extmark
instead of vim.api.nvim_buf_clear_namespace
since those allow us to work with exact rows and columns instead of just the rows, but I didn't manage to improve the behavior over what we have now (there was still a problem with nested injections), so I didn't switch to those functions in the code.
Also: I think on_changedtree
is supposed to work as we want without the need for the changes = {{tree:root():range()}}
hack, so we might want to report that as a bug upstream. (The video example with an injected C block above is where you can see the bug.)
I just reordered some code to remove one level of if-nesting. Can you please double-check that the conditions are still the same?
I just reordered some code to remove one level of if-nesting. Can you please double-check that the conditions are still the same?
The logic looks correct to me. (Also, it behaves as expected in quick testing of what that code is supposed to solve.)
Where did you get the _parent_lang
field of the parser from? It is not documented, and when I inspect a parser object there is no such field either. I used the following Markdown text:
Hello
```c
puts((((("This is an injected language")))));
{
{
{
{
{
return ((((((((((((2)))))))))))) + ((((((3))))));
}
}
}
}
}
```
Then I set p = vim.treesitter.get_parser()
and executed lua =p
to view the fields of the parser object. Each parser has a children
field which maps contained languages to their respective parsers. But there is no _parent_lang
field.
EDIT: I don't see any way of finding out whether the parser has a parent language. We can descend deeper into the tree, but not back up.
Where did you get the
_parent_lang
field of the parser from? It is not documented, and when I inspect a parser object there is no such field either. I used the following Markdown text:Hello ```c puts((((("This is an injected language"))))); { { { { { return ((((((((((((2)))))))))))) + ((((((3)))))); } } } } } ```
Then I set
p = vim.treesitter.get_parser()
and executedlua =p
to view the fields of the parser object. Each parser has achildren
field which maps contained languages to their respective parsers. But there is no_parent_lang
field.EDIT: I don't see any way of finding out whether the parser has a parent language. We can descend deeper into the tree, but not back up.
I found _parent_lang
by printing p
in on_changedtree
and then going through :messages
. It seems to be nil
(so not shown) when there is no parent language, but it should show up for all injections.
If we want to avoid using _parent_lang
and still have movement of text work with injections, I think we need to figure out what causes the wrong changes
to be reported in certain cases (like the above c code). If that is fixed, then we could simplify the code a bit and even have it work with nested injections.
UPDATE: I just checked, and this seems to be only available on HEAD and not in 0.9.4.
Since p._parent_lang
seems to be a nightly feature, I have updated util.for_each_child
so that we can keep track of the parent language ourselves instead. Everything seems to work as expected in the global strategy now (I haven't tested the corresponding changes in the local strategy yet).
Please let me know if there are any problems with this update.
I think I have fixed the problem with nested injections now, but the solution is a bit hacky. I don't think it should affect the performance of the highlighting though, since the main part of the updated code will only be run when we are at least two injections deep.
Please let me know if you notice any problems.
Note: If we don't want to include this last change, the previous code before this change worked for everything other than nested injections in my testing.
I don't think we need the function for_each_child_dif_lang
if we move the test into the thunk. I have done so in the previous commit, please check if the logic is the same.
This is a draft of a fix to the problem where rainbow-delimiters doesn't catch the same group multiple times. E.g., it couldn't catch
elseif
andelse
in Lua before.I have time to edit and add improvements if you have any suggestions, or we could add it as another strategy if you don't want it to replace the old
global
?I have only tested it in Lua and Rust for now (and before getting rainbow blocks to work better than before in other languages we need to edit the corresponding .scm files). But it works exactly as I would expect so far.
To use this it is important that each
@container
only has at most one@closing
, but you can have as many@opening
or@intermediate
as you want.