the-mikedavis closed this issue 3 years ago.
Hey, thanks for the detailed report! I was playing with this for a bit and managed to narrow it down. In the end it's not even related to the use of the external scanner, just to the hidden node. I will submit an issue to tree-sitter with a minimal reproduction shortly :)
Reported in https://github.com/tree-sitter/tree-sitter/issues/1441! I actually found a reasonable solution, so I'll go ahead and use it :)
@the-mikedavis it should work as expected now; let me know if you encounter any issues :)
Ah you're awesome, great find!
I just merged those changes into the Helix PR and it works!
Beautiful! :cat:
Hi again :wave:
So I'm pretty sure this is a bug in the tree-sitter query mechanism, but I can't get the `not in` binary operator to match a query. I see in the docs (which are lovely, by the way :) that `not in` is parsed with the external scanner, so my guess is that something in the lexer (either in `scanner.cc` or somewhere in tree-sitter) is not getting the full information it needs to understand the `not in` token (byte/codepoint starts/stops, maybe?). I also suspect it might be possible to fix this with extra rules in `grammar.js`.

Comparing the query results from a standard `binary_operator` like `in` with `not in`, we see:

(The `in.exs` example can be replaced with other binary operators, such as `++`, to the same effect.)

Looking at the parse trees, there's some peculiar behavior where `not in` doesn't show up but other binary operators do:

And I think it's odd that `not in` is not there between the `<identifier>`s :thinking:. I think the relevant code in the `tree-sitter` CLI is this block:

That is why I suspect there might be some missing information about the start and stop bytes of the `$._not_in` rule. (Although, as we'll see below, the parser does seem to know the start/stop bytes when `$._not_in` is changed to be a non-hidden rule.)

A possible but not-great workaround
One workaround that allows that `query.scm` to match (and therefore lets `not in` be queried and highlighted the same as any other `binary_operator`) is to change the `grammar.js` rule for `$._not_in` to `$.not_in`, unhiding it. Then we see a parse result of:

And the query matches!
It also works as expected with arbitrary whitespace, like `a not in b`. Still, throwing an extra operator node in there just to make the query match seems pretty hacky to me.
I haven't dug very deep into the tree-sitter codebase yet to hunt down exactly what's going on. I thought I'd write out my findings here first, in the hope that you might already know why this parses strangely and can't be queried.