Open kanashimia opened 7 months ago
I also just ran into this in Helix. As a work around in Helix, you can set the language to HTML, but it would be nice to fix this grammar.
It seems adding any <? ... ?>
construct that isn't the xml one at the beginning of the document breaks parsing.
here's the problem: https://github.com/RenjiSann/tree-sitter-xml/blob/main/grammar.js#L79
the spec uses -
to represent taking the complement between the two sets. In this case, I think it means any string of length 0 or greater except for any strings of length 0 or greater that contain the substring ?>
. However, in the current grammar, it's parsing the literal character -
. I can't think of a way to translate the spec into a regex that doesn't use lookahead. I'm guessing the best bet is to use a heuristic like [^?]*
to match the processing instruction content.
The issue was in Helix, they use this grammar as I see.