Open GrayJack opened 4 years ago
@sogaiu Do you know about any other solutions?
I think you may be correct in your assessment.
I think in the tree-sitter-clojure grammar I wanted strings to be tokens in order to get the string delimiters (double-quotes) to match, but that seems to be at odds with what you have in mind.
For reference, in rouge, one can use backreferences (and other things), so this kind of thing is doable: https://github.com/rouge-ruby/rouge/blob/a9fdd441b72d12463ed2383531c82a2f87ea18e7/lib/rouge/lexers/janet.rb#L142
rule %r/@?(`+).*?\1/m, Str::Heredoc
If tree-sitter supported backreferences, then maybe strings wouldn't have to be tokens (assuming one doesn't use an external scanner). I don't think tree-sitter supports backreferences, though I'm not sure. It seems hard to tell what's supposed to work: https://github.com/tree-sitter/tree-sitter/issues/463
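To make the backreference point concrete, here is a small Ruby sketch of what the regex in the rouge rule quoted above matches. Only the regex comes from the rouge lexer; the constant name LONG_STRING, the helper match_text, and the sample strings are made up for illustration.

```ruby
# Regex taken from the rouge Janet lexer rule quoted above.
# (`+) captures the opening run of backticks and \1 requires the same
# run to close the string. The /m flag lets . match newlines too.
LONG_STRING = /@?(`+).*?\1/m

# Hypothetical helper: returns the matched text, or nil if nothing matches.
def match_text(re, src)
  m = re.match(src)
  m && m[0]
end

# Hypothetical sample inputs, not taken from any real test suite.
puts match_text(LONG_STRING, "`hello`")
puts match_text(LONG_STRING, "``a ` inside``")
puts match_text(LONG_STRING, "@``buf``")
```

The second sample is the interesting one: the single backtick inside the string does not end it, because the closing delimiter has to repeat the captured two-backtick run.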
I think we talked about using an external scanner to achieve somewhat similar ends before. Specifically, we talked about this file: https://github.com/sogaiu/tree-sitter-clojure/blob/319eae813b29621cdbefe9750d558e4b1634f7b2/src/scanner.cc -- I think you mentioned having written something in C.
Does that sound familiar? I don't remember very clearly :)
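Without backreferences, the delimiter matching has to be done by hand, which is roughly the job an external scanner would take on. The following Ruby sketch only illustrates that logic. It is not the code in scanner.cc, the name scan_long_string is hypothetical, and it assumes a closing run of backticks must have exactly the same length as the opening run.

```ruby
# Hypothetical helper: returns the index just past the closing delimiter of a
# backtick-delimited long string beginning at `start`, or nil if no complete
# string begins there.
def scan_long_string(src, start = 0)
  i = start
  # Count the opening run of backticks.
  i += 1 while i < src.length && src[i] == "`"
  open_len = i - start
  return nil if open_len.zero?

  # Walk forward looking for a run of backticks of exactly the same length.
  while i < src.length
    if src[i] == "`"
      run_start = i
      i += 1 while i < src.length && src[i] == "`"
      return i if i - run_start == open_len
    else
      i += 1
    end
  end
  nil # reached the end of input without a matching closing run
end
```

With this, src[0...scan_long_string(src)] gives the text of the whole string, and a nil result means the input does not start with a complete long string.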
I remember trying to handle escape codes; there is even an escape_code token that is never used. I remember having a problem with strings that contain # inside: the parser identified that as a line comment when doing so.
Possible solutions
tree-sitter-rust uses an external scanner (scanner.c) to handle the string contents that aren't escape codes. If I'm not mistaken, it is there to solve the same problem as this one.