GrayJack / tree-sitter-janet

A tree-sitter grammar parser for Janet
BSD 3-Clause "New" or "Revised" License

Escape codes are not handled as their own token #4

Open GrayJack opened 4 years ago

GrayJack commented 4 years ago

I remember trying to handle escape codes; there is even an escape_code token that is never used. I remember having a problem with strings that contain # inside: when I tried, the parser identified the # as the start of a line comment.

Possible solutions

tree-sitter-rust uses a parser.c to handle the string contents that aren't escape codes. If I'm not mistaken, it does so to solve the same problem as this one.
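
To make the idea concrete, here is a minimal sketch of the scanning logic such hand-written C code could perform. It is a standalone illustration, not code from tree-sitter-rust or tree-sitter-janet: the name scan_string_content and its buffer-based signature are hypothetical, and a real external scanner would read characters through the lexer object tree-sitter passes in rather than from a plain buffer. The point is only that, once inside a string, # is consumed as ordinary content and scanning stops at a backslash (the start of an escape code) or the closing quote.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
 * Starting at `pos` (assumed to be just inside a double-quoted string),
 * return how many bytes of plain string content precede the next
 * backslash (start of an escape code), closing double quote, or end of
 * input. A '#' is treated like any other character, so it cannot be
 * mistaken for the start of a line comment. */
size_t scan_string_content(const char *src, size_t len, size_t pos) {
    size_t start = pos;
    while (pos < len && src[pos] != '\\' && src[pos] != '"') {
        pos++;
    }
    return pos - start;
}
```

With this split, the grammar could emit the returned span as a string-content token and leave the escape code to its own rule, which is roughly the separation this issue asks for.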

GrayJack commented 4 years ago

@sogaiu Do you know about any other solutions?

sogaiu commented 4 years ago

I think you may be correct in your assessment.

I think in the tree-sitter-clojure grammar I wanted strings to be tokens in order to get the string delimiters (double-quotes) to match, but that seems to be at odds with what you have in mind.

For reference, in rouge, one can use backreferences (and other things), so this kind of thing is doable: https://github.com/rouge-ruby/rouge/blob/a9fdd441b72d12463ed2383531c82a2f87ea18e7/lib/rouge/lexers/janet.rb#L142

rule %r/@?(`+).*?\1/m, Str::Heredoc

If tree-sitter supported backreferences, then maybe strings wouldn't have to be tokens (assuming one doesn't use an external scanner). I don't think tree-sitter supports backreferences, but I'm not sure. It seems hard to tell what's supposed to work: https://github.com/tree-sitter/tree-sitter/issues/463
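
For comparison, the job done by the backreference in the rouge rule can be approximated in hand-written C by counting the opening backticks and then looking for a run of the same length. The following is only a sketch under assumptions: scan_long_string is a hypothetical name, it works on a plain buffer instead of tree-sitter's lexer interface, and it is not guaranteed to agree with the regex or with Janet's own parser in every edge case.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
 * If `src` starts with a backtick-delimited long string (optionally
 * prefixed by '@', as in the regex), return its total length in bytes,
 * including both delimiters. Return 0 if there is no opening delimiter
 * or the string is unterminated. */
size_t scan_long_string(const char *src, size_t len) {
    size_t pos = 0;
    if (pos < len && src[pos] == '@') {
        pos++; /* optional '@' prefix */
    }

    /* Count the backticks in the opening delimiter. */
    size_t open = 0;
    while (pos < len && src[pos] == '`') {
        pos++;
        open++;
    }
    if (open == 0) {
        return 0;
    }

    /* Close at the first run of `open` consecutive backticks. */
    size_t run = 0;
    while (pos < len) {
        run = (src[pos] == '`') ? run + 1 : 0;
        pos++;
        if (run == open) {
            return pos;
        }
    }
    return 0; /* unterminated */
}
```

Because the delimiter length is kept in a variable, no backreference is needed; the cost is that the logic lives in C rather than in the grammar rules.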

I think we talked about using parser.c to achieve somewhat similar ends before. Specifically, we talked about this file: https://github.com/sogaiu/tree-sitter-clojure/blob/319eae813b29621cdbefe9750d558e4b1634f7b2/src/scanner.cc -- I think you mentioned having written something in C.

Does that sound familiar? I don't remember very clearly :)