Open MaximSokolov opened 8 years ago
I was abusing Cyrillic to cheat a reserved identifier, and I noticed the same thing. Note how const highlighting continues after the bogus e:
const nеw = isNew ? "new " : "";
I'm probably the one who should address this, which I'll do once:
My problem with this is that I don't want to overcomplicate the grammar. If [\w$] supports all of those though, then I am +100, since all that would require is adding error highlighting for function names that begin with a number.
Surprise: Oniguruma has a Unicode-aware \w character class, but GitHub's PCRE doesn't (since they're running it in ASCII mode for performance reasons).
While I can't see this causing breakage, I'd prefer this grammar's highlighting remain consistent wherever it's used...
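To make the difference concrete, here is a rough sketch in plain JavaScript. It is only a stand-in: JavaScript's own \w is ASCII-only, so it behaves like the ASCII-mode case, and the unicodeWord pattern is a hypothetical approximation of a Unicode-aware \w, not what Oniguruma or PCRE actually compile.

```javascript
// "n\u0435w" looks like "new" but uses U+0435 (Cyrillic small letter ie).
const lookalike = "n\u0435w";

// ASCII-only word class, as in the [\w$] pattern discussed above.
const asciiWord = /^[\w$]+$/;

// Hypothetical Unicode-aware stand-in for \w (an assumption for illustration):
// letters, combining marks, decimal digits, and connector punctuation.
const unicodeWord = /^[\p{L}\p{M}\p{Nd}\p{Pc}$]+$/u;

console.log(asciiWord.test("new"));          // true
console.log(asciiWord.test(lookalike));      // false
console.log(unicodeWord.test(lookalike));    // true

// An unanchored ASCII match stops at the first non-ASCII letter.
console.log(lookalike.match(/[\w$]+/)[0]);   // "n"
```

The last line shows why an ASCII-only grammar ends the identifier match after the first character of the lookalike name.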
On an interesting side-note, CoffeeScript's interpretation of valid identifiers differs from JavaScript's. I have the following snippets in my snippets.cson file, completely unquoted:
"Symbol Snippets":
€: {prefix: "C=", body: "€"}
″: {prefix: ",,", body: "″"}
™: {prefix: "TM", body: "™"}
©: {prefix: "(C)", body: "©"}
©2: {prefix: "(c)", body: "©"}
®: {prefix: "(R)", body: "®"}
®2: {prefix: "(r)", body: "®"}
×: {prefix: "x", body: "×"}
→: {prefix: "->", body: "→"}
←: {prefix: "<-", body: "←"}
⇒: {prefix: "=>", body: "⇒"}
⇐: {prefix: "<=", body: "⇐"}
The keys don't receive highlighting, but CoffeeScript allows them anyway. Unfortunately, it neglects to quote them for JS... at least on their site's REPL. The output on the right breaks if parsed as JavaScript:
Just a reminder to avoid CoffeeScript whenever possible. =)
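For anyone who wants to check the breakage without the REPL, a quick sketch: it asks the JavaScript engine to parse an object literal with each key unquoted and then quoted. The parses helper is made up for this example, and the object literals are hand-written approximations, not actual CoffeeScript compiler output.

```javascript
// Hypothetical helper: true if `source` parses as a JavaScript expression.
// new Function only compiles the body here; nothing is executed.
function parses(source) {
  try {
    new Function(`return (${source});`);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// A few of the keys from the snippets file above.
const keys = ["€", "™", "©2", "→"];

for (const key of keys) {
  const unquoted = `{ ${key}: 1 }`;
  const quoted = `{ ${JSON.stringify(key)}: 1 }`;
  // Each key fails to parse unquoted and parses fine once quoted.
  console.log(key, parses(unquoted), parses(quoted));
}
```

None of these symbols belongs to a Unicode category that JavaScript accepts in identifiers, so the unquoted forms are syntax errors.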
There is already support in TextMate, particularly in language-babel. I've tested this regex and it seems to work fine (see it on Lightshow):
[$_\\p{L}\\p{Nl}][$\\p{L}\\p{Nl}\\p{Mn}\\p{Mc}\\p{Nd}\\p{Pc}\\x{200C}\\x{200D}]*
\p{L} matches any kind of letter from any language
\p{Nl} matches a number that looks like a letter, such as a Roman numeral
\p{Mn} matches a character intended to be combined with another character without taking up extra space (e.g. accents, umlauts, etc.)
\p{Mc} matches a character intended to be combined with another character that takes up extra space (vowel signs in many Eastern languages)
\p{Nd} matches a digit zero through nine in any script except ideographic scripts
\p{Pc} matches a punctuation character such as an underscore that connects words
\x{200C} zero width non-joiner
\x{200D} zero width joiner
Refs: JavaScript variable name validator; Unicode Character Categories; What characters are valid for JavaScript variable names? [Stack Overflow]
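The pattern above can also be tried directly in JavaScript, which supports the same property escapes under the u flag. This is a sketch rather than the grammar's actual rule: the \x{200C} and \x{200D} escapes are rewritten as \u200C and \u200D, the double backslashes are dropped because this is a regex literal and not a quoted string, and the identifierLike name and the ^...$ anchors are additions for testing whole strings.

```javascript
// Same character classes as the pattern above, anchored to the whole string.
const identifierLike =
  /^[$_\p{L}\p{Nl}][$\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200C\u200D]*$/u;

const samples = [
  "isNew",        // plain ASCII
  "n\u0435w",     // Cyrillic lookalike of "new"
  "_private",     // leading underscore
  "$el",          // leading dollar sign
  "\u2163",       // Roman numeral four (category Nl)
  "cafe\u0301",   // trailing combining acute accent (category Mn)
  "a\u200Db",     // zero width joiner inside the name
  "2fast",        // starts with a digit
  "€",            // currency symbol
];

for (const name of samples) {
  console.log(JSON.stringify(name), identifierLike.test(name));
}
// Every sample prints true except "2fast" and "€", which print false.
```

Note that the pattern only checks the character classes; it does not reject reserved words such as new, so a grammar would still need to handle those separately.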