atom / language-clojure

Clojure package for Atom
Other
49 stars 48 forks source link

Improve handling of non-ASCII letters #82

Closed PEZ closed 5 years ago

PEZ commented 5 years ago

Description of the Change

Clojure allows for unicode characters in keywords, symbols, etcetera. This PR replaces matching of ASCII only letters to corresponding unicode matching. Making for example this code snippet tokenize and colorize correctly:

(defn ^:kräsen äppelmust [äpplen]
  (when (:Åkerö äpplen)
    (-> äpplen
      (pressa)
      (häll-på-flaska)
      (sätt-på-etikett))))

Generally the changes are applied to regexp character classes like so:

Alternate Designs

Concerning tests:

I considered adding separate tests for this, but decided against it as it is really a hygien factor for the grammar to match Clojure's reader as good as possible.

Another way it could be tested is to just change all foo and bar and stuff to things like Öπ, but it would make the tests generally a bit less clear, I think.

So Instead, in places where I couldn't just add non-ASCII strings to a list of test strings, I added asserts and attached comments about the non-ASCII nature of them.

Benefits

For people using their non-english, native languages for naming stuff in their code, or people who just use a wider naming space than ASCII, this will make the syntax highlighting work much better.

Possible Drawbacks

I don't see how it could hurt anywhere.

Applicable Issues

Fixes #57

PEZ commented 5 years ago

Will someone look at this?

rsese commented 5 years ago

Thanks @PEZ - someone from the team will take a look as soon as they can.

PEZ commented 5 years ago

Thanks. The process gets a bit extra slow for me, since I want this into VS Code, and they instruct me to first add it here, because their syntax tokenizer is built from this one.

nathansobo commented 5 years ago

Thanks very much.