Indenting gets confused with multibyte chars

eraserhd / parinfer-rust

A Rust port of parinfer.

ISC License


Indenting gets confused with multibyte chars #26

Open hukka opened 6 years ago

hukka commented 6 years ago

If a line contains characters that take multiple bytes in UTF-8, parinfer-rust won't let the following line sit at the same visual indent level. Instead it requires as many spaces of indentation as there are bytes before the matching position in the previous line:

(def äää {:foo 1
             :bar 2})

(def aaa {:foo 1
          :bar 2})

(def äää {:foo 1}
          :bar 2)
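
The offsets in these examples can be checked with the standard library alone. This is only a sketch of the byte/character mismatch, not parinfer-rust's actual code; `prefix_lengths` is a hypothetical helper name, and the numbers assume each `ä` is the precomposed code point U+00E4 (2 bytes in UTF-8).

```rust
/// Returns (length in UTF-8 bytes, length in Unicode code points)
/// for the text that precedes the column we want to align to.
/// Hypothetical helper for illustration only.
fn prefix_lengths(prefix: &str) -> (usize, usize) {
    // `str::len` counts bytes, while `chars().count()` counts code points.
    (prefix.len(), prefix.chars().count())
}
```

For the prefix `(def äää {` this gives 13 bytes but 10 code points, which matches the 13 spaces parinfer-rust insists on versus the 10 that line up visually. For `(def aaa {` both numbers are 10.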

hukka commented 6 years ago

I suppose there's no way to do it with the standard library and instead something like https://crates.io/crates/unicode-segmentation is needed to do the "iteration over grapheme clusters", as the docs put it.

hukka commented 6 years ago

Or perhaps http://unicode-rs.github.io/unicode-width/unicode_width/index.html is better. I'm way out of my depth here. I can see the problem with some European languages, but I have no idea how easily this could be solved "generally", or whether it's even possible given current terminals, fonts and OS font rendering.

FWIW my specific problem would probably go away just by counting code points, which, I realize, is a horrible, horrible hack rather than doing Unicode "right".
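
A minimal sketch of that code-point hack, using only the standard library. It assumes the indenter currently holds a byte offset into the line; `byte_offset_to_char_column` is a hypothetical name, not something from parinfer-rust.

```rust
/// Converts a byte offset within `line` into a column measured in
/// Unicode code points. Returns `None` if the offset is past the end
/// of the line or falls in the middle of a multibyte character.
/// Hypothetical helper for illustration only.
fn byte_offset_to_char_column(line: &str, byte_offset: usize) -> Option<usize> {
    // `is_char_boundary` is false for out-of-range offsets as well,
    // so the slice below cannot panic.
    if !line.is_char_boundary(byte_offset) {
        return None;
    }
    Some(line[..byte_offset].chars().count())
}
```

With precomposed `ä` (U+00E4), byte offset 13 in `(def äää {:foo 1` maps to column 10, which is the visual column of `:foo`. The hack still breaks on combining sequences: in `"a\u{308}b"` (an `a` followed by a combining diaeresis) the `b` is reported at column 2 although it is displayed in column 1, and wide East Asian characters would be off in the other direction. That is where grapheme or display-width handling would be needed.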