Closed ruuda closed 2 years ago
Isn't it better to have the names in ASCII? There's probably some built-in function to do that
> Isn't it better to have the names in ASCII?
There can be non-ASCII text that I would consider to be fine, e.g. “Saint-Étienne”, “Münster”, “O’Connor”. I haven’t found those among current Solana validators though.
> There's probably some built-in function to do that
This is a very hard problem, and it depends on what you want to do and on assumptions about the source language and script too. (E.g. do you convert ü to ue, ß to ss? Or ü to just u? Or do you drop it entirely? What about code points that have no obvious ASCII alternative?) But there is https://lib.rs/crates/deunicode which does an impressive job nonetheless.
That's fair, it's probably better to do what's in this PR or to depend on this crate 👍
One of the validators in the last onboarding wave has U+26A1 HIGH VOLTAGE SIGN and U+FE0F VARIATION SELECTOR-16 in its name. In my terminal and also in Grafana in my browser, these render as a lightning bolt using an emoji font.
Also, due to the way we render labels when rendering the metrics, the variation selector turns into the literal text "\u{fe0f}" (a backslash and hex digits between curly brackets, not a U+FE0F itself).
Strip code points from both blocks (Miscellaneous Symbols, U+2600 to U+26FF, and Variation Selectors, U+FE00 to U+FE0F) to fix this.
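As a rough illustration of the fix described above, here is a minimal sketch that drops every code point in the two Unicode blocks those characters belong to: Miscellaneous Symbols (U+2600 to U+26FF) and Variation Selectors (U+FE00 to U+FE0F). The function names `is_stripped` and `strip_symbols` and the sample validator name are hypothetical, chosen for this example; the actual change in this PR may be structured differently.

```rust
/// Returns true for code points in the two blocks that caused trouble:
/// Miscellaneous Symbols (U+2600..=U+26FF) and
/// Variation Selectors (U+FE00..=U+FE0F).
/// (Hypothetical helper, not taken from the PR itself.)
fn is_stripped(c: char) -> bool {
    matches!(c, '\u{2600}'..='\u{26FF}' | '\u{FE00}'..='\u{FE0F}')
}

/// Removes the offending code points and keeps everything else,
/// including legitimate non-ASCII text such as "Münster".
fn strip_symbols(name: &str) -> String {
    name.chars().filter(|c| !is_stripped(*c)).collect()
}

fn main() {
    // Made-up validator name containing U+26A1 followed by U+FE0F.
    let name = "Volt\u{26A1}\u{FE0F} Node";
    let cleaned = strip_symbols(name);
    println!("{} -> {}", name, cleaned);
}
```

Filtering by block rather than converting to ASCII leaves names like “Saint-Étienne” and “O’Connor” untouched, which matches the concern raised in the discussion.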