39aldo39 / klfc

Keyboard Layout Files Creator
GNU General Public License v3.0

Wrong chevrons when using X Keyboard Extension format. #28

Closed kindaro closed 4 years ago

kindaro commented 4 years ago

There are two pairs of chevrons in Unicode: U+27E8/U+27E9 and U+2329/U+232A. The latter pair is deprecated and has the wrong width.

When I put the correct chevrons in my configuration, the generated X Keyboard Extension (XKB) files actually bind the wrong ones.

My source looks like this:

    ...
    { "pos": "8", "letters": [ "-", "8", "−", "\u27E8" ] },
    { "pos": "9", "letters": [ "/", "9", "÷", "⟩" ] },
    ...

The generated symbols file looks like this:

    ...
    key <AE08> { [        minus,            8,        U2212, leftanglebracket ] };
    key <AE09> { [        slash,            9,     division, rightanglebracket ] };
    ...

Why the leftanglebracket and rightanglebracket names expand to the obsolete pair of chevrons is a question in itself, but I have no idea where to report that. A workaround for now is to denote the desired symbols by their Unicode code points, like this:

    ...
    key <AE08> { [        minus,            8,        U2212, U27E8 ] };
    key <AE09> { [        slash,            9,     division, U27E9 ] };
    ...

Every time I regenerate the files, I have to patch them again, so this is not a long-term solution.
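Until the generator itself changes, the manual patch can at least be automated. The following is a minimal Python sketch, assuming the generated symbols file uses the bare keysym names shown above and that U27E8/U27E9 are the intended replacements; the script and the `patch_symbols` function are hypothetical and not part of klfc:

```python
import re
import sys
from pathlib import Path

# Legacy keysyms with an ambiguous Unicode mapping, mapped to the code
# points the layout author intended (assumption: the mathematical chevrons).
REPLACEMENTS = {
    "leftanglebracket": "U27E8",
    "rightanglebracket": "U27E9",
}

# Match whole keysym names only, so longer identifiers are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, REPLACEMENTS)) + r")\b")


def patch_symbols(text: str) -> str:
    """Return the symbols-file text with the ambiguous keysyms replaced."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], text)


if __name__ == "__main__":
    # Usage: python patch_chevrons.py path/to/symbols/file ...
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(patch_symbols(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Running it after every regeneration removes the manual step, but it still only hides the underlying problem in the generator.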

How should we approach this problem?

  1. I propose that we find the upstream maintainers of the symbolic ...anglebracket names and ask them for an update. Do you by chance know who that might be? The xkbcommon people?
  2. In the meantime, a temporary fix could be put in place: we could emit the Unicode code points for the chevrons instead of the symbolic names.
  3. Possibly we could let the user decide whether to prefer symbolic or numeric denotations? I am not sure what that would ideally look like, but as a first approximation, a switch to emit numeric denotations exclusively might be good. Actually, is there any reason to emit symbolic denotations besides human readability?
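Options 2 and 3 come down to one small decision in the keysym emitter. Here is a hypothetical Python sketch (the `SYMBOLIC` table, the `keysym` function and its `numeric_only` switch are illustrative names, not klfc's implementation) that prefers a symbolic name when one is known. Otherwise, or when the numeric-only switch is set, it emits `U` followed by the code point in hex:

```python
# Hypothetical table of characters with an unambiguous symbolic keysym name.
# A real generator would derive it from keysymdef.h instead of hard-coding it.
SYMBOLIC = {
    "-": "minus",
    "/": "slash",
    "\u00f7": "division",
}


def keysym(ch: str, numeric_only: bool = False) -> str:
    """Return the XKB keysym to emit for one character."""
    if len(ch) != 1:
        raise ValueError(f"expected a single character, got {ch!r}")
    if not numeric_only and ch in SYMBOLIC:
        return SYMBOLIC[ch]
    # Numeric form: "U" followed by at least four uppercase hex digits.
    return f"U{ord(ch):04X}"


if __name__ == "__main__":
    row = ["-", "\u2212", "\u27e8"]
    print([keysym(c) for c in row])                     # ['minus', 'U2212', 'U27E8']
    print([keysym(c, numeric_only=True) for c in row])  # ['U002D', 'U2212', 'U27E8']
```

Leaving the chevrons out of the symbolic table is what makes them come out as U27E8/U27E9, while common characters keep their readable names.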
DreymaR commented 4 years ago

I feel that when \u#### is used in the source, the XKB files should always use U#### notation. That way, the user can preserve a code point exactly as it was intended.
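This rule has to look at the raw source token, because a JSON parser turns `\u27E8` and a literal `⟨` into the same string. A hypothetical Python sketch, assuming the generator can see the text between the quotes; `keysym_for_token` and its `by_name` fallback are illustrative names:

```python
import json
import re
from typing import Callable

# A token consisting of exactly one \uXXXX escape (backslash, "u", 4 hex digits).
_SINGLE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def keysym_for_token(raw: str, by_name: Callable[[str], str]) -> str:
    # A single \uXXXX escape in the source becomes UXXXX, preserving the
    # exact code point the user wrote.
    m = _SINGLE_ESCAPE.fullmatch(raw)
    if m:
        return "U" + m.group(1).upper()
    # Anything else is decoded as a JSON string and handed to `by_name`,
    # a hypothetical lookup that may return a symbolic keysym name.
    return by_name(json.loads('"' + raw + '"'))
```

With this split, a user who writes `"\u27E8"` always gets `U27E8`, while a literal `"-"` can still come out as the readable `minus`.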

The issue that 〈 〉 are expanded to wrong or outdated(?) brackets may not be so simple, but it would indeed be a matter for the XKB people. I had to read up on the matter:

  • Different fields use different brackets.
  • The U+2329/232A code points actually decompose to U+3008/3009!
  • These are the CJK punctuation brackets, so they're commonly used.
  • Physics happily uses U+3008/3009 in bra-ket notation, it seems?
  • Thus I've used U+2329/232A in my math dead key table. I'll change that.
  • Maths and physics should rightly use U+27E8/27E9, as you said?!
  • On the other hand, there are other mathematical signs that are often substituted.
  • In sum, I'm unsure whether there is a simple right answer here.
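The decomposition claims above can be checked against the Unicode database bundled with Python. This small sketch prints the name, canonical decomposition and East Asian Width property of each of the three bracket pairs; `describe` and `PAIRS` are illustrative names:

```python
import unicodedata

# The three left/right angle-bracket pairs discussed in this thread.
PAIRS = [("\u27E8", "\u27E9"), ("\u2329", "\u232A"), ("\u3008", "\u3009")]


def describe(ch: str) -> str:
    """Summarize a code point: name, canonical decomposition, East Asian width."""
    decomp = unicodedata.decomposition(ch) or "none"
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)}; "
            f"decomposition: {decomp}; width: {unicodedata.east_asian_width(ch)}")


if __name__ == "__main__":
    for left, right in PAIRS:
        print(describe(left))
        print(describe(right))
    # U+2329 has a singleton canonical decomposition, so NFC maps it to U+3008.
    print(unicodedata.normalize("NFC", "\u2329") == "\u3008")  # True
```

The singleton decomposition means that any NFC normalization step along the way silently turns U+2329 into the CJK bracket U+3008, whereas U+27E8 has no decomposition at all.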

But by preserving Unicode values, users can get the one they're after.

As far as I can see, there are far more than two sets of brackets in Unicode, all in all:

  • https://en.wiktionary.org/wiki/%E2%9F%A8_%E2%9F%A9
  • https://en.wiktionary.org/wiki/%E3%80%88_%E3%80%89

39aldo39 commented 4 years ago

It turns out I was wrong to output leftanglebracket. The keysymdef.h documentation says that for some keysyms (like this one) the corresponding Unicode code point is vague and shouldn't be used, but I used it anyway. I've updated the parsing, so this should be fixed in the next version.

I don't think it's necessary to always output U+XXXX explicitly, as that is harder to read. In this case there was also no need to distinguish between the escaped and literal forms of the character, since the output was simply wrong.