Closed GoogleCodeExporter closed 8 years ago
The PuTTY manual has this to say about this setting:
"There are some Unicode characters whose width is not well-defined. In most
contexts, such characters should be treated as single-width for the purposes of
wrapping and so on; however, in some CJK contexts, they are better treated as
double-width for historical reasons, and some server-side applications may
expect them to be displayed as such. Setting this option will cause PuTTY to
take the double-width interpretation.
If you use legacy CJK applications, and you find your lines are wrapping in the
wrong places, or you are having other display problems, you might want to play
with this setting."
I'm not keen on inflicting such an option on users. If I understand this
correctly, a whole lot of non-legacy apps will break if this is activated?
Surely then, the right thing to do is to fix or replace the legacy apps, rather
than use such an ugly workaround?
What sort of applications are actually affected by this anyway? Anything that's
part of the Cygwin distribution? And how do other terminal emulators deal with
this issue?
Original comment by andy.koppe
on 15 Apr 2009 at 3:32
This problem is NOT legacy!!
> What sort of applications are actually affected by this anyway?
ALL applications that control the terminal are affected.
For example, shells (bash, zsh, tcsh, ...), editors (emacs, vim, ...), viewers
(less, lv, lynx, w3m, ...), and so on.
> And how do other terminal emulators deal with this issue?
As far as I know, most terminal emulators that support Unicode have a CJK width
switch (xterm, mlterm, PuTTY, etc.).
> The PuTTY manual has this to say about this setting:
I do not think that the author of the PuTTY manual understands the problem
correctly. The problem is not whether an application is legacy.
There are a lot of problems:
- A terminal application does not know the width at which the terminal emulator
displays a character.
- The width of a character changes with the selected font. For example, in the
"MS Gothic" font, "Greek Small Letter Alpha" is twice as wide as "Latin Small
Letter A". But in the "Consolas" font, both characters are the same width.
- Text processing functions (like "wcwidth") cannot handle a character width
that changes dynamically.
- Some applications need character width information regardless of the
terminal, for example text formatters.
And we have neither a standard nor a protocol to solve these problems.
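As a small illustration of the ambiguity described above (a sketch added for reference, not something from the original comment), Python's standard `unicodedata` module exposes the East Asian Width property for each character. The sample characters are chosen here only as examples.

```python
import unicodedata

# East Asian Width property values (UAX #11):
# Na = narrow, W = wide, F = fullwidth, H = halfwidth, A = ambiguous, N = neutral
SAMPLES = ["a", "\u03b1", "\u2502", "\u3042", "\uff71"]


def classify(ch: str) -> str:
    """Return the East Asian Width class of a single character."""
    return unicodedata.east_asian_width(ch)


for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {classify(ch)}")
```

Characters reported as "A" are the ones whose column width neither the application nor the terminal can determine from the character alone.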
Original comment by deenhe...@gmail.com
on 15 Apr 2009 at 6:41
Admittedly I don't really know what I'm talking about here, but you haven't
convinced me. Here's xterm's take on this:
-cjk_width
Set the cjkWidth resource to ''true''. When turned on, characters with East
Asian Ambiguous (A) category in UTR 11 have a column width of 2. Otherwise,
they have a column width of 1. This may be useful for some legacy CJK text
terminal-based programs assuming box drawings and others to have a column width
of 2. It also has to be turned on when you specify a TrueType CJK double-width
(bi-width/monospace) font either with -fa at the command line or faceName
resource. The default is ''false''
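The effect of such a switch can be sketched as follows. This is only an illustration under simplifying assumptions, not xterm's implementation: `column_width` and `string_width` are hypothetical helpers, and combining and control characters (which would normally have width 0) are ignored.

```python
import unicodedata


def column_width(ch: str, cjk_width: bool = False) -> int:
    """Columns for one printable character (hypothetical helper)."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # wide and fullwidth characters always take two columns
    if eaw == "A":
        return 2 if cjk_width else 1  # ambiguous: decided by the switch
    return 1  # narrow, halfwidth and neutral characters


def string_width(text: str, cjk_width: bool = False) -> int:
    """Total columns for a string under the chosen interpretation."""
    return sum(column_width(ch, cjk_width) for ch in text)


line = "\u2502ab\u2502"  # two box-drawing verticals around "ab"
print(string_width(line), string_width(line, cjk_width=True))
```

The same string occupies a different number of columns depending on the switch, which is why the application and the terminal have to agree on it.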
So since a width of 1 is the default in xterm, I assume that all those standard
programs actually work correctly with that width (and a matching font)? And
that there really is a legacy problem with some applications that expect a
width of 2? Do you have any examples of those?
If you set the ambiguous width to 2 for MS Gothic in the terminal, I can't see
how that's going to work with non-legacy applications, which assume ambiguous
characters to have width 1? Shouldn't the "MS Gothic" font therefore be
considered legacy, and the likes of "Consolas" be used instead?
(Also, what does Greek alpha have to do with the East Asian Ambiguous category?)
Original comment by andy.koppe
on 15 Apr 2009 at 8:28
In Japanese usage, all characters that fall into the ambiguous-width category
(box-drawing characters and so on) are wide.
(Traditionally, only ASCII and single-byte katakana are narrow.)
As a result, applications that use curses look awful.
For instance, like this:
|
|
|
On Linux, I make the relevant part of the locale wide.
I would welcome this option if it can be adopted, because Cygwin does not yet
have full locale support.
Original comment by oustt...@gmail.com
on 16 Apr 2009 at 5:28
This problem depends not only on the application but also on the font.
I hope the terminal emulator (like MinTTY) draws each character at the position
expected for the selected font.
For example, 00example.txt is a UTF-8 text file that assumes it will be
displayed with a Japanese font on the terminal emulator.
I expect it to be displayed as in 01GOOD.png.
However, current MinTTY displays it as in 02BAD.png.
> Shouldn't the "MS Gothic" font therefore be considered legacy, and the likes
> of "Consolas" be used instead?
"MS Gothic" is NOT legacy. It is the most popular Japanese font. And ALL
fixed-pitch Japanese fonts have the same problem.
> (Also, what does Greek alpha have to do with the East Asian Ambiguous
> category?)
In fact, the "East Asian Ambiguous category" means the characters that appear
both in the European and American character set standards and in the CJK
character set standards. (ASCII is excluded.)
Original comment by deenhe...@gmail.com
on 16 Apr 2009 at 3:22
deenheart, thank you very much for that example. I'm thoroughly confused about
this though, so I'll have to read the Unicode report you linked to properly.
Meanwhile, more questions:
- So there are two sets of ambiguous-width characters: a subset of actual CJK
characters, and also non-CJK non-ASCII characters such as line drawings and
Greek letters. Am I right to assume that both these sets should always have the
same width?
- Is there any sort of movement towards the one-column characters in Japanese?
(In other words: why is the two-column option called "legacy" in xterm and
PuTTY?)
- How does ncurses deal with these characters?
- Could the correct width be picked by looking at the selected font, without
bothering the user?
- In the attached PNG, "Lucida Console" is used to display the example text.
The CJK characters take up two character cells, even though the glyphs
themselves are only one column wide. Is mintty rendering them incorrectly, i.e.
should they only take up one column?
Original comment by andy.koppe
on 21 Apr 2009 at 5:14
Right, I think I finally get it: the "East Asian Ambiguous" category doesn't
actually contain any East Asian characters. Instead, it contains characters
such as Greek and Cyrillic ones that are rendered as halfwidth (i.e.
one-column) characters in non-East Asian usage, but as fullwidth (i.e.
two-column) characters in East Asian usage.
"Legacy" doesn't refer to the ambiguous width issue, but to pre-Unicode
character sets.
Some of the questions remain though:
- How does ncurses deal with these characters?
- Could the correct width be picked by looking at the selected font, without
bothering the user?
- In the attached PNG, "Lucida Console" is used to display the example text.
The CJK characters take up two character cells, even though the glyphs
themselves are only one column wide. Is mintty rendering them incorrectly, i.e.
should they only take up one column?
Original comment by andy.koppe
on 21 Apr 2009 at 8:29
> the "East Asian Ambiguous" category doesn't actually contain any East Asian
> characters.
(snip)
> "Legacy" doesn't refer to the ambiguous width issue, but to pre-Unicode
> character sets.
Yes, that's right.
> - How does ncurses deal with these characters?
ncurses compiled with the '--enable-widec' option depends on wcwidth().
But Cygwin's wcwidth() is broken: it returns 1 for all characters.
(I am trying to fix it...)
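To illustrate why a width function that returns 1 for every character matters, here is a sketch that models the situation rather than calling the real wcwidth(). `broken_wcwidth`, `terminal_width` and `end_column` are hypothetical names, and the terminal is assumed to draw wide characters in two cells.

```python
import unicodedata


def broken_wcwidth(ch: str) -> int:
    """Models a width function that returns 1 for every character."""
    return 1


def terminal_width(ch: str) -> int:
    """Cells assumed to be used by a terminal that draws wide characters in two."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def end_column(text: str, width_fn) -> int:
    """Column reached after writing text, according to width_fn."""
    return sum(width_fn(ch) for ch in text)


label = "\u65e5\u672c\u8a9e"  # three CJK ideographs
app_column = end_column(label, broken_wcwidth)
real_column = end_column(label, terminal_width)
print(app_column, real_column)
```

The application believes the cursor is three columns in while the terminal has advanced six, so anything drawn afterwards (borders, padding) lands in the wrong place.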
> - Could the correct width be picked by looking at the selected font, without
bothering the user?
In most cases, yes, it can.
However, I occasionally want to set the ambiguous character width to 1, because
quite a few applications do not handle character width correctly.
> - In the attached PNG, "Lucida Console" is used to display the example text.
> The CJK characters take up two character cells, even though the glyphs
> themselves are only one column wide. Is mintty rendering them incorrectly,
> i.e. should they only take up one column?
On a terminal emulator, we (= CJK language users) expect the width of a CJK
character to be twice the width of an alphabetic character, and the aspect
ratio of a CJK character to be 1:1.
In the PNG, I think the CJK characters are too small.
However, because the aspect ratio of "Lucida Console" is not 2:1
(height:width), I think it is difficult to display the CJK characters
correctly.
Original comment by deenhe...@gmail.com
on 22 Apr 2009 at 1:55
So no one would want CJK characters with width 1, as in the attached cjk1.png?
Actually they seem to be a bit wider than one cell in Lucida Console, as shown
in the attached lucida.png screenshot from an editor. Do you know why that is,
given that Lucida Console is meant to be a monospace font? It's the same for
Courier New.
Back to the ambiguous CJK category though. I'll implement an automatic scheme
based on looking at the width of Greek Alpha in the selected font. If you want
to switch ambiguous CJK width, you'll need to select an appropriate font. This
seems a better solution than forcing glyphs with the wrong width into a cell,
such as with the squashed Greek characters from MS Gothic. (If that's not
sufficient, I might consider a control sequence for overriding the automatic
detection.)
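The idea of deriving the ambiguous width from the font can be sketched roughly like this. It is an illustration only, not mintty's code: `advance` is a hypothetical callback returning a glyph's pixel advance in the selected font (a real terminal would query the font through the platform's text API), and the 1.5x threshold and the pixel values in the fake fonts are assumptions made up for the example.

```python
from typing import Callable


def ambiguous_width_for_font(advance: Callable[[str], int]) -> int:
    """Guess the column width of ambiguous characters from the font.

    Compares Greek small alpha with Latin small a; if alpha is clearly
    wider (at least 1.5 times, an assumed threshold), treat ambiguous
    characters as two columns wide.
    """
    latin_a = advance("a")
    greek_alpha = advance("\u03b1")
    return 2 if greek_alpha * 2 >= latin_a * 3 else 1


# Fake fonts with invented pixel advances, for demonstration only.
cjk_style_font = {"a": 8, "\u03b1": 16}      # alpha drawn double-width
western_style_font = {"a": 8, "\u03b1": 8}   # alpha same width as "a"

print(ambiguous_width_for_font(cjk_style_font.__getitem__))
print(ambiguous_width_for_font(western_style_font.__getitem__))
```

With this approach the user changes the ambiguous width by choosing a font, and no separate setting is needed.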
Original comment by andy.koppe
on 23 Apr 2009 at 6:48
Implemented font-based handling of ambiguous character width in r240 on trunk.
wintext.c already had a variable called "font_dualwidth", which seems to do the
job.
Original comment by andy.koppe
on 23 Apr 2009 at 9:32
Darn, font_dualwidth is unreliable. This breaks Courier New.
Original comment by andy.koppe
on 23 Apr 2009 at 9:42
Fixed font_dualwidth problem in r241.
Original comment by andy.koppe
on 24 Apr 2009 at 4:45
> So no one would want CJK characters with width 1, as in the attached cjk1.png?
No, that is bad for CJK language users.
> Actually they seem to be a bit wider than one cell in Lucida Console, as
> shown in the attached lucida.png screenshot from an editor. Do you know why
> that is, given that Lucida Console is meant to be a monospace font? It's the
> same for Courier New.
The font files for European and American languages (e.g. "Lucida Console")
don't include CJK characters. I think that the CJK characters in lucida.png are
displayed in "MS Gothic", as a result of "Font Fallback" and/or "Font Linking".
Please see the following page:
Globalization Step-by-Step: Fonts
http://msdn.microsoft.com/en-us/goglobal/bb688134.aspx
The base font (e.g. "Lucida Console") is designed as a fixed-pitch font, and
the substituted font (e.g. "MS Gothic") is also designed as a fixed-pitch font.
However, the design of the substituted font is not the same as the design of
the base font, so its glyph widths need not match the base font's cells.
> Implemented font-based handling of ambiguous character width in r240 on trunk.
It looks good. Thank you.
Original comment by deenhe...@gmail.com
on 24 Apr 2009 at 1:55
> The font files for European and American languages (e.g. "Lucida Console")
> don't include CJK characters. I think that the CJK characters in lucida.png
> are displayed in "MS Gothic". I think that it is a result of "Font Fallback"
> and/or "Font Linking".
I see, that makes sense. I'd just assumed that those fonts have been extended to
cover all (or at least most) of Unicode in this globalised age.
Thanks for all your help and patience with my ignorance!
Original comment by andy.koppe
on 24 Apr 2009 at 5:30
Took fix to 0.3 branch in r250.
Original comment by andy.koppe
on 24 Apr 2009 at 8:57
Original comment by andy.koppe
on 25 Apr 2009 at 1:51
Original issue reported on code.google.com by
deenhe...@gmail.com
on 15 Apr 2009 at 3:01