Open AndyScull opened 8 years ago
I need the exact characters which overlap.
From what I know, Japanese letters and the general symbols they use start from 0x3000. I used http://unicode-table.com/en/ to try a few different ranges and see what results I'd get. Keep in mind that I kept using the MS Gothic font, so many characters were shown as squares (but the behavior of those squares differed).
0180 range - Latin Extended - no problems whatsoever
0370 - Greek characters - no problems
Arabic-like languages skipped, since they're right-to-left and the symbols appear at the start of the filename despite being typed at the end
0e00 - Thai - no problems
1800 - Mongolian - no problem, except that font drawing changes for the whole filename if there are visible Mongolian characters in the line. If the characters are cut off (left panel made narrower), the font returns to normal
1E00 - Latin Extended Additional - my font cannot show them, but the squares do not jump around as I change the panel width, and I believe they would show fine if I used a compatible font
2c60 - Latin Extended-C - same as above
2E80 - CJK radicals - the problem I mentioned appears.
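One way to see why the trouble starts around 2E80 is to look at the Unicode East Asian Width property for one character from each range. The sketch below is only an illustration: the sample code points are arbitrary picks, the helper name `cell_width` is hypothetical, and nothing here claims that ConEmu or Far classify characters this way.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Return 2 for East Asian Wide/Fullwidth characters, otherwise 1.

    This follows the Unicode East Asian Width property only; it is an
    assumption that a console would use the same rule.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

# One arbitrary sample character per range mentioned in the post.
SAMPLES = {
    "Latin Extended-B (0180)": "\u0180",
    "Thai (0E01)": "\u0e01",
    "Latin Extended Additional (1E00)": "\u1e00",
    "CJK Radicals Supplement (2E80)": "\u2e80",
    "Hiragana (3042)": "\u3042",
}

def report() -> dict:
    """Map each sample range name to its width in console cells."""
    return {name: cell_width(ch) for name, ch in SAMPLES.items()}

if __name__ == "__main__":
    for name, width in report().items():
        print(f"{name}: {width} cell(s)")
```

Under this rule the ranges that showed no problems are single-cell, while the CJK ranges from 2E80 upwards are double-width, which matches where the overlapping was observed.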
Basically, to replicate this you can:
@AndyScull What is your OS? Show info from ConEmu/About/OS.
here you go:
ConEmu 160612 [32] Startup Info
OsVer: 6.1.7601.x64, Product: 1, SP: 1.0, Suite: 0x100, SM_SERVERR2: 0
CSDVersion: Service Pack 1, ReactOS: 0 (), Rsrv: 0
DBCS: 0, WINE: 0, PE: 0, Remote: 1, ACP: 1251, OEMCP: 866, Admin: 0
AppID: 41ff8172ae65e0896451e011f983072f::161
Desktop: Winsta0\Default, SessionId: 1, ConsoleSessionId: 4
Title: D:\Shell\Far\ConEmu.exe
Size: {0,0},{0,0}
Flags: 0x00000001, ShowWindow: 1, ConHWnd: 0x00000000
char: 1, short: 2, int: 4, long: 4, u64: 8
Handles: 0x00000000, 0x00000000, 0x00000000
Current PID: 15364, TID: 26648
Active HKL: 0x04090409
GetKeyboardLayoutList: 0x04090409 0x04190419
DBCS: 0, ACP: 1251, OEMCP: 866
And how do you imagine ConEmu would fit your string in non-intended console space without shrinking???
You have two options: either install a CJK OS or do not use CJK.
Oh... then I have a counterquestion - why does the same happen on a fully Japanese Windows installation?
ConEmu 160612 [32] Startup Info
OsVer: 6.1.7601.x32, Product: 1, SP: 1.0, Suite: 0x100, SM_SERVERR2: 0
CSDVersion: Service Pack 1, ReactOS: 0 (), Rsrv: 30
DBCS: 1, WINE: 0, PE: 0, Remote: 0, ACP: 932, OEMCP: 932, Admin: 0
AppID: 478cb7f6f85177052b35c65a51839a7a::161
Desktop: Winsta0\Default, SessionId: 1, ConsoleSessionId: 1
Title: Z:\Far\ConEmu.exe
Size: {0,0},{0,0}
Flags: 0x00000001, ShowWindow: 1, ConHWnd: 0x00000000
char: 1, short: 2, int: 4, long: 4, u64: 8
Handles: 0x00000000, 0x00000000, 0x00000000
Current PID: 4200, TID: 4204
Active HKL: 0x04090409
GetKeyboardLayoutList: 0x04090411 0x04090409 0x04110411 0x04190419 0x08040804
p.s.
And how do you imagine ConEmu would fit your string in non-intended console space without shrinking???
The same way ConEmu v 141221 did. That's why I wrote that the expected behavior from updating the program is to not lose current functionality.
why does the same happens on complete japanese windows installation?
The same? I doubt it. Show screenshots of ConEmu and RealConsole on the DBCS system. And run chcp in that console: what code page does it show?
The same way ConEmu v 141221 did.
Previously, ConEmu trimmed glyphs which overran the intended rectangle. Just compare the screenshots. I'm sure partially displayed text is worse than compressed text.
OK, did it
CHCP:
How it looks in plain Far:
ConEmu (2016):
And this specific long string I mentioned before:
Realconsole (cursor on long filename):
This looks like a bug of Far Manager.
On DBCS-enabled systems, each CJK character takes two cells instead of one. AFAIK Far ignores this fact, and it's an issue for Mantis.
Why your long string is displayed condensed in ConEmu, I'm not sure. It depends on the exact location of the glyphs in the console. It seems that Far tries to fit data which exceeds the panel size. And ConEmu does its best to fit bad data...
Oh well, OK, I will stick with the old version then. Even if the Far team admits and fixes it, that will leave me with a VERY ugly view with a lot of spaces when I run Far without ConEmu (I don't always run it, only when I need to work with Japanese files and see their names). Thanks for the help.
I can't understand why you prefer cropped content.
Elaborate please, what exactly do you mean by cropped content? If it's about my choice to stick with the old version, that's because in the new version I get garbled and unreadable 'content' which is no content at all. The old version has its disadvantages (mainly cutoff on the right in dialog boxes for very long strings), but at least my files are shown correctly and I see nice charmy words, as if I'd copied them into Word. With the new version I'd have to try hard to read any of the long filenames, or if I used monowidth, I'd have ugly spaced ASCII. IMO, the old version totally wins this comparison.
Just to note, I don't need and don't use any of ConEmu's fancy features except the main ability to show Unicode filenames without switching locale, mucking with system fonts, et cetera. I'd use Total Commander, as it does this even better, but it sucks when working with the command line.
OK, more reasons and description
In your own example, you have really long text in your console
ConEmu can't do any magic. It shows in the virtual console the data the console application printed. Let's take WinWord for example. Would you like it if the text you typed went out of the page margins? What happens if you send this doc to the printer? You would have only half of the book, which makes absolutely no sense. You can't guess what was in the cropped text (which overruns the page margins).
What would happen when you try to copy this cropped string (old ConEmu behavior) from the console? It doesn't matter whether you use Far's grabber (Alt+Ins) or ConEmu's internal Copy. The behaviour would be weird. You would be trying to copy something which does not exist on screen.
Well, some fun below:
With old behavior, when overruns were just dropped (cropped) you got that
And you just didn't see information which may be valuable.
On a DBCS-enabled OS, a properly designed console application knows that CJK takes two cells and does not try to print more data than possible. Far was not designed for CJK; that's why I've suggested that you complain on Mantis.
To prove my point, here is a single screenshot that demonstrates two things.
1) How text would be cropped if FAR used 2 cells for DBCS chars. The upper filename is 31 DBCS chars long (to count, I used a 60-digit file). This is how FAR would print it if the 'bug' was fixed. The lower filename is 62 DBCS chars long. It goes beyond the end of the tab. Now, tell me, which line of information should be tagged as 'cropped'? Especially consider that for the 31-char line, you don't even know that it is cropped. The filename may actually be 60 characters long, but FAR would print only 31 of them (one per 2 cells, right?), and ConEmu would nicely fit each of those 31 characters to the left (unless you enable monowidth, which looks ugly).
2) How file information is not overlaid by text in my case. I see all size/date tabs correctly. It must be some different font settings in your case, or maybe a different version of ConEmu? I couldn't replicate the same behavior in my installation.
//edit
Lets take WinWord for example
Bad comparison. WinWord crops not by number of characters, but by the glyph width of the line (cm/pixels). That's exactly how old ConEmu works, and actually you're rooting for my team here.
1) You are wrong. If the bug were fixed in Far, it would print the following:
║世界中のあらゆる情報を検索するためのツールを提供しています。さまざまな検索機能 を活用}Folder ║
Then you would be able to scroll long names as usual in Far with Alt-Left/Alt-Right. For example:
{界中のあらゆる情報を検索するためのツールを提供しています。さまざまな検索機能 を活用し}Folder ║
That behaviour of a console application would be absolutely proper, and ConEmu would not crop/drop/whatever any parts of the text. And there would be no overlaps either.
Now, tell me, which line of information should be tagged as 'cropped'?
You can easily see that part on the left. You have no idea at all that after し there is the long string て、お探しの情報を見つけてください. This data was cropped/dropped/hidden/...
2) I showed you part of the status bar. There are no vertical bars there; the text is printed as one continuous line. It's not correct to point to one example without mentioning a lot of other variants.
Look. I am not telling you that the current implementation is ideal. ConEmu just does its best to show what the console application asks it to show.
You are wrong. If the bug would be fixed in Far, it would print the following
Erm. Then you probably lost me. I'll try to explain how I think, and please correct me where I am wrong.
Actually, this little screenshot shows how it prints DBCS chars on native Japanese Windows, so I believe I made no mistakes in my logical chain. There are 2 digits per Japanese character, so a 'fixed' FAR on a Japanese system would show half as many of these characters as it shows now, limiting the string length to 30 chars. And if I didn't use a monowidth font in ConEmu, all glyphs would get packed tightly to the left, leaving unused space at the end of the tab (unless I use a monowidth font, which would probably look very ugly with non-Japanese filenames).
//edit After a few more tests, I can't say definitively how current FAR works on native DBCS locales. Without ConEmu, it mostly shows 1 char per 2 cells, but sometimes it starts clamping them together in every cell... I won't post this bug to the FAR forum because I have no guarantee that a fixed FAR would work the way it works now on non-DBCS Windows. That's how I use it now and I'm pretty happy with my experience. Japanese characters are shown correctly without using Total Commander, and without changing the whole system locale. Upgrading to the latest ConEmu version was just an experiment to see what was fixed and improved. Unfortunately, it didn't improve anything for me, so I am staying on the old version. Probably forever.
DBCS versions of Windows work completely differently from non-DBCS ones.
When you run an application on DBCS Windows and use a double (or four) byte code page (like 932), each CJK character really takes two cells. Take a look at COMMON_LVB_LEADING_BYTE and COMMON_LVB_TRAILING_BYTE in the CHAR_INFO description. This is absolutely weird and unbelievable at first sight, but actually this is the only way to print and display CJK using the [A] console functions. Moreover, even if a console application uses the [W] functions (like Far does) to write wchar_t sequences, the console doubles each CJK character (the first cell will have the COMMON_LVB_LEADING_BYTE flag and the second the COMMON_LVB_TRAILING_BYTE flag) and you have this glyph in TWO cells, otherwise the [A] functions would fail to read code page 932!
The only exception is code page 65001. It uses one real cell per Unicode (wchar_t) character.
The console window (conhost.exe) and ConEmu know about that and display a sequence of cells (COMMON_LVB_LEADING_BYTE ... COMMON_LVB_TRAILING_BYTE) as one CJK glyph.
So, when you are using CJK Windows, console applications must work in a different way than on non-CJK Windows.
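The doubling described above can be modelled in a few lines. This is a rough simulation, not the conhost implementation: the two flag values are the ones defined for CHAR_INFO attributes in the Windows console headers, but storing the same character in both cells and using East Asian Width to decide what counts as CJK are simplifying assumptions, and `to_cells` / `to_glyphs` are hypothetical names.

```python
import unicodedata
from typing import List, Tuple

# CHAR_INFO attribute flags from the Windows console headers.
COMMON_LVB_LEADING_BYTE = 0x0100
COMMON_LVB_TRAILING_BYTE = 0x0200

Cell = Tuple[str, int]  # (character, attribute flags)

def is_wide(ch: str) -> bool:
    """Assumption: treat East Asian Wide/Fullwidth characters as CJK."""
    return unicodedata.east_asian_width(ch) in ("W", "F")

def to_cells(text: str) -> List[Cell]:
    """Model the buffer of a DBCS console: a wide character takes two cells."""
    cells: List[Cell] = []
    for ch in text:
        if is_wide(ch):
            # Assumption: both halves carry the same character.
            cells.append((ch, COMMON_LVB_LEADING_BYTE))
            cells.append((ch, COMMON_LVB_TRAILING_BYTE))
        else:
            cells.append((ch, 0))
    return cells

def to_glyphs(cells: List[Cell]) -> str:
    """Model the renderer: a leading/trailing pair is drawn as one glyph."""
    return "".join(ch for ch, attr in cells if not attr & COMMON_LVB_TRAILING_BYTE)

if __name__ == "__main__":
    cells = to_cells("a世b")
    print(len(cells))        # 4 cells for 3 characters
    print(to_glyphs(cells))  # a世b
```

The point of the model is that the cell count, not the character count, is what a DBCS-aware application has to budget against the panel width.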
If so, it can fit width/2 characters for any given range of cells on virtual console screen. And there will be spaces, like when I enable monowidth in ConEmu
So, you are wrong here. There would be no spaces. Only CJK glyphs which are (unfortunately) doubled in conhost internal buffer (taking two cells) but are displayed as one wide glyph.
Simple test in CPP coming
I'm not into programming, much less C++. I'd like to hear specific answers: if FAR was fixed and DBCS-aware, how many characters would be shown in a 60-cell wide tab in DBCS and in non-DBCS Windows? And how would they look mixed with ASCII characters? And how many characters would be seen in the ConEmu window? And consider it all with a non-monowidth font... And then compare with how many characters I see now in old ConEmu, which could be described by the phrase 'however many glyphs fit into the tab space'.
So, you are wrong here. There would be no spaces. Only CJK glyphs which are (unfortunately) doubled in conhost internal buffer (taking two cells) but are displayed as one wide glyph.
Well, I mean spaces between glyphs. Console should use monowidth font for those, so there should be more spacing between actual character lines when compared to pure graphic output
I'm preparing tests and screenshots. Later today...
Console should use monowidth font for those,
What do you mean? On a DBCS system, a "monowidth" font has different widths for double-width (CJK, full-width) and single-width (just ASCII) characters.
PS. Is it possible to discuss in Russian to avoid translation problems?
Yep. In English it would be more accessible to other users, in case someone ever runs into the same trouble as I did. By monowidth in the console I mean fonts like Courier, where the glyph of each character is padded with empty space up to a fixed width that is the same for the whole font. In DBCS this is either the cell width or the cell width multiplied by two. Perhaps the font itself is not monospaced, and it is the console programs that draw the character in the middle of a 1- or 2-cell-wide space. As a result, quite a lot of empty space remains between the characters themselves, so the text comes out wider than if the same thing were typed in Word with a non-monospaced font. The text is readable in general, but fewer characters fit on the screen.
Here are some tests: https://github.com/Maximus5/Write-Read-Test
how many characters would be shown in 60-cell wide tab in DBCS, and non-DBCS windows.
RealConsole (conhost.exe) physically can't show more than 30 full-width glyphs in a 60-cell console. Otherwise they are wrapped to the next line. Each CJK character takes two cells in any case on a DBCS OS.
And how would they look mixed with ascii characters?
Exactly as they must. ASCII (half-width) would take single cell.
And how many characters would be seen in ConEmu window?
Same as in RealConsole. There would be no compression at all, because all glyphs would take desired space. On DBCS OS of course.
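The arithmetic behind these answers can be checked with a small sketch. It assumes that a width-aware application budgets two cells for every East Asian Wide/Fullwidth character and one cell for everything else; `chars_that_fit` is a hypothetical helper, not a Far or ConEmu function.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Assumed rule: Wide/Fullwidth characters take 2 cells, others 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def chars_that_fit(text: str, cells: int) -> int:
    """Count how many leading characters of text fit into the given cells."""
    used = 0
    count = 0
    for ch in text:
        width = cell_width(ch)
        if used + width > cells:
            break  # the next character would overrun the panel
        used += width
        count += 1
    return count

if __name__ == "__main__":
    print(chars_that_fit("世" * 62, 60))         # 30
    print(chars_that_fit("a" * 62, 60))          # 60
    print(chars_that_fit("ab" + "世" * 40, 60))  # 31
```

So a 60-cell panel holds 30 full-width glyphs, 60 ASCII characters, or a mix such as 2 ASCII characters followed by 29 full-width ones.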
And consider it all with non-monowidth font...
Arial? Times New Roman? Tahoma? Awful... regardless CJK or not.
Arial? Times New Roman? Tahoma? Awful... regardless CJK or not.
And here's the answer. It would definitely look awful for me if this bug in FAR was fixed. That's why I won't report it and prefer you'd not do it too. I can live with older ConEmu version, but new versions of FAR are a must, they fix and add a lot of things. If at some point they'd 'fix' it, I'd either have much shorter CJK strings (compared to before), or would stick with outdated FAR version. So just forget all this issue please. Anyway, I may be forced to switch to linux in 5 years...
Thanks for the option! I only noticed it now and have at last updated my ConEmu version.
There's still some minor difference in character output from the old version though: for some fonts, characters are not centered in their cell, as if there's very little space on the left and a lot on the right. Since Monowidth doesn't do anything for them, I assume it has something to do with the font itself (it happens with 'MS UI Gothic' but not 'MS Gothic'). It's not something that really needs fixing, since I'm alright with changing the font, but who knows, maybe it can be fixed with a single line of code...
This is how text looks in old version of conemu (size 20 MS UI Gothic, cell 0, I was using it since I found the combination):
This is the new version, same settings (I copied conemu.xml and disabled the 'compress long strings' option):
The リ character is obviously moved a little to the left; it is clearly seen if I edit the filename and select this character to see where its glyph ends.
This is the same string with cell=12:
Readable but not very pretty : ) It could be mistaken for a space character.
So I tried other fonts and, surprisingly, it displays correctly with MS Gothic:
I noticed it seems to be happening only with fonts which aren't monowidth. Here we have MS Gothic Japanese characters aligned as 2 ASCII cells:
and a selection to show the actual placeholder:
And here's MS UI Gothic:
and selection:
That just feels like an error somewhere in text handling code, so I am going to close this issue (since everything else works as intended) and if you want you can look into this further at your own pace
If you think the output may be improved (it looks so), reopen the issue and put here the file/text where the problem occurs.
Well, if you have the time to fix it :) I can live with current situation though
Examples of broken text (all with font MS UI Gothic, size=20, width=0, cell=0, all checkboxes are unchecked): 淫行 - complex kanji often overlap, I'll give one example but almost all of them have wrong width: Easier to see it if you select one character in text, selection immediately crops that character
Space is too thin (and is not affected when I enable monospace and cell=12): cell=0, monospace disabled: This is with cell=12: String is the same as in first example, I just quickly typed space between characters to make screenshots
リ - From what I can find, same problem (too little space on left and too much on right) with ッ, ク , タ, イ, ド, し No problem with ム, ー, い, ん, ス, chars. Maybe they're too wide to have this problem or it somehow depends on unicode number
CJK exclamation mark ！(U+ff01): Though even in my current browser it isn't centered in its placeholder
@AndyScull Chinese characters have the same problem. Is there any solution now?
Sadly, no, I just use an old version of ConEmu, from 2014. I don't use the newer features; all I need is correct display of Unicode characters, and it does that. The exact version is 141221 [32bit]
@AndyScull Thank you very much, your answer is very helpful to me.
I don't think the issue is still relevant in current ConEmu builds. The option "Compress long strings to fit space" has existed for a long time. There is no sense in using old builds
@Maximus5 Unfortunately, the option "Compress long strings to fit space" doesn't work in current ConEmu builds; this is my screenshot.
There is no sense in using old builds
I respectfully disagree with that. This is from old version of conemu, pure cmd output: And this is from latest version: "Compress long strings to fit space" is irrelevant since strings in my output aren't that long to be affected by it
Isn't it better to ping the issue?
@TGhoul So, do you prefer to completely lose the tail of the string in favor of CJK not clamping together?
Versions
ConEmu build: 160612 x32 (stable) OS version: Windows 7 x64 Used shell version: Far Manager
Problem description
Problem with japanese characters overlapping one another. See attached screenshots from 141221, it does not have this problem. ConEmu settings for text are same - same checkboxes, fonts, font sizes. Switching to Monolength isn't acceptable since it makes filenames with ascii characters look really bad
Steps to reproduce
Actual results
-->screenshots
Expected results
not break existing functionality
Additional files
ConEmu 2016: some filenames do not keep original character widths (which are probably defined in the ttf font?) ConEmu 2014: each character has its own width and is displayed correctly. There are some characters with more space than the visible glyph takes, but at least they aren't drawn over one another