Maximus5 / ConEmu

Customizable Windows terminal with tabs, splits, quake-style, hotkeys and more
https://conemu.github.io/
BSD 3-Clause "New" or "Revised" License

Not-locale characters clamping together #739

Open AndyScull opened 8 years ago

AndyScull commented 8 years ago

Versions

ConEmu build: 160612 x32 (stable)
OS version: Windows 7 x64
Used shell version: Far Manager

Problem description

Japanese characters overlap one another. See the attached screenshots from 141221, which does not have this problem. The ConEmu text settings are the same: same checkboxes, fonts, and font sizes. Switching to Monolength isn't acceptable, since it makes filenames with ASCII characters look really bad.

Steps to reproduce

  1. Update ConEmu from the 2014 version to the latest.

Actual results

See the screenshots below.

Expected results

Existing functionality should not break.

Additional files

ConEmu 2016: some filenames do not keep their original character widths (which are probably defined in the TTF font?).
Screenshot: conemu_v2016

ConEmu 2014: each character has its own width and is displayed correctly. Some characters get more space than the visible glyph needs, but at least they aren't drawn over one another.
Screenshot: conemu_v2014

Screenshot: settings_comparison

Maximus5 commented 8 years ago

I need the exact characters that overlap.

AndyScull commented 8 years ago

From what I know, Japanese letters and the general symbols they use start at 0x3000. I used http://unicode-table.com/en/ to try a few different ranges and see what results I'd get. Keep in mind that I kept using the MS Gothic font, so many characters were shown as squares (but the behavior of those squares differed).

0180 (Latin Extended): no problems whatsoever.
0370 (Greek): no problems.
Arabic-like languages: skipped, since they're right-to-left and the symbols appear at the start of the filename despite being typed at the end.
0E00 (Thai): no problems.
1800 (Mongolian): no problem, except that font rendering changes for the whole filename if there are visible Mongolian characters in the line. If the characters are cut off (left panel made narrower), the font returns to normal.
1E00 (Latin Extended Additional): my font cannot show them, but the squares do not jump around as I change the panel width, and I believe they would show fine if I used a compatible font.
2C60 (Latin Extended-C): same as above.
2E80 (CJK radicals): the problem I mentioned appears.

Basically, to replicate this you can:

  1. Set the font to something that supports Japanese (if you use a random font, Japanese characters will still be shown using a substitute font and won't be consistent with the ASCII characters).
  2. Create a folder/file named "世界中のあらゆる情報を検索するためのツールを提供しています。さまざまな検索機能 を活用して、お探しの情報を見つけてください" (a random string from Google).
  3. Change the Far panel width using Alt+Left and Alt+Right and watch how the file name behaves. If it's shorter than the file panel, everything is all right. When you shrink the panel, the letters are compressed too forcefully, and in the end you have to enable monowidth to be able to distinguish the characters.

Maximus5 commented 8 years ago

@AndyScull What is your OS? Show info from ConEmu/About/OS.

AndyScull commented 8 years ago

Here you go:

ConEmu 160612 [32]
Startup Info
OsVer: 6.1.7601.x64, Product: 1, SP: 1.0, Suite: 0x100, SM_SERVERR2: 0
CSDVersion: Service Pack 1, ReactOS: 0 (), Rsrv: 0
DBCS: 0, WINE: 0, PE: 0, Remote: 1, ACP: 1251, OEMCP: 866, Admin: 0
AppID: 41ff8172ae65e0896451e011f983072f::161
Desktop: Winsta0\Default, SessionId: 1, ConsoleSessionId: 4
Title: D:\Shell\Far\ConEmu.exe
Size: {0,0},{0,0}
Flags: 0x00000001, ShowWindow: 1, ConHWnd: 0x00000000
char: 1, short: 2, int: 4, long: 4, u64: 8
Handles: 0x00000000, 0x00000000, 0x00000000
Current PID: 15364, TID: 26648
Active HKL: 0x04090409
GetKeyboardLayoutList: 0x04090409 0x04190419

Maximus5 commented 8 years ago

DBCS: 0, ACP: 1251, OEMCP: 866

And how do you imagine ConEmu would fit your string in non-intended console space without shrinking???

You have two options: either install a CJK OS or do not use CJK.

AndyScull commented 8 years ago

Oh... then I have a counterquestion: why does the same happen on a complete Japanese Windows installation?

ConEmu 160612 [32]
Startup Info
OsVer: 6.1.7601.x32, Product: 1, SP: 1.0, Suite: 0x100, SM_SERVERR2: 0
CSDVersion: Service Pack 1, ReactOS: 0 (), Rsrv: 30
DBCS: 1, WINE: 0, PE: 0, Remote: 0, ACP: 932, OEMCP: 932, Admin: 0
AppID: 478cb7f6f85177052b35c65a51839a7a::161
Desktop: Winsta0\Default, SessionId: 1, ConsoleSessionId: 1
Title: Z:\Far\ConEmu.exe
Size: {0,0},{0,0}
Flags: 0x00000001, ShowWindow: 1, ConHWnd: 0x00000000
char: 1, short: 2, int: 4, long: 4, u64: 8
Handles: 0x00000000, 0x00000000, 0x00000000
Current PID: 4200, TID: 4204
Active HKL: 0x04090409
GetKeyboardLayoutList: 0x04090411 0x04090409 0x04110411 0x04190419 0x08040804

AndyScull commented 8 years ago

p.s.

And how do you imagine ConEmu would fit your string in non-intended console space without shrinking???

The same way ConEmu v 141221 did. That's why I wrote that expected behavior from updating the program is to not lose current functionality

Maximus5 commented 8 years ago

why does the same happen on a complete Japanese Windows installation?

The same? I doubt it. Show screenshots of ConEmu and RealConsole on the DBCS system. Also run chcp in that console: what code page does it show?

The same way ConEmu v 141221 did.

Previously, ConEmu trimmed glyphs that overran their intended rectangle. Just compare the screenshots. I'm sure partially displayed text is worse than compressed text.

AndyScull commented 8 years ago

OK, did it

CHCP: _chcp

How it looks in plain Far: _plain_far

ConEmu (2016): _settings

And this specific long string I mentioned before: _specific_long_filename

Realconsole (cursor on long filename): _realconsole

Maximus5 commented 8 years ago

This looks like a bug of Far Manager.

On DBCS-enabled systems, a CJK character takes two cells instead of one. AFAIK Far ignores this fact, and that is an issue to report on Mantis.

I'm not sure why your long string is displayed condensed in ConEmu. It depends on the exact location of the glyphs in the console. It seems that Far tries to fit data that exceeds the panel size, and ConEmu does its best to fit the bad data...

AndyScull commented 8 years ago

Oh well, OK, I will stick to the old version then. Even if the Far team admits and fixes it, that will leave me with a VERY ugly view with a lot of spaces when I run Far without ConEmu (I don't always run it, only when I need to work with Japanese files and see their names). Thanks for the help.

Maximus5 commented 8 years ago

I can't understand why you prefer cropped content.

AndyScull commented 8 years ago

Elaborate please, what exactly do you mean by cropped content? If it's about my choice to stick with the old version, that's because in the new version I get garbled and unreadable 'content', which is no content at all. The old version has its disadvantages (mainly cutoff on the right in dialog boxes for very long strings), but at least my files are shown correctly and I see nice, charming words, as if I'd copied them into Word. With the new version I'd have to try hard to read any of the long filenames, or, if I used monowidth, I'd have ugly spaced ASCII. IMO, the old version totally wins this comparison.

Just to note, I don't need and don't use any of ConEmu's fancy features except the main ability to show Unicode filenames without switching locale, mucking with system fonts, et cetera. I'd use Total Commander, as it does this even better, but it sucks when working with the command line.

Maximus5 commented 8 years ago

OK, more reasons and description

In your own example, you have really long text in your console

ConEmu can't do any magic. It shows in the virtual console the data that the console application printed. Let's take WinWord for example. Would you like it if the text you typed went beyond the page margins? What happens if you send this doc to a printer? You would have only half of the book, which makes absolutely no sense. You can't guess what was in the cropped text (the part that overruns the page margins).

What would happen when you try to copy this cropped string (old ConEmu behavior) from the console? It doesn't matter whether you use Far's grabber (Alt+Ins) or ConEmu's internal Copy. The behaviour would be weird: you would be trying to copy something that does not exist on screen.

Well, some fun below:

2016-06-26_14-10-22

With old behavior, when overruns were just dropped (cropped) you got that

2016-06-26_13-59-45

And you just didn't see information that may be valuable

2016-06-26_13-59-58

On a DBCS-enabled OS, a properly designed console application knows that a CJK character takes two cells and does not try to print more data than fits. Far was not designed for CJK; that's why I suggested that you report it on Mantis.

AndyScull commented 8 years ago

To prove my point, here is a single screenshot that demonstrates two things: image

1) How text would be cropped if FAR used 2 cells for DBCS chars. The upper filename is 31 DBCS chars long (to count, I used a 60-digit file). This is how FAR would print it if the 'bug' were fixed. The lower filename is 62 DBCS chars long. It goes beyond the end of the tab. Now, tell me, which line of information should be tagged as 'cropped'? Especially consider that for the 31-char line, you don't even know that it is cropped. The filename may actually be 60 characters long, but FAR would print only 31 of them (one per 2 cells, right?), and ConEmu would nicely fit each of those 31 characters to the left (unless you enable monowidth, which looks ugly).

2) How file information is not overlaid by text in my case. I see all the size/date tabs correctly. It must be some different font settings in your case, or maybe a different version of ConEmu? I couldn't replicate the same behavior in my installation.

//edit

Let's take WinWord for example

Bad comparison. WinWord crops not by the number of characters but by the glyph width of the line (cm/pixels). That's exactly how the old ConEmu works, and you're actually rooting for my team here.

Maximus5 commented 8 years ago

1) You are wrong. If the bug were fixed in Far, it would print the following

║世界中のあらゆる情報を検索するためのツールを提供しています。さまざまな検索機能 を活用}Folder ║

Then you would be able to scroll long names as usual in Far with Alt-Left/Alt-Right. For example

{界中のあらゆる情報を検索するためのツールを提供しています。さまざまな検索機能 を活用し}Folder ║

That behaviour of the console application would be absolutely proper, and ConEmu would not crop/drop/whatever any part of the text. And there would be no overlaps either.
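
The truncation described above can be sketched in a few lines. This is only an illustration, not Far's actual code: the function names `cell_width` and `fit_to_cells` are hypothetical, and the sketch assumes that every East Asian Wide or Fullwidth character takes two cells, as on a DBCS system.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Cells one character occupies: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def fit_to_cells(text: str, cells: int, marker: str = "}") -> str:
    """Truncate text so that it fits into `cells` console cells.

    If the text is too long, one cell is reserved for `marker`,
    mimicking the "}" sign shown at the right edge of the panel column.
    Padding of an unused last cell is not modeled (hypothetical helper).
    """
    total = sum(cell_width(ch) for ch in text)
    if total <= cells:
        return text
    budget = cells - cell_width(marker)
    out = []
    used = 0
    for ch in text:
        w = cell_width(ch)
        if used + w > budget:
            break  # the next glyph would not fit completely, so stop here
        out.append(ch)
        used += w
    return "".join(out) + marker


name = "世界中のあらゆる情報を検索するためのツール"
print(fit_to_cells(name, 20))  # prints: 世界中のあらゆる情}
```

With a 20-cell column only nine full-width glyphs fit next to the marker, and none of them is drawn partially or squeezed.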

Now, tell me, which line of information should be tagged as 'cropped'?

You can easily see that part on the left. You have no idea at all that the long string て、お探しの情報を見つけてください follows it. This data was cropped/dropped/hidden/...

2) I showed you part of the status bar. There are no vertical bars there; the text is printed as one continuous line. It's not correct to point to one example without mentioning the many other variants.

Look. I'm not telling you that the current implementation is ideal. ConEmu just does its best to show what the console application asks it to show.

AndyScull commented 8 years ago

You are wrong. If the bug were fixed in Far, it would print the following

Erm. Then you probably lost me. I'll try to explain how I think, and you please correct me where I am wrong

  1. Didn't you say that proper width for DBCS characters is 2 cells? You meant character placeholders, like 80x25 in oldschool default console size, or is it something else?
  2. If so, any proper console program should print one DBCS character per 2 cells
  3. If so, it can fit width/2 characters for any given range of cells on virtual console screen. And there will be spaces, like when I enable monowidth in ConEmu
  4. If so, FAR would do the same. Not that characters would be really shown as FAR seems to be missing required fonts on non-japanese versions of windows. Instead, it shows square placeholders.
  5. Now see my previous screenshot and note the right part of it with the Real Console output. FAR fits all the characters it can fit, showing them as squares. IF it were DBCS-aware, it would print one character per 2 cells, effectively shortening the shown filename length to half of the tab width
  6. Now, if FAR printed 30 characters of a long filename, ConEmu wouldn't go out of its way to find the file, get its name, and expand it to the end of the tab, right? ConEmu would just 'convert' the existing strings to proper Unicode glyphs, using whatever font is specified in the settings, and in the process shrink the whole line of text, as a lot of glyphs are less than 2 cells wide
  7. Then you'll get a situation like in my screenshot - where 30 characters are printed aligned to left, and a lot of space because FAR provided only those 30 chars and nothing more

Actually, this little screenshot shows how it prints DBCS chars on native Japanese Windows, so I believe I made no mistakes in my logical chain (image). There are 2 digits per Japanese character, so a 'fixed' FAR on a Japanese system would show half as many of these characters as it shows now, limiting the string length to 30 chars. And if I didn't use a monowidth font in ConEmu, all glyphs would get packed tightly to the left, leaving unused space up to the end of the tab (unless I use a monolength font, which would probably look very ugly with non-Japanese filenames).

//edit After a few more tests, I can't say definitely how the current FAR works on native DBCS locales. Without ConEmu, it mostly shows 1 char per 2 cells, but sometimes it starts clamping them together in every cell... I won't post this bug to the FAR forum, because I have no guarantee that a fixed FAR would work the way it works now on non-DBCS Windows. That's how I use it now, and I'm pretty happy with my experience. Japanese characters are shown correctly without using Total Commander and without changing the whole system locale. Upgrading to the latest ConEmu version was just an experiment to see what was fixed and improved. Unfortunately, it didn't improve anything for me, so I am staying on the old version. Probably forever.

Maximus5 commented 8 years ago

DBCS versions of Windows work completely differently from non-DBCS ones. When you run an application on DBCS Windows and use a double (four) byte codepage (like 932), each CJK character really takes two cells. Take a look at COMMON_LVB_LEADING_BYTE and COMMON_LVB_TRAILING_BYTE in the CHAR_INFO description. This looks absolutely weird and unbelievable at first sight, but it is actually the only way to print and display CJK using the [A] console functions. Moreover, even if a console application uses the [W] functions (like Far does) to write wchar_t sequences, the console doubles each CJK character (the first cell gets the COMMON_LVB_LEADING_BYTE flag and the second gets the COMMON_LVB_TRAILING_BYTE flag), so the glyph occupies TWO cells; otherwise the [A] functions would fail to read the 932 codepage!

The only exception is codepage 65001. There, one Unicode character (wchar_t) takes one real cell.

The console window (conhost.exe) and ConEmu know about that and display a sequence of cells (COMMON_LVB_LEADING_BYTE ... COMMON_LVB_TRAILING_BYTE) as one CJK glyph.

So, when you are using CJK Windows, console applications must work in a different way than on non-CJK Windows.

Maximus5 commented 8 years ago

If so, it can fit width/2 characters for any given range of cells on virtual console screen. And there will be spaces, like when I enable monowidth in ConEmu

So, you are wrong here. There would be no spaces. Only CJK glyphs which are (unfortunately) doubled in conhost internal buffer (taking two cells) but are displayed as one wide glyph.

Maximus5 commented 8 years ago

Simple test in CPP coming

  1. Obtain current cursor position
  2. Write two unicode glyphs L"世 " (CJK + 0x20) using WriteConsoleW
  3. Read three wide chars from cursor pos (from step 1) using ReadConsoleOutputW
  4. Go crazy :(
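
The steps above can be modeled without a Windows console at all. The sketch below is a simplified model, not the actual test program: `write_cells` and `render` are hypothetical helpers, the buffer is a plain Python list, and repeating the character in the trailing cell is an assumption. Only the two flag values are taken from the Windows console headers.

```python
import unicodedata

# Attribute flags from the Windows console API (CHAR_INFO attributes).
COMMON_LVB_LEADING_BYTE = 0x0100
COMMON_LVB_TRAILING_BYTE = 0x0200


def is_wide(ch):
    """True for East Asian Wide/Fullwidth characters (CJK and similar)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def write_cells(text):
    """Model of a DBCS console buffer: one (char, attributes) tuple per cell.

    A wide character is stored twice, first with the leading flag and then
    with the trailing flag (assumption of this sketch).
    """
    cells = []
    for ch in text:
        if is_wide(ch):
            cells.append((ch, COMMON_LVB_LEADING_BYTE))
            cells.append((ch, COMMON_LVB_TRAILING_BYTE))
        else:
            cells.append((ch, 0))
    return cells


def render(cells):
    """Collapse leading/trailing pairs back into the glyphs a user sees."""
    return "".join(ch for ch, attr in cells if not attr & COMMON_LVB_TRAILING_BYTE)


# Step 2: write two characters (CJK + space).
cells = write_cells("世 ")

# Step 3: read three cells back from the starting position.
for ch, attr in cells[:3]:
    print(repr(ch), hex(attr))

# What the console window would display for these three cells.
print(repr(render(cells)))
```

Two characters are written, but three cells come back, which is the surprise the test is meant to show.
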
AndyScull commented 8 years ago

I'm not into programming, much less C++. I'd like to hear specific answers - if FAR was fixed and DBCS-aware, how many characters would be shown in 60-cell wide tab in DBCS, and non-DBCS windows. And how would they look mixed with ascii characters? And how many characters would be seen in ConEmu window? And consider it all with non-monowidth font... And then compare with how many characters I see now in the old ConEmu: it could be described by the phrase 'however many glyphs fit into tab space'.

So, you are wrong here. There would be no spaces. Only CJK glyphs which are (unfortunately) doubled in conhost internal buffer (taking two cells) but are displayed as one wide glyph.

Well, I mean spaces between glyphs. Console should use monowidth font for those, so there should be more spacing between actual character lines when compared to pure graphic output

Maximus5 commented 8 years ago

I'm preparing tests and screenshots. Later today...

Console should use monowidth font for those,

What do you mean? On a DBCS system, a "monowidth" font has different widths for double-width (CJK, full-width) and single-width (plain ASCII) characters.

PS. Is it possible to discuss in Russian to avoid translation problems?

AndyScull commented 8 years ago

Yep. English would be more accessible to other users, if someone ever ends up struggling with this like I did. By monowidth in the console I mean something like the Courier fonts: each character's glyph is padded with empty space up to a fixed width that is the same for the whole font. In DBCS this is either the cell width or the cell width multiplied by two. Perhaps the font itself is not monospaced, and it is the console programs that draw the character in the middle of a 1- or 2-cell-wide space. As a result, quite a lot of empty space remains between the characters themselves, so the text comes out wider than if the same thing were typed in Word with a non-monospaced font. The text is generally readable, but fewer characters fit on the screen.

Maximus5 commented 8 years ago

Here are some tests: https://github.com/Maximus5/Write-Read-Test

how many characters would be shown in 60-cell wide tab in DBCS, and non-DBCS windows.

RealConsole (conhost.exe) physically can't show more than 30 full-width glyphs in a 60-cell console. Otherwise they are wrapped to the next line. Each CJK character takes two cells in any case on a DBCS OS.

And how would they look mixed with ascii characters?

Exactly as they must. ASCII (half-width) would take single cell.

And how many characters would be seen in ConEmu window?

Same as in RealConsole. There would be no compression at all, because all glyphs would take their desired space. On a DBCS OS, of course.
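
The "30 glyphs in 60 cells" arithmetic can be checked with a small wrapping model. This is a sketch under assumptions, not conhost's implementation: `cell_width` and `wrap_to_rows` are hypothetical names, wide characters are detected through Unicode East Asian Width, and the row is assumed to be at least two cells wide.

```python
import unicodedata


def cell_width(ch):
    """Cells one character occupies: 2 for East Asian Wide/Fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_to_rows(text, columns):
    """Split text into rows of at most `columns` cells without splitting a glyph."""
    rows, current, used = [], [], 0
    for ch in text:
        w = cell_width(ch)
        if used + w > columns:
            # The glyph does not fit on this row, so it starts the next one.
            rows.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += w
    if current:
        rows.append("".join(current))
    return rows


rows = wrap_to_rows("世" * 31, 60)
print([len(r) for r in rows])  # prints: [30, 1]
```

In the same model, a 60-cell row holds 60 ASCII characters, and a mixed name uses one cell per ASCII character plus two per CJK character.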

And consider it all with non-monowidth font...

Arial? Times New Roman? Tahoma? Awful... regardless CJK or not.

AndyScull commented 8 years ago

Arial? Times New Roman? Tahoma? Awful... regardless CJK or not.

And here's the answer. It would definitely look awful for me if this bug in FAR was fixed. That's why I won't report it and prefer you'd not do it too. I can live with older ConEmu version, but new versions of FAR are a must, they fix and add a lot of things. If at some point they'd 'fix' it, I'd either have much shorter CJK strings (compared to before), or would stick with outdated FAR version. So just forget all this issue please. Anyway, I may be forced to switch to linux in 5 years...

AndyScull commented 7 years ago

Thanks for an option! Only now I noticed it and at last updated my conemu version

There's still some minor difference in character output compared with the old version, though: for some fonts, characters are not centered in their cell, with very little space on the left and a lot on the right. Since Monowidth doesn't do anything for them, I assume it has something to do with the font itself (it happens with 'MS UI Gothic' but not 'MS Gothic'). It's not something that really needs fixing, since I'm all right with changing the font, but who knows, maybe it can be fixed with a single line of code...

This is how text looks in the old version of ConEmu (size 20 MS UI Gothic, cell 0; I have been using it ever since I found the combination): msuigothic_oldconemu_cell0

This is the new version with the same settings (I copied conemu.xml and disabled the 'compress long strings' option): msuigothic_cell0

The リ character is obviously moved a little to the left. This is clearly seen if I edit the filename and select this character to see where its glyph ends.

This is the same string with cell=12: msuigothic_cell12

Readable but not very pretty : ) It could be mistaken for a space character.

So I tried other fonts, and surprisingly it displays correctly with MS Gothic: msgothic_cell0

I noticed it seems to happen only with fonts that aren't monowidth. Here we have MS Gothic Japanese characters aligned as 2 ASCII cells: image and a selection to show the actual placeholder: image

And here's MS UI Gothic: image and selection: image

That just feels like an error somewhere in text handling code, so I am going to close this issue (since everything else works as intended) and if you want you can look into this further at your own pace

Maximus5 commented 7 years ago

If you think the output may be improved (it looks so), reopen the issue and post here the file/text where the problem occurs.

AndyScull commented 7 years ago

Well, if you have the time to fix it :) I can live with current situation though

Examples of broken text (all with font MS UI Gothic, size=20, width=0, cell=0, all checkboxes unchecked):

淫行 - complex kanji often overlap. I'll give one example, but almost all of them have the wrong width: image
It's easier to see if you select one character in the text; the selection immediately crops that character.

The space is too thin (and is not affected when I enable monospace and cell=12). With cell=0 and monospace disabled: image
With cell=12: image
The string is the same as in the first example; I just quickly typed a space between the characters to make the screenshots.

リ - image
From what I can find, the same problem (too little space on the left and too much on the right) occurs with ッ, ク, タ, イ, ド, し. There is no problem with ム, ー, い, ん, ス. Maybe they're too wide to have this problem, or it somehow depends on the Unicode number.

CJK exclamation mark ! (U+FF01): image
Though even in my current browser it isn't centered in its placeholder.

TGhoul commented 6 years ago

@AndyScull Chinese characters have the same problem. Is there any solution now?

1532072886 1

AndyScull commented 6 years ago

Sadly, no, I just use old version of conemu, from 2014. I don't use newer features, all I need is correct display of unicode characters and it does it. The exact version is 141221 [32bit]

TGhoul commented 6 years ago

@AndyScull Thank you very much, your answer is very helpful to me.

Maximus5 commented 6 years ago

I don't think the issue is still relevant in current ConEmu builds. The option "Compress long strings to fit space" has existed for a long time. There is no sense in using old builds

TGhoul commented 6 years ago

@Maximus5 Unfortunately, the option "Compress long strings to fit space" doesn't work in current ConEmu builds; this is my screenshot.

1532394967 1

AndyScull commented 6 years ago

There is no sense in using old builds

I respectfully disagree with that. This is from the old version of ConEmu, pure cmd output: image
And this is from the latest version: image
"Compress long strings to fit space" is irrelevant, since the strings in my output aren't long enough to be affected by it.

Maximus5 commented 6 years ago

Isn't it better to ping the issue?

Maximus5 commented 6 years ago

@TGhoul So, do you prefer to completely lose the tail of the string in favor of CJK characters not clamping together?