Open Nelson-numerical-software opened 7 years ago
1) Why do you talk about WriteConsoleW? Have you checked the result in the RealConsole by pressing Ctrl-Win-Alt-Space?
2) Please run ConEmuC -checkunicode from ConEmu's prompt and show the result here.
1] It seems that this is also a bug in Windows 10 Insider builds 14959 and 14965. With Windows 10 stable version 1607 and the same version of ConEmu (161023), it works.
2] Please notice duplicated characters 中中文文
ConEmuC -checkunicode
ConEmu 161022 x86
OS Version: 10.0.14965 (2:)
SM_IMMENABLED=1, SM_DBCSENABLED=0, ACP=1252, OEMCP=850
ConHWND=0x00090634, Class="ConsoleWindowClass"
Console font info: 0, {3x5}, 54, 400, "Lucida Console"
Handles: In=x8 (Mode=x1F7) Out=xC (x3) Err=x10 (x3)
Buffer={131,1000} Window={0,0}-{130,35} MaxSize={131,166}
Cursor: Pos={0,9} Size=25% Visible
ConsoleCP=850, ConsoleOutputCP=850
CP850: Max=1 Def=x3F,x00 UDef=x3F
Lead=x00,x00,x00,x00,x00,x00,x00,x00,x00,x00,x00,x00
Name="850 (OEM - latin multilingue I)"
123456789也也不不是是可可运运行行的的程程序序112233445566778899 Normal Reverse x7 x4007 Normal:x7 Reverse:x4007
Check AÀÀΑΑ╬╬豈豈AAꊠꊠ黠黠だだ➀ጀะڰЯ09 Text: AÀÀΑΑ╬╬豈豈AAꊠꊠ黠黠だだ➀ጀะڰЯ09 Read: A:x7 ÀÀ:x107 ΑΑ:x207 ╬╬:x107 豈豈:x207 AA:x107 ꊠꊠ:x207 黠黠:x107 だだ:x207 ➀:x107 ጀ:x207 ะ:x107 ڰ:x207 Я:x107 0:x207 9:x107 Blck: A:x7 ÀÀ:x107 ÀÀ:x207 ΑΑ:x107 ΑΑ:x207 ╬╬:x107 ╬╬:x207 豈豈:x107 豈豈:x207 AA:x107 AA:x207 ꊠꊠ:x107 ꊠꊠ:x207 黠黠:x107 黠黠:x207 だだ:x107 Info: 0,1,1,16,1,1,24,1
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╦╦══ ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗║ 中中文文 ║中中 文文║╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ ╩╩════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ ══╝ Unicode check succeeded
@miniksa Can you take a look at this? Reported already several times here.
@Maximus5 I've filed it as MSFT:9751066 internally and assigned to myself. I'm currently in a deep thought on something else, so I'll probably get to it early next week. Thanks for the report.
I see the issue. There appear to be duplicates coming out of ReadConsoleOutputW/A. I'm not sure what happened there. I'll have to keep investigating, but it looks like it will need a fix on our side once I figure it out.
Perhaps this comes from changes in attribute processing. I noted some time ago (not sure where exactly) that new Windows builds process the high byte of console attributes "in a proper and better way"... One of the weirdest things in conhost is the COMMON_LVB_LEADING_BYTE/COMMON_LVB_TRAILING_BYTE processing. It works differently on DBCS (Chinese/Japanese/...) Windows distros than on "European" distros. On DBCS systems, when certain CJK codepages are selected, each double-width glyph takes two (or more?) CHAR_INFOs (cells). That never happened on European distros, even if CJK support was installed and these codepages were selected in the console. I can't reproduce this issue on my test Win 10 boxes yet.
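To make the two-cells-per-glyph layout concrete, here is a minimal portable sketch of collapsing a row of cells back into text. It is not conhost's or ConEmu's code. `Cell` is a hypothetical stand-in for CHAR_INFO, and `kLeadingByte`/`kTrailingByte` are hypothetical names carrying the values of COMMON_LVB_LEADING_BYTE (0x0100) and COMMON_LVB_TRAILING_BYTE (0x0200) from the Windows SDK. The sketch assumes the trailing cell repeats the glyph stored in the leading cell, which is what the x107/x207 pairs in the dump above suggest.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Same values as the COMMON_LVB_* flags in the Windows SDK (wincon.h),
// redefined under hypothetical names so the sketch compiles on any platform.
constexpr std::uint16_t kLeadingByte  = 0x0100; // COMMON_LVB_LEADING_BYTE
constexpr std::uint16_t kTrailingByte = 0x0200; // COMMON_LVB_TRAILING_BYTE

// Hypothetical stand-in for CHAR_INFO: one console cell.
struct Cell {
    char16_t ch;        // glyph stored in the cell
    std::uint16_t attr; // colour bits plus the leading/trailing flags
};

// Collapse a row of cells into text so that a double-width glyph appears once.
// Assumption: the trailing cell holds a copy of the leading cell's glyph,
// so cells flagged as trailing are skipped.
std::u16string collapseRow(const std::vector<Cell>& row) {
    std::u16string text;
    for (const Cell& c : row) {
        if (c.attr & kTrailingByte) {
            continue; // second half of a double-width glyph
        }
        text.push_back(c.ch);
    }
    return text;
}
```

A caller that reads cells with ReadConsoleOutputW could copy each CHAR_INFO into a `Cell` and pass the row through `collapseRow`; on a system that never sets the flags, the row passes through unchanged.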
FYI, I haven't forgotten about this investigation. We've just suddenly got slammed with e-mails and bugs from all sources and so getting to investigating this may take me significantly longer than I originally predicted. I will be back when I get a chance.
FYI, the fix for this should have just landed with Insider Build 15014 today.
Just tested Build 15014. Not fixed yet.
Hmmm. Not sure what's up. I'll dig into character handling stuff today.
@miniksa Finally I managed to install insider build.
First, the expected behavior from the "stable" Win10 build. All glyphs are written and displayed properly: no doubled CJK characters, and the data fits on screen.
I'm still checking the results; here are my first notes.
SM_DBCSENABLED is 0, but COMMON_LVB_LEADING_BYTE and COMMON_LVB_TRAILING_BYTE are set. Is that intended on a non-DBCS-enabled OS? They were not used previously; only CJK versions of Windows (up to Win 10 14393) used them.
Finally, here are drawing bugs during selection in conhost's window. I selected cells one by one with the mouse. Cells have unexpected widths during selection, and strangely the line below the selection is broken during selection.
@miniksa Inconsistency of API... WriteConsoleOutputAttribute, WriteConsoleOutputCharacter, ReadConsoleOutputCharacter, ReadConsoleOutputAttribute, ReadConsoleOutput...
Some functions treat CJK glyphs as normal single-cell glyphs (WriteConsoleOutputCharacter, ReadConsoleOutputCharacter).
Some functions return COMMON_LVB_LEADING_BYTE/COMMON_LVB_TRAILING_BYTE and therefore doubled cells (ReadConsoleOutputAttribute, ReadConsoleOutput).
Some functions have undefined behavior: after WriteConsoleOutputAttribute followed by WriteConsoleOutputCharacter, glyphs are "written" after the cells that were filled with attributes.
This is all on a non-CJK Insider build of Win 10.
Yeah, I was finding bad behavior like this yesterday as well. Part of the deal is that it behaves differently with Raster Fonts vs. TrueType fonts as well. I'll probably be spending the rest of the week on trying to fix this up and make it consistent. I don't know what SM_DBCSENABLED is/does. Console's DBCS check has always been based on the active code page (is equal to 932, 949, 950, 936) not that system metric.
I'll try to keep you posted as I figure this out. Sorry about that. A few of us have been working on trying to fit UTF-8 support into the console (not done yet) and it appears to have messed up quite a few DBCS routes.
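The code-page-based DBCS check mentioned above can be sketched as a small helper. This is only an illustration of the rule as stated in the comment (active code page equal to 932, 949, 950, or 936), not the console's actual implementation. The function name `isCjkCodePage` is hypothetical.

```cpp
// Returns true for the four code pages named in the comment above.
// Illustrative only; the real check inside conhost may differ.
constexpr bool isCjkCodePage(unsigned codePage) {
    return codePage == 932   // Japanese (Shift-JIS)
        || codePage == 936   // Simplified Chinese (GBK)
        || codePage == 949   // Korean
        || codePage == 950;  // Traditional Chinese (Big5)
}
```

On Windows, the argument would typically come from GetConsoleOutputCP(). With the ConsoleOutputCP=850 reported in the dump earlier in this thread, the helper returns false, which is why doubled cells were unexpected there.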
I used to check GetSystemMetrics(SM_DBCSENABLED), which actually was 1 only for Windows installations developed for China, Japan, and Korea (CJK). If SM_DBCSENABLED was 0, that meant CJK glyphs used only one cell in conhost, regardless of the codepage. That was true before. Now it is either broken or changed. What is the correct behavior?
I'll have to get back to you on that. Everything you are telling me about SM_DBCSENABLED is 100% new information to me. I don't really know if that particular metric used to be a part of the console code in XP/Vista/7/8. I can look. I also don't know what in the system turns that metric on or off.
From what I know about the console from Win 8.1 to today, the console always did its conversions and width calculations based on code page. It's just that prior to recently, it used to prohibit changing into a CJK codepage unless your system's non-Unicode region was set to a CJK language (Control Panel-->Region-->Administrative-->Language for non-Unicode programs). I've been trying to remove that restriction to allow anyone to swap into any codepage no matter their "non-Unicode region" because in today's editions of Windows (as opposed to the CJK-specific ones of the 1990s), you can add just about any language pack and IME and font to any language edition of Windows, so the "non-Unicode" region doesn't really matter like it used to several decades ago.
My plan is:
So I've got through 1, 2, and 3 in MSFT: 10187355, which is checked in and will start shipping to Insider builds. It will probably be there in a few weeks. I've basically restored the console's behavior to what it was for the legacy console. If it works against the console with the legacy box checked, it will work again against the updated one once the Insider build updates.
For part 4, I'm still working on it. I basically need to write up the way that the v1/legacy console did it and publish that.
@miniksa @Maximus5 FWIW, this VSCode/winpty issue seems related: https://github.com/Microsoft/vscode/issues/19665. ConEmu is broken in exactly the same way (screenshot in this comment, https://github.com/Microsoft/vscode/issues/19665#issuecomment-287248500). I wrote a test case demonstrating the new (broken?) behavior as of Win10 v15048.
Hi, I had no such problem in the previous Windows build (15063.413) for Simplified Chinese. I only noticed this issue after the latest stable Windows build, 15063.447, rolled out. The alpha build works almost fine with the new console.
The stable and preview builds work fine with the legacy console.
I tried Chinese with the UTF8 version of Newlisp: https://github.com/kosh04/newlisp/blob/develop/nl-utf8.c It works well.
https://stackoverflow.com/questions/3911536/utf-8-unicode-whats-with-0xc0-and-0x80 (I hope it could help.)
Versions
ConEmu build: 161023 x64 stable
OS version: Windows 10 x64 (1607) Microsoft Windows [version 10.0.14959]
cmd
Problem description
WriteConsoleW duplicates Chinese characters
Steps to reproduce
Actual results
Output: Traditional Chinese 漢漢字字
Expected results
Original string: Traditional Chinese 漢字
Additional files
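Build this code with VS 2015 C++:

#include <windows.h>
#include <string>

int main()
{
    std::wstring msg = L"Traditional Chinese 漢字";
    HANDLE consoleHandle = GetStdHandle(STD_OUTPUT_HANDLE);
    // WriteConsoleW expects a character count (UTF-16 code units), not a byte count.
    WriteConsoleW(consoleHandle, msg.c_str(), static_cast<DWORD>(msg.size()), NULL, NULL);
    return 0;
}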