Open deadalnix opened 4 years ago
Copying and pasting from an HTML page is not reliable. Have you tried the same with PowerShell, running standalone, without ConEmu?
The first thing I noticed is that in your third example (☠) there is 00a0 (line feed).
The first and second have no line feed character.
So I'm not at all sure where the problem is.
I need some more precise and reproducible tests.
The native console doesn't display the chars properly, but it does copy/paste properly.
Considering we are communicating over web - and yes, that is unreliable - what do you suggest we use for me to be able to provide you reproducible steps?
PS: thanks for looking into this.
I did some investigation, and to me it looks like a bug in the Windows console (conhost).
A simple test is attached; run it from the console via pwsh.exe -command print-unicode.ps1.
And the output looks like
If I try to paste the glyph "🏃" into the native console prompt, there is even more mess.
Windows 10 1909 (10.0.18363.836)
I consider this a Windows bug that ConEmu can't mitigate by itself.
Sample C++ test; only the WriteConsoleOutput function works properly:
#include <windows.h>
#include <cwchar>

int main()
{
    const auto hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    // U+1F3C3 (🏃) written as the UTF-16 surrogate pair D83C DFC3
    const wchar_t writeConsole[] = L"WriteConsole: --\xD83C\xDFC3--\n";
    const wchar_t writeConsoleChars[] = L"WriteConsoleCharacters: --\xD83C\xDFC3--";
    const wchar_t writeConsoleBuffer[] = L"WriteConsoleBuffer: --\xD83C\xDFC3--";
    CONSOLE_SCREEN_BUFFER_INFO si = {};
    DWORD written = 0;

    // 1. Stream write at the current cursor position
    WriteConsoleW(hOut, writeConsole, static_cast<DWORD>(wcslen(writeConsole)), &written, nullptr);

    // 2. Write characters directly into the screen buffer at the cursor position
    GetConsoleScreenBufferInfo(hOut, &si);
    WriteConsoleOutputCharacterW(hOut, writeConsoleChars, static_cast<DWORD>(wcslen(writeConsoleChars)), si.dwCursorPosition, &written);
    ++si.dwCursorPosition.Y;
    SetConsoleCursorPosition(hOut, si.dwCursorPosition);

    // 3. Write a row of CHAR_INFO cells (character + attributes)
    CHAR_INFO bufferData[80] = {};
    for (size_t i = 0; writeConsoleBuffer[i]; ++i)
    {
        bufferData[i].Char.UnicodeChar = writeConsoleBuffer[i];
        bufferData[i].Attributes = 7; // light gray on black
    }
    // Explicit casts: size_t/int to SHORT is a narrowing conversion in brace initialization
    const COORD bufSize = {static_cast<SHORT>(wcslen(writeConsoleBuffer)), 1};
    const COORD bufCoord = {};
    SMALL_RECT writeCoords = {si.dwCursorPosition.X, si.dwCursorPosition.Y,
                              static_cast<SHORT>(si.dwCursorPosition.X + bufSize.X - 1), si.dwCursorPosition.Y};
    WriteConsoleOutputW(hOut, bufferData, bufSize, bufCoord, &writeCoords);
    ++si.dwCursorPosition.Y;
    SetConsoleCursorPosition(hOut, si.dwCursorPosition);
    return 0;
}
Output:
WriteConsole: --�--
WriteConsoleCharacters: --�--
WriteConsoleBuffer: --🏃--
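One plausible reading of the output above (an assumption, not something confirmed in this thread) is that the two UTF-16 code units of 🏃 get handled as separate characters somewhere along the way, so each half becomes an invalid lone surrogate. A minimal portable C++ sketch for checking captured text, for example clipboard contents, for that kind of damage; the names `is_high_surrogate`, `is_low_surrogate` and `count_unpaired_surrogates` are hypothetical helpers, not ConEmu or Windows APIs:

```cpp
#include <cstddef>
#include <string>

// UTF-16 high (leading) surrogates occupy D800..DBFF.
bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// UTF-16 low (trailing) surrogates occupy DC00..DFFF.
bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Count surrogate code units that are not part of a valid high+low pair.
// A result of 0 means every character outside the BMP is still intact.
std::size_t count_unpaired_surrogates(const std::u16string& s)
{
    std::size_t unpaired = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (is_high_surrogate(s[i]))
        {
            if (i + 1 < s.size() && is_low_surrogate(s[i + 1]))
                ++i;        // valid pair, skip the low half
            else
                ++unpaired; // high surrogate without its low half
        }
        else if (is_low_surrogate(s[i]))
        {
            ++unpaired;     // low surrogate without a preceding high half
        }
    }
    return unpaired;
}
```

On Windows the same logic applies to wchar_t buffers, since wchar_t holds UTF-16 code units there.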
PS. In theory, the problem could be mitigated by switching to the PTY API.
@miniksa, @zadjii-msft could you please check the problem from your side?
Thanks for the investigation!
I had to modify my workflow on my end to work around that problem. It's not ideal, but it's livable. Is there a way to switch to the PTY API on my end? Or is it something that would require significant refactoring on ConEmu's end?
PS: while the native console displays garbage too on my end, I can copy from the console and paste somewhere else, and it pastes the right content. I'm not sure by what magic this happens, but it would be a great usability plus for me if that worked in ConEmu.
Versions
ConEmu build: 200604 x64
OS version: Windows 10 19041 x64
Used shell version (Far Manager, git-bash, cmd, powershell, cygwin, whatever): cmd
Problem description
When using Unicode characters that are 4 bytes in size, ConEmu seems to corrupt the output in some way. Not only is the character not displayed properly (which I don't really care about, tbh), but copying anything that contains such a character results in corrupted data in the clipboard.
Interestingly, opening the "real" terminal via Ctrl+Win+Alt+Space shows that it is also unable to display the character, but copying from it puts the right data in the clipboard.
This seems to indicate that, in the case of ConEmu, the problem isn't actually in displaying the character but something deeper.
Steps to reproduce
Some bash utilities are handy to demonstrate the problem, so I will use WSL's bash, but the problem exists for the regular command line, PowerShell, or anything else.
Now, if we copy the command to the clipboard and paste it anywhere, we can see that the Unicode character was corrupted like this:
printf � | hexdump
While the characters are not displayed properly, this works as expected on the "real" console.
Pasting back into the shell, we get the following output:
This happens with any 4-byte character; shorter characters seem to work just fine.
For instance:
Copying and pasting works properly, and the character is also displayed properly.
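For reference, the 4-byte boundary in UTF-8 coincides with the point where UTF-16, which the Windows console APIs use, needs a surrogate pair instead of a single code unit. A minimal sketch in portable C++ that shows both encodings side by side; the helper names `encode_utf8` and `encode_utf16` are made up for illustration and are not part of ConEmu or the Windows API:

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string encode_utf8(char32_t cp)
{
    // Only valid scalar values: up to U+10FFFF, excluding the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Code points above U+FFFF: the 4-byte case from this report.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one Unicode code point as UTF-16 (1 or 2 code units).
std::u16string encode_utf16(char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        // Split the 20 bits above U+10000 into a high and a low surrogate.
        const char32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    }
    return out;
}
```

With these, `encode_utf8(0x2620)` (☠) gives the 3 bytes e2 98 a0 and a single UTF-16 unit, while `encode_utf8(0x1F3C3)` (🏃) gives the 4 bytes f0 9f 8f 83 and `encode_utf16(0x1F3C3)` gives the pair D83C DFC3, the same pair used in the C++ test in this thread.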
Actual results
4-byte Unicode characters are corrupted before being displayed, which prevents proper display as well as copying and pasting.
Expected results
4-byte Unicode chars should not be corrupted, and copy/paste should just work(tm).
Bonus points if they can be displayed properly, but this is an entirely different problem and may very well just work as soon as the right bytes are pushed into the on-screen buffer.
Additional files
To make sure nothing interferes, I made a fresh ConEmu install and left the config at its defaults.