Closed: myd7349 closed this issue 1 year ago.
Another problem that CJK locales may have is the placement of format specifiers. For example, "Clipboard couldn't %s %s" requires the action to come before the item that Clipboard acted on. If a CJK translation needs a different word order, that cannot be expressed with these printf format strings. However, I can switch the message format over to fmt, because it lets a translation move the arguments around.
Setting /utf-8 is not enough. To make Win32 APIs (FormatMessageA, for example) and MSVC CRT functions (std::exception::what(), for example) handle UTF-8 correctly, we should also attach a manifest to clipboard.exe.
As the screenshot above shows, error messages returned by FormatMessageA and local file paths that contain CJK characters still cannot be displayed correctly.
To fix this, one more step is necessary, as described in Microsoft's "Use UTF-8 code pages in Windows apps".
First, save the following XML snippet as UTF-8.manifest in the same directory as clipboard.exe:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
Then open the Developer Command Prompt for VS2022 and run:
mt.exe -manifest UTF-8.manifest -outputresource:clipboard.exe;#1
Now the newly generated clipboard.exe handles UTF-8 correctly:
Describe the bug
Hi! I am trying to build Clipboard from source on Windows 11 with Visual Studio 2022 Community Preview. Since my system language is Simplified Chinese, the default code page on my machine is CP936 (GBK).
Attempt 1:
Output:
In clipboard.cpp and messages.cpp, many message string literals (std::string_view) contain non-ASCII characters. Under CJK locales, MSVC decodes these literals using the system's current code page instead of UTF-8. It does this for any source file without a byte-order mark unless /source-charset or /utf-8 says otherwise, and the result is these compilation errors.
At first, I wanted to change the type of these strings to std::u8string_view, but I gave up because of the number of lines of code that would have to change.
Attempt 2:
Patch src/clipboard/CMakeLists.txt:
With this patch, I wanted the compiler to treat the two source files containing the Unicode message strings as UTF-8 encoded.
This worked, although it produced a bunch of warnings.
Build log:
However, when the program runs, paths containing CJK characters and the localized error messages returned by FormatMessageA are not displayed correctly.
Attempt 3:
Patch CMakeLists.txt:
This time, in addition to /source-charset:utf-8, I also specified /execution-charset:utf-8 (see https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170).
Build log:
No errors, no warnings.
This time, characters like ╳ and ▏ are displayed correctly.