Closed otavepto closed 2 weeks ago
I actually tried with c++latest (i think it's c++23?) but sadly not exist in Linux yet
If you could say what scenario you need to test I will try to test it
that would be testing every place where the emu is opening a file/folder against some path with unicode characters, for example:
this is really tedious, or impossible.
in any case I did a global project search for all the places mentioning std::ifstream
, std::ofstream
, std::fstream
, std::filesystem
and replaced all usages with the ones from the helpers, still testing can find weird scenarios.
the problem with utf-8 paths is now split between Windows and C++20, previously with C++17, everything wasn't capable of handling utf-8 (on Windows only) except one constructor std::filesystem::u8path
so C++17 introduced at least a cross platform solution.
but since this is deprecated now, the regular constructor std::filesystem::path
does the same job under one condition, that is the given string uses the new char8_t
type which is not supported anywhere in the standard library, not even with std::cout
.
here's how it is explained in the docs for constructor (6):
(6). Any of the character types char, char8_t, (since C++20)char16_t, char32_t, wchar_t is allowed, and the method of conversion to the native character set depends on the character type used by source
so now the type of the character dictates how the conversion takes places, and here's what they have to say about the regular char
type:
If the source character type is char, the encoding of the source is assumed to be the native narrow encoding (so no conversion takes place on POSIX systems).
that means, on Windows, if char
is used then no conversion takes places and the chars are assumed to be narrow (ascii for Windows), which won't work with non-ascii paths.
indeed when I tried it in a separate project with this path 命定奇谭ğğğğğÜÜÜÜ
it failed to find the folder which I already created in the current active dir.
auto u8s = std::string("命定奇谭ğğğğğÜÜÜÜ");
std::filesystem::path u8sp = std::filesystem::path(u8s);
std::cout << std::filesystem::current_path() << std::endl;
std::cout << std::filesystem::is_directory(u8sp) << std::endl; // prints 0
std::cout << std::filesystem::exists(u8sp) << std::endl; // prints 0
and here's what they have to say about the new char8_t
type:
If the source character type is char8_t, conversion from UTF-8 to native filesystem encoding is used.
equivalent to calling MultiByteToWideChar()
just changing std::string
to std::u8string
indeed fixed it
auto u8s = std::u8string(u8"命定奇谭ğğğğğÜÜÜÜ"); // changed
std::filesystem::path u8sp = std::filesystem::path(u8s);
std::cout << std::filesystem::current_path() << std::endl;
std::cout << std::filesystem::is_directory(u8sp) << std::endl; // now prints 1
std::cout << std::filesystem::exists(u8sp) << std::endl; // now prints 1
for the other 2 types char16_t
and char32_t
If the source character type is char16_t, conversion from UTF-16 to native filesystem encoding is used. If the source character type is char32_t, conversion from UTF-32 to native filesystem encoding is used.
on Windows char16_t
is similar to wchar_t
so no conversion takes place, and char32_t
has to be converted to std::wstring
on Linux (again except good old char
) I assume all are converted to utf-8.
There is a write-up on github explaining how to change some setting in the registry to fix everything 4.How to get the main arguments UTF-8 encoded but it requires a manifest file and Windows 10 >= Version 1903
after reading it I realized that generate_interfaces.exe
was broken all this time! (even the one in the original project on gitlab), if the path contained for example 命定奇谭ğğğğğÜÜÜÜ
then argv
(even before we get a chance to open the path) is simply corrupted, and the chars are replaced with question marks ?
.
either the main has to be changed to use wmain (Microsoft specific) or the manifest mentioned above has to be added :/ so nothing cross platform is available, at least without some hacks
I assume the entry main
vs wmain
could be done, separating the entry point from the entire application but honestly not worth it for a simple console app.
I'll add a premake override to force the builtin C++ dialect "C++latest" to use whatever you want in the future, I'll set it to C++23 as a test, only for GCC/Clang since VS already supports that out of the box
done, output from this x64 job:
g++ -MD -MP -DCURL_STATICLIB ... -m64 -O0 -g -std=c++2b ...
and this x32 job
g++ -MD -MP -DCURL_STATICLIB ... -m32 -O0 -g -std=c++2b ...
in the future you can change the lang version flag https://github.com/Detanup01/gbe_fork/blob/046da8d929082221c3941dee2faf879b538863db/premake5.lua#L66-L68
I sent you the diff/patch if you want to add it later, will close this for now
not because it is needed, but because
C++17
is getting older by nowC++23
char8_t
type__cplusplus
version setting on Visual Studioutfcpp
libutfcpp
lib in many projectsC++20
templated conceptsI would suggest not merging this immediately, you have to test paths handling, I can't test every scenario unfortunately. if this was ever merged, please always encourage opening files via one of these:
common_helpers::open_fread()
->std::ifstream
common_helpers::open_fwrite()
->std::ofstream
common_helpers::open_fstream()
->std::fstream
common_helpers::std_fs_path()
->std::filesystem::path
common_helpers::filesystem_str()
isn't related to files, but does the appropriate conversion to the useless new typesstd::u8string
/std::u8string_view
/char8_t*
for later usage bystd::filesystem::path
when needed.common_helpers::u8str_to_str()
isn't also related to files, but changes the same useless new types back tostd::string
no need to build deps again