Detanup01 / gbe_fork

Fork of https://gitlab.com/Mr_Goldberg/goldberg_emulator
https://gitlab.com/Mr_Goldberg/goldberg_emulator
GNU Lesser General Public License v3.0
86 stars 36 forks source link

upgrade to c++20/23 #33

Closed otavepto closed 2 weeks ago

otavepto commented 3 weeks ago

not because it is needed, but because C++17 is getting older by now

I would suggest not merging this immediately, you have to test paths handling, I can't test every scenario unfortunately. if this was ever merged, please always encourage opening files via one of these:

common_helpers::filesystem_str() isn't related to files, but does the appropriate conversion to the useless new types std::u8string/std::u8string_view/char8_t* for later usage by std::filesystem::path when needed.

common_helpers::u8str_to_str() isn't also related to files, but changes the same useless new types back to std::string

no need to build deps again

Detanup01 commented 3 weeks ago

I actually tried with c++latest (i think it's c++23?) but sadly not exist in Linux yet

Detanup01 commented 3 weeks ago

If you could say what scenario you need to test I will try to test it

otavepto commented 3 weeks ago

that would be testing every place where the emu is opening a file/folder against some path with unicode characters, for example:

this is really tedious, or impossible.

in any case I did a global project search for all the places mentioning std::ifstream, std::ofstream, std::fstream, std::filesystem and replaced all usages with the ones from the helpers, still testing can find weird scenarios.

the problem with utf-8 paths is now split between Windows and C++20, previously with C++17, everything wasn't capable of handling utf-8 (on Windows only) except one constructor std::filesystem::u8path so C++17 introduced at least a cross platform solution.

but since this is deprecated now, the regular constructor std::filesystem::path does the same job under one condition, that is the given string uses the new char8_t type which is not supported anywhere in the standard library, not even with std::cout.

here's how it is explained in the docs for constructor (6):

(6). Any of the character types char, char8_t, (since C++20)char16_t, char32_t, wchar_t is allowed, and the method of conversion to the native character set depends on the character type used by source

so now the type of the character dictates how the conversion takes places, and here's what they have to say about the regular char type:

If the source character type is char, the encoding of the source is assumed to be the native narrow encoding (so no conversion takes place on POSIX systems).

that means, on Windows, if char is used then no conversion takes places and the chars are assumed to be narrow (ascii for Windows), which won't work with non-ascii paths. indeed when I tried it in a separate project with this path 命定奇谭ğğğğğÜÜÜÜ it failed to find the folder which I already created in the current active dir.

auto u8s = std::string("命定奇谭ğğğğğÜÜÜÜ");
std::filesystem::path u8sp = std::filesystem::path(u8s);

std::cout << std::filesystem::current_path() << std::endl;
std::cout << std::filesystem::is_directory(u8sp) << std::endl; // prints 0
std::cout << std::filesystem::exists(u8sp) << std::endl; // prints 0

and here's what they have to say about the new char8_t type:

If the source character type is char8_t, conversion from UTF-8 to native filesystem encoding is used.

equivalent to calling MultiByteToWideChar() just changing std::string to std::u8string indeed fixed it

auto u8s = std::u8string(u8"命定奇谭ğğğğğÜÜÜÜ"); // changed
std::filesystem::path u8sp = std::filesystem::path(u8s);

std::cout << std::filesystem::current_path() << std::endl;
std::cout << std::filesystem::is_directory(u8sp) << std::endl; // now prints 1
std::cout << std::filesystem::exists(u8sp) << std::endl; // now prints 1

for the other 2 types char16_t and char32_t

If the source character type is char16_t, conversion from UTF-16 to native filesystem encoding is used. If the source character type is char32_t, conversion from UTF-32 to native filesystem encoding is used.

on Windows char16_t is similar to wchar_t so no conversion takes place, and char32_t has to be converted to std::wstring on Linux (again except good old char) I assume all are converted to utf-8.

There is a write-up on github explaining how to change some setting in the registry to fix everything 4.How to get the main arguments UTF-8 encoded but it requires a manifest file and Windows 10 >= Version 1903

after reading it I realized that generate_interfaces.exe was broken all this time! (even the one in the original project on gitlab), if the path contained for example 命定奇谭ğğğğğÜÜÜÜ then argv (even before we get a chance to open the path) is simply corrupted, and the chars are replaced with question marks ?. either the main has to be changed to use wmain (Microsoft specific) or the manifest mentioned above has to be added :/ so nothing cross platform is available, at least without some hacks I assume the entry main vs wmain could be done, separating the entry point from the entire application but honestly not worth it for a simple console app.

otavepto commented 3 weeks ago

I'll add a premake override to force the builtin C++ dialect "C++latest" to use whatever you want in the future, I'll set it to C++23 as a test, only for GCC/Clang since VS already supports that out of the box

otavepto commented 3 weeks ago

done, output from this x64 job:

g++   -MD -MP -DCURL_STATICLIB ... -m64 -O0 -g -std=c++2b ...

and this x32 job

g++   -MD -MP -DCURL_STATICLIB ... -m32 -O0 -g -std=c++2b ...

in the future you can change the lang version flag https://github.com/Detanup01/gbe_fork/blob/046da8d929082221c3941dee2faf879b538863db/premake5.lua#L66-L68

otavepto commented 2 weeks ago

I sent you the diff/patch if you want to add it later, will close this for now