Open whatevermarch opened 5 years ago
Paths certainly CAN contain non-ASCII characters.
On Windows that's pretty common. Even copying a directory via Explorer often adds a " - copy" suffix and the actual ' - ' is a non-ascii long-dash.
On unix/linux, wchar_t is usually a 4 byte UTF32 character, and char is UTF8 character. Casting it like this has a huge chance of destroying the string contents for any non-ASCII characters in the path and producing some trash in the std::string.
On windows, wchar_t is a 2 byte UTF16LE character, and char is your-windows-version-specific-codepage ASCII-like non-UTF8 character see here and dig further. Casting it like this has a huge chance of destroying the string contents and producing some trash in the std::string.
IMHO, that needs fixing.
Of course, you may decide to not support Unicode in paths and limit paths to the always-safe lower 0..127 part of ASCII, but that should be very explicitely stated in docs and tutorials. It may be so. I didn't check. If it is, then fine.
Hi, I have tried compiling on Windows 10 with VS 2019. There is a warning exposed during the compilation in Anvil project ( in VS solution ), and this is the details
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: 'argument': conversion from 'wchar_t' to 'const _Elem', possible loss of data
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: with
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: [
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: _Elem=char
8>$(VS_PATH)\include\xstring(2308,23): warning C4244: ] (compiling source file $(ANVIL_ROOT)\src\misc\io.cpp)
I have investigated it and found that the problem line is 168 ( in master branch at this time I post ), which is
std::string file_name (file_name_with_path_wide.begin(), file_name_with_path_wide.end() );
This produces an error since the compiler is set to threat warning as an error, but it is fine because I can disable it in the project settings. However, this becomes my concern whether the file name data will be interpreted wrong or not according to this weird conversion.
wchat_t
stores 4 bytes character, while normalchar
only stores one byte. Using this constructor is like copying all the data to the new one and reinterpreting onewchat_t
element as 4char
elements.Maybe I misunderstood the conversion, or it might be the safe conversion as expected, but at least please fix the warning message if you plan to still enable -D_CRT_SECURE_NO_WARNINGS to MSVC compiler.
Thank you in advance.