GPUOpen-Archive / Anvil

Anvil is a cross-platform framework for Vulkan
MIT License
594 stars 62 forks source link

Concerning wstring-to-string conversion in WIN32 implementation #151

Open whatevermarch opened 5 years ago

whatevermarch commented 5 years ago

Hi, I have tried compiling on Windows 10 with VS 2019. There is a warning exposed during the compilation in Anvil project ( in VS solution ), and this is the details

8>$(VS_PATH)\include\xstring(2308,23): warning C4244: 'argument': conversion from 'wchar_t' to 'const _Elem', possible loss of data 8>$(VS_PATH)\include\xstring(2308,23): warning C4244: with 8>$(VS_PATH)\include\xstring(2308,23): warning C4244: [ 8>$(VS_PATH)\include\xstring(2308,23): warning C4244: _Elem=char 8>$(VS_PATH)\include\xstring(2308,23): warning C4244: ] (compiling source file $(ANVIL_ROOT)\src\misc\io.cpp)

I have investigated it and found that the problem line is 168 ( in master branch at this time I post ), which is

std::string file_name (file_name_with_path_wide.begin(), file_name_with_path_wide.end() );

This produces an error since the compiler is set to threat warning as an error, but it is fine because I can disable it in the project settings. However, this becomes my concern whether the file name data will be interpreted wrong or not according to this weird conversion. wchat_t stores 4 bytes character, while normal char only stores one byte. Using this constructor is like copying all the data to the new one and reinterpreting one wchat_t element as 4 char elements.

Maybe I misunderstood the conversion, or it might be the safe conversion as expected, but at least please fix the warning message if you plan to still enable -D_CRT_SECURE_NO_WARNINGS to MSVC compiler.

Thank you in advance.

quetzalcoatl commented 4 years ago

Paths certainly CAN contain non-ASCII characters.

On Windows that's pretty common. Even copying a directory via Explorer often adds a " - copy" suffix and the actual ' - ' is a non-ascii long-dash.

On unix/linux, wchar_t is usually a 4 byte UTF32 character, and char is UTF8 character. Casting it like this has a huge chance of destroying the string contents for any non-ASCII characters in the path and producing some trash in the std::string.

On windows, wchar_t is a 2 byte UTF16LE character, and char is your-windows-version-specific-codepage ASCII-like non-UTF8 character see here and dig further. Casting it like this has a huge chance of destroying the string contents and producing some trash in the std::string.

IMHO, that needs fixing.

Of course, you may decide to not support Unicode in paths and limit paths to the always-safe lower 0..127 part of ASCII, but that should be very explicitely stated in docs and tutorials. It may be so. I didn't check. If it is, then fine.