HowardHinnant / date

A date and time library based on the C++11/14/17 <chrono> header

Download paths with diacritical marks on Windows #783

Open Alberto706 opened 1 year ago

Alberto706 commented 1 year ago

While working with this library's tz.cpp on Windows, I noticed that downloading the time zone database failed. The cause was an incorrect conversion to std::wstring of the std::string holding the download path, which contains the character í.

This conversion is performed in the function convert_utf8_to_utf16, which calls MultiByteToWideChar with CP_UTF8 as the CodePage parameter. After some testing, changing the CodePage to CP_ACP solved the problem.

I am not very familiar with this system function, so I am not sure if this solution breaks other character conversions.

HowardHinnant commented 1 year ago

Thank you for this report.

I'm not familiar with Windows, nor do I have a Windows machine to test on. But some care has been taken to never send a / path delimiter to the Windows OS, for example: https://github.com/HowardHinnant/date/blob/master/src/tz.cpp#L171-L175

Apparently somewhere the code must hardwire / instead of using folder_delimiter. After a brief search I've been unable to locate where. I see a few hardwired '/' but they are #ifdef'd out on Windows. If you have the inclination, if you discover where the / is coming from, I'd love a report on that. Or if you can send me a stack trace leading down to the failing convert_utf8_to_utf16 that would help too.

Thanks.

Alberto706 commented 1 year ago

The character I was mentioning is an i with an acute accent: i + ´ = í (the italics made it look like a /, sorry). I guess any other character with diacritical marks would cause the same problem; this is just the one that I ran into.
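To make the byte-level difference concrete: í is U+00ED, which UTF-8 encodes as the two bytes 0xC3 0xAD, while Windows-1252 (a common ANSI code page) stores it as the single byte 0xED. The sketch below assumes the path reached the library in such an ANSI encoding, which has not been confirmed here. looks_like_utf8 is a hypothetical helper, not part of the date library, and the paths are made up. It is a simplified check: it does not reject every ill-formed sequence, such as encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the date library): returns true when every
// lead byte in `s` is followed by the expected number of UTF-8 continuation
// bytes. Simplified: surrogates and some overlong forms are not rejected.
bool looks_like_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                             // continuation bytes expected
        if (c < 0x80)                   extra = 0;         // ASCII
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;        // 2-byte sequence
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;        // 3-byte sequence
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;        // 4-byte sequence
        else return false;                                 // stray or invalid lead byte
        if (extra > s.size() - i - 1) return false;        // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80) return false;          // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}

// Made-up example paths containing "María" in two encodings.
void check_examples()
{
    // UTF-8: í is 0xC3 0xAD.
    assert(looks_like_utf8("C:\\Mar\xC3\xAD" "a\\tzdata"));
    // Windows-1252: í is the single byte 0xED, which UTF-8 reads as the start
    // of a 3-byte sequence; the following 'a' is not a continuation byte.
    assert(!looks_like_utf8("C:\\Mar\xED" "a\\tzdata"));
}
```

If the second form is what ends up in the std::string, a conversion that expects UTF-8 cannot decode the í, while one that uses the ANSI code page can, which would match what I observed.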

Erroneous1 commented 1 year ago

According to the MultiByteToWideChar documentation, CP_ACP is the system default Windows ANSI code page, so the result depends on your machine's current settings. Is it possible that the download path being set is not encoded as UTF-8 in the first place? I'd recommend using wide strings with the various Windows functions and converting to a UTF-8 string, using WideCharToMultiByte with a code page of CP_UTF8, before interacting with the library.
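A minimal sketch of that suggestion, in case it helps. to_utf8 is a hypothetical helper name, not something the library provides. On Windows it uses the usual two-call WideCharToMultiByte pattern (first call to get the size, second to convert). The non-Windows branch is only there so the sketch builds elsewhere, and it assumes wchar_t holds a whole code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of the date library): wide string -> UTF-8.
std::string to_utf8(const std::wstring& w)
{
    if (w.empty()) return {};
#ifdef _WIN32
    // First call: ask how many bytes the UTF-8 result needs.
    const int n = WideCharToMultiByte(CP_UTF8, 0, w.data(),
                                      static_cast<int>(w.size()),
                                      nullptr, 0, nullptr, nullptr);
    if (n <= 0) return {};                       // conversion failed
    std::string out(static_cast<std::size_t>(n), '\0');
    // Second call: perform the conversion into the sized buffer.
    WideCharToMultiByte(CP_UTF8, 0, w.data(), static_cast<int>(w.size()),
                        &out[0], n, nullptr, nullptr);
    return out;
#else
    // Fallback for other platforms; assumes wchar_t is UTF-32.
    std::string out;
    for (const wchar_t wc : w) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
#endif
}

// Made-up example: "María" as a wide string becomes the UTF-8 bytes C3 AD for í.
void check_example()
{
    assert(to_utf8(L"Mar\u00EDa") == "Mar\xC3\xAD" "a");
}
```

The resulting std::string is what you would hand to the library as the download path, so that convert_utf8_to_utf16 with CP_UTF8 can turn it back into the same wide string.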