embeddedmz / ftpclient-cpp

C++ client for making FTP requests
MIT License
205 stars 65 forks

Names of Downloaded files via DownloadWildcard having special characters (french accents) are not handled correctly under Windows. #3

Closed by embeddedmz 3 years ago

embeddedmz commented 5 years ago

On Windows, file names are not encoded in UTF-8 but in UTF-16 or in a legacy code page (a mess!). As a result, files downloaded with DownloadWildcard (and maybe other methods) whose names contain special characters (e.g. French accents, or non-Latin scripts such as Arabic or Chinese) are not written with the correct names.

I have no idea how to fix this now.

embeddedmz commented 4 years ago

We should assume that the FTP server supports UTF-8. On Windows, I checked that FileZilla Server supports UTF-8 by default.

On Windows, the user must feed the API with paths/file names encoded in UTF-8 (as the API string parameters use std::string) and NOT in the system ANSI code page (e.g. Windows-1252).

Below is a code snippet that converts from ANSI to UTF-16 and then from UTF-16 to UTF-8:

// Transcode Windows ANSI (system code page) to UTF-16
std::string codepage_str; // input text encoded in the ANSI code page
const int codepage_len = static_cast<int>(codepage_str.length());
// First call: query the number of UTF-16 code units needed
int size = MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, codepage_str.c_str(),
                               codepage_len, nullptr, 0);
std::wstring utf16_str(size, L'\0');
// Second call: perform the conversion into the allocated buffer
MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, codepage_str.c_str(),
                    codepage_len, &utf16_str[0], size);

// Transcode UTF-16 to UTF-8
const int utf16_len = static_cast<int>(utf16_str.length());
int utf8_size = WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(),
                                    utf16_len, nullptr, 0,
                                    nullptr, nullptr);
std::string utf8_str(utf8_size, '\0');
WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(),
                    utf16_len, &utf8_str[0], utf8_size,
                    nullptr, nullptr);

To fix this bug, we will need this kind of code:

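#ifdef WINDOWS
// Converts a UTF-8 encoded string to UTF-16 (returns an empty string on failure).
std::wstring ToUtf16(const std::string& str)
{
    std::wstring ret;
    const int srcLen = static_cast<int>(str.length());
    // First call: query the number of UTF-16 code units needed
    int len = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), srcLen, nullptr, 0);
    if (len > 0)
    {
        ret.resize(len);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), srcLen, &ret[0], len);
    }
    return ret;
}
#endif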
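To check what ToUtf16 is expected to produce without a Windows machine at hand, the same UTF-8 to UTF-16 mapping can be sketched in portable C++11. This is only an illustration: `Utf8ToUtf16Portable` is a hypothetical name that is not part of ftpclient-cpp, it returns `std::u16string` instead of `std::wstring`, and it is simplified (it replaces malformed input with U+FFFD and does not reject overlong encodings).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ftpclient-cpp): decodes UTF-8 into UTF-16
// code units. Malformed or truncated sequences become U+FFFD.
// Simplification: overlong encodings are not rejected.
std::u16string Utf8ToUtf16Portable(const std::string& utf8)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(static_cast<char16_t>(0xFFFD)); ++i; continue; }

        if (i + extra >= utf8.size())
        {
            // The sequence is cut off by the end of the input
            out.push_back(static_cast<char16_t>(0xFFFD));
            break;
        }

        bool valid = true;
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) { valid = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!valid || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        {
            out.push_back(static_cast<char16_t>(0xFFFD));
            ++i; // resynchronize on the next byte
            continue;
        }

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above the BMP are stored as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```

For instance, the two UTF-8 bytes 0xC3 0xA9 ("é") map to the single UTF-16 code unit 0x00E9, which is the kind of character the issue title refers to.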

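When creating the file to which the downloaded content will be written:

    std::string utf8path = ...;
    // Output stream: the downloaded content is written to this file.
    // The wchar_t* constructor overload is non-standard (provided by Microsoft's standard library).
    std::ofstream oFileStream(
        #ifdef WINDOWS
        ToUtf16(utf8path).c_str()
        #else
        utf8path.c_str()
        #endif
        , std::ofstream::out | std::ofstream::binary);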
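If a C++17 toolchain can be assumed (the thread does not say which standard the library targets), the platform switch above could possibly be avoided with `std::filesystem::u8path`, which interprets a `std::string` as UTF-8 and converts it to the native path encoding. The sketch below is an alternative to the `ToUtf16` approach, not the fix adopted in this issue; `OpenForWriteUtf8` is a hypothetical helper name.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper (not part of ftpclient-cpp): opens a file for binary
// writing from a path given as a UTF-8 encoded std::string.
std::ofstream OpenForWriteUtf8(const std::string& utf8path)
{
    // u8path treats the bytes as UTF-8; on Windows the resulting path holds
    // UTF-16, elsewhere the bytes are kept in the native narrow encoding.
    return std::ofstream(std::filesystem::u8path(utf8path),
                         std::ios::out | std::ios::binary);
}
```

Note that `std::filesystem::u8path` is deprecated since C++20, where a `std::filesystem::path` is constructed from a `char8_t` string instead, so this sketch fits a C++17 code base best.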