Windows interfaces to use the Unicode flavour of the Windows API, but that also meant that the expected encoding/codepage of strings (e.g. local filenames, URLs) exchanged via the libcurl API became ambiguous and undefined.
Previously all strings had to be passed in the active Windows locale, using an 8-bit codepage. In Unicode libcurl builds, the expected string encoding became an undocumented mixture of UTF-8 and 8-bit locale, depending on the actual API, build options/dependencies, internal fallback logic based on runtime auto-detection of passed string, and the result of file operations (scheduled for removal in 7.78.0). While some parts of libcurl kept using 8-bit strings internally, e.g. when reading the environment.
From the user's perspective this poses an unreasonably complex task in finding out how to pass (or read) a certain non-ASCII string to (from) a specific API without unwanted or accidental conversions or other side-effects. Missing the correct encoding may result in unexpected behaviour, e.g. in some cases not finding files, reading/writing a different file, accessing the wrong URL or passing a corrupt username or password.
Note that these issues may only affect strings with non-7-bit-ASCII content.
For now the least bad solution seems to be to revert back to how libcurl/curl worked for most of its existence and only re-enable Unicode once the remaining parts of Windows Unicode support are well-understood, ironed out and documented.
Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully this period had the benefit to have surfaced some of these issues.
Windows interfaces to use the Unicode flavour of the Windows API, but that also meant that the expected encoding/codepage of strings (e.g. local filenames, URLs) exchanged via the libcurl API became ambiguous and undefined.
Previously all strings had to be passed in the active Windows locale, using an 8-bit codepage. In Unicode libcurl builds, the expected string encoding became an undocumented mixture of UTF-8 and 8-bit locale, depending on the actual API, build options/dependencies, internal fallback logic based on runtime auto-detection of passed string, and the result of file operations (scheduled for removal in 7.78.0). While some parts of libcurl kept using 8-bit strings internally, e.g. when reading the environment.
From the user's perspective this poses an unreasonably complex task in finding out how to pass (or read) a certain non-ASCII string to (from) a specific API without unwanted or accidental conversions or other side-effects. Missing the correct encoding may result in unexpected behaviour, e.g. in some cases not finding files, reading/writing a different file, accessing the wrong URL or passing a corrupt username or password.
Note that these issues may only affect strings with non-7-bit-ASCII content.
For now the least bad solution seems to be to revert back to how libcurl/curl worked for most of its existence and only re-enable Unicode once the remaining parts of Windows Unicode support are well-understood, ironed out and documented.
Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully this period had the benefit to have surfaced some of these issues.
Ref: https://github.com/curl/curl/issues/6089 Ref: https://github.com/curl/curl/pull/7246 Ref: https://github.com/curl/curl/pull/7251 Ref: https://github.com/curl/curl/pull/7252 Ref: https://github.com/curl/curl/pull/7257 Ref: https://github.com/curl/curl/pull/7281 Ref: https://github.com/curl/curl/pull/7421 Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings Ref: https://github.com/curl/curl-for-win/commit/8023ee5d05c547443df7c75ea252e8c053946abc