KhronosGroup / KTX-Software

KTX (Khronos Texture) Library and Tools
Other
861 stars 225 forks source link

Utf-8/unicode support in legacy tools and lib. #800

Closed MarkCallow closed 9 months ago

MarkCallow commented 10 months ago

Fixes #764.

MarkCallow commented 10 months ago

@aqnuep I want to remove the _TCHARs and && !defined(_UNICODE) from InitUTF8CLI. I cannot see any use for them beyond allowing someone to compile the tools with _mainw (or whatever it is called) passing the wchar_t** argv to InitUTF8CLI and avoiding the call to GetComandLineW. I don't see the point in the added complexity and I can't see us changing the way we compile the tools. In any case the code as is now does not support the usage I just described.

Thank you so much @aqnuep for the clean method you made to handle unicode file names on Windows. It is so much nicer than having generic text macros everywhere. I have completely removed those macros from everywhere. I'll await your thoughts before pushing a commit or not.

aqnuep commented 10 months ago

@aqnuep I want to remove the _TCHARs and && !defined(_UNICODE) from InitUTF8CLI. I cannot see any use for them beyond allowing someone to compile the tools with _mainw (or whatever it is called) passing the wchar_t** argv to InitUTF8CLI and avoiding the call to GetComandLineW. I don't see the point in the added complexity and I can't see us changing the way we compile the tools. In any case the code as is now does not support the usage I just described.

Thank you so much @aqnuep for the clean method you made to handle unicode file names on Windows. It is so much nicer than having generic text macros everywhere. I have completely removed those macros from everywhere. I'll await your thoughts before pushing a commit or not.

The reason I left those in is to avoid any existing users of any subset of the repository components with _UNICODE. If we explicitly want to disallow using any component of the repository with _UNICODE then that has to be a conscious decision that would also include generating a configure-time error if _UNICODE is defined. But I'd rather not risk breaking any user code.

MarkCallow commented 10 months ago

Thank you for your review, @aqnuep. Since building with _UNICODE leads to build errors, and always has, we will not be breaking any user code by removing it. With all generic text stuff removed defining _UNICODE will not affect the compiled code so I don't think an error is necessary. Perhaps a warning that is has no effect.

aqnuep commented 10 months ago

Thank you for your review, @aqnuep. Since building with _UNICODE leads to build errors, and always has, we will not be breaking any user code by removing it. With all generic text stuff removed defining _UNICODE will not affect the compiled code so I don't think an error is necessary. Perhaps a warning that is has no effect.

I'm just unsure whether that's the case with all combinations of the configuration parameters, but if you're confident about this and do not see any risk with removing support for it, we could, and then we should add an explicit configure-time error when _UNICODE is enabled, just in case.

MarkCallow commented 9 months ago

I'm just unsure whether that's the case with all combinations of the configuration parameters, but if you're confident about this and do not see any risk with removing support for it, we could, and then we should add an explicit configure-time error when _UNICODE is enabled, just in case.

I don't understand exactly what your concern is.

There are 3 ways I see that _UNICODE could be enabled:

  1. Somebody edits one of more of the CMake files to add the define during compilation.
  2. Somebody is building with their own build system and defines _UNICODE
  3. The compiler or OS automatically defines _UNICODE.

I don't think we should be concerned about 1 & 2. For 3, my only concern would be an app being compiled with _UNICODE defined that calls one of the *CreateFromNamedFile or *WriteToNamedFile libktx functions but if the app attempts to pass a wchar_t string to one of them a compile error will result. An additional warning or error won't help.

Currently if _UNICODE is defined compilation of the tools and loadtest apps will fail. Gtest-based tests and library compilation will succeed since they contain no generic text macros. This is independent of configuration parameters.

The one thing I think we should fix here, with or without removal of generic text macro use, is to fix main() and InitUTF8CLI() to support Windows UWP. Yes, UWP does support console apps. In main this means

#if defined WINDOWS_UWP
int __cdecl main() {
    initUTF8CLI(__argc, __argv);
#else
int main(int argc, char** argv) {
    initUTF8CLI(argc, argv);
#endif
    ...
}

and in initUTF8CLI

#if defined(_WIN32) || defined(WINDOWS_UWP)
#if defined(WINDOWS_UWP)
inline void InitUTF8CLI(int& argc, wchar_t* argv[]) {
    LPWSTR* wideArgv = argv;
#else
inline void InitUTF8CLI(int& argc, char* argv[]) {
    // Windows does not support UTF-8 argv so we have to manually acquire it
    static std::vector<std::unique_ptr<char[]>> utf8Argv(argc);
    LPWSTR commandLine = GetCommandLineW();
    LPWSTR* wideArgv = CommandLineToArgvW(commandLine, &argc);
#endif
    for (int i = 0; i < argc; ++i) {
        int byteSize = WideCharToMultiByte(CP_UTF8, 0, wideArgv[i], -1, nullptr, 0, nullptr, nullptr);
        utf8Argv[i] = std::make_unique<_TCHAR[]>(byteSize);
        WideCharToMultiByte(CP_UTF8, 0, wideArgv[i], -1, utf8Argv[i].get(), byteSize, nullptr, nullptr);
        argv[i] = utf8Argv[i].get();
    }
#else
    // Nothing to do for other platforms
    (void)argc;
    (void)argv;
#endif
}

Note that currently compile of platform_utils.h will fail if _UNICODE is defined as ktxtools apps will expect argv[] to be a char but it will be a wchat_t.

aqnuep commented 9 months ago

My only concern is (2) in your list, e.g. they add_subdirectory the KTX-Software repo folder with _UNICODE defined with a subset of the build flags that may currently happen to build with it.

Then again, it's just a general backward compatibility concern, and considering that such users can always just use an older version of the code considering that this was never officially supported and/or functional in practice.

So if you're fine with removing support for _UNICODE=TRUE altogether, that works for me.

aqnuep commented 9 months ago

Regarding Windows UWP support, that should be a separate effort IMO. We should first confirm whether it really is a must to support _UNICODE for UWP, and if yes, then that kind of reopens this can of worms and would have to look into possible solutions.

MarkCallow commented 9 months ago

So if you're fine with removing support for _UNICODE=TRUE altogether, that works for me.

Okay. I'll add the commit that removes all generic text usage.

Regarding Windows UWP support, that should be a separate effort IMO.

I agree. I got carried away. I researched the UWP situation to see if lack of _UNICODE would be an issue. It isn't. It is only relevant to software using the generic text macros to support unicode. Since we aren't, it doesn't apply. The main point is that the command line arguments are only available to UWP apps as wchar_t*. With that knowledge I thought about the changes needed to support UWP in our code and wrote it up. I realized after the fact that there is an error in my sample code.

aqnuep commented 9 months ago

I agree. I got carried away. I researched the UWP situation to see if lack of _UNICODE would be an issue. It isn't. It is only relevant to software using the generic text macros to support unicode. Since we aren't, it doesn't apply. The main point is that the command line arguments are only available to UWP apps as wchar_t*. With that knowledge I thought about the changes needed to support UWP in our code and wrote it up. I realized after the fact that there is an error in my sample code.

I think we should get a UWP build on CI, if we'd like to tackle this, but in the end the main entry point's signature wouldn't even matter if we'd use the same wchar APIs directly to fetch the UTF-16 parameters manually and convert them to UTF-8.