UE4SS-RE / RE-UE4SS

Injectable LUA scripting system, SDK generator, live property editor and other dumping utilities for UE4/5 games
http://docs.ue4ss.com/
MIT License
1.37k stars 187 forks source link

Platform agnostic string types #566

Open Yangff opened 5 months ago

Yangff commented 5 months ago

The current implementation of UE4SS uses File::StringType = std::wstring and wchar_t as the primary character types, assuming compatibility with Unreal's main character type TCHAR, and uses UTF-16 encoding. Additionally, UE4SS uses this type as the working type for the platform file system. The following issues have been observed:

  1. On the Linux platform, the underlying type for std::wstring is wchar_t, whose sizeof is 4, using UTF-32 instead of UTF-16.
  2. Across all platforms, std::u16string has compatibility issues with existing C++ STL, such as not being accepted by std::format. On the Linux platform, neither wstring nor u16string is directly accepted by Linux or the C standard library. On Windows, although wchar_t and char16_t are essentially the same, they cannot be implicitly converted, and using u16string introduces potential issues.
  3. Similarly, when interacting with Rust, char16_t should be used.
  4. UE4SS files should use UTF-8 encoding.

Therefore, the following changes are planned (please make modifications):

  1. Remove the string type definition from File and redefine StringType as std::basic_string<TCHAR> as part of Unreal, no longer belonging to the File namespace.
  2. Use fmtlib that supports u16string to provide std::format.
  3. File will continue to provide file APIs (without breaking ABI) and might be replaced by fmt in the future (?).
  4. Unify the string type used inside UE4SS C++ code to StringType/TCHAR and only convert at the IO/Language edge.
  5. Helper will provide implementations for encoding conversion, which could be replaced by Unreal's implementation in the future.
  6. (?) Force the use of u16string on Windows. This approach has the benefit of triggering errors on Windows if they directly access the underlying type of u16string without appropriate conversion/encoding, thereby exposing potential issues.

Please comment/update here and make sure we agree on what changes go into the strings PR etc..

UE4SS commented 5 months ago
  1. Remove the string type definition from File and redefine StringType as std::basic_string as part of Unreal, no longer belonging to the File namespace.

Just noting that doing this will make Unreal a dependency for every single one of our other small dependencies that currently depend on File for StringType, a bit awful to say the least (the ini parser would need Unreal for example) but if it solves this problem and unblocks the Linux port then maybe it's acceptable and we can figure out a better way later.

  1. File will continue to provide file APIs (without breaking ABI) and might be replaced by fmt in the future (?).

I don't think it's appropriate to replace File with fmt, does it even provide replacements for all the APIs we use ? I think either we continue to use and maintain File, find another actual library for interacting with files, do whatever UE does, or use whatever the standard libraries provide.

These are the three original reasons why File exists:

  1. For me to learn, this also goes for most of our other first-party dependencies.
  2. I hate the C++ standard way to deal with files.
  3. I hate third-party dependencies, but this has gone out the door because the project has obviously expanded well beyond my original plan so my own homegrown solutions may not be suitable or good enough. I still don't like dependencies though and I hate that patternsleuth has an unbelievable amount of them.
narknon commented 5 months ago

Regarding 4, CC and Arch seemed to have different thoughts on how often to use StringType vs FString than you and I did. I'd like them to chime in.

narknon commented 5 months ago

@localcc @archengius

Yangff commented 5 months ago
  1. Remove the string type definition from File and redefine StringType as std::basic_string as part of Unreal, no longer belonging to the File namespace.

Just noting that doing this will make Unreal a dependency for every single one of our other small dependencies that currently depend on File for StringType, a bit awful to say the least (the ini parser would need Unreal for example) but if it solves this problem and unblocks the Linux port then maybe it's acceptable and we can figure out a better way later.

  1. File will continue to provide file APIs (without breaking ABI) and might be replaced by fmt in the future (?).

I don't think it's appropriate to replace File with fmt, does it even provide replacements for all the APIs we use ? I think either we continue to use and maintain File, find another actual library for interacting with files, do whatever UE does, or use whatever the standard libraries provide.

These are the three original reasons why File exists:

  1. For me to learn, this also goes for most of our other first-party dependencies.
  2. I hate the C++ standard way to deal with files.
  3. I hate third-party dependencies, but this has gone out the door because the project has obviously expanded well beyond my original plan so my own homegrown solutions may not be suitable or good enough. I still don't like dependencies though and I hate that patternsleuth has an unbelievable amount of them.
  1. I'm not sure where is the best place to put the StringType.. I saw this move being mentioned multiple times in the chat but personally I'd prefer leaving it outside of Unreal.. (as Unreal itself should almost always use FString in its own codebase. I see no reason to put it inside).. A more obvious apporach maybe leave them inside Helper or a dedicated header dep..

  2. fmt only gets things like read all and write all. That's why I think we will leave other parts of File as it for now, but idk other people's thought on that..

Regarding 4, CC and Arch seemed to have different thoughts on how often to use StringType vs FString than you and I did. I'd like them to chime in.

yeah.. although I'm not sure how to ... but another way is to time slice the program to use u16stirng at the beginning and switch to (somehow) fully FString later.. it might make the typing hard to manage somehow..

narknon commented 5 months ago

I'm not sure where is the best place to put the StringType.. I saw this move being mentioned multiple times in the chat but personally I'd prefer leaving it outside of Unreal

I think ideally our setup would end up being very similar to Unreal's with Unreal's core source being the earliest dependency. So it makes sense to typedef it there so it propagates up.

localcc commented 5 months ago

So my idea was to use basic_string before we get unreal allocator, and FString after, for functions which need to work with both we just make them templated to support both. This presented an issue that FString is not as nice to work with and ofc no external library will support it.

So the end solution is we use both strings, FString at unreal boundary and basic_string everywhere else, we of course need to write conversion functions between the two, but it will mostly be no-ops at runtime so won't cause performance concerns.

Yangff commented 5 months ago

So.. solve the circluar deps seem extra works and not related to this.. will create a string module and use macro to define which char type to use..

Yangff commented 5 months ago

https://github.com/Yangff/RE-UE4SS/tree/port/strings-fmt

This branch should contain attempt to use fmt with StringType defined in its own module.

Currently I'm able to compile it without UVTD when setting CharType = char16_t, this should contain most typing work reuqired for Linux port.

Real windows build should use wstring which should compile without any problem because no type are changed. Setting CharType = chat16_t is purely to ensure I processed all types.

UVTD is excluded from this because it's Windows only.

Yangff commented 5 months ago

Okay, getting the PRs in.

Related PRs are:

FmtLib https://github.com/UE4SS-RE/RE-UE4SS/pull/565 StringType https://github.com/UE4SS-RE/RE-UE4SS/pull/574 String Port in UE4SS https://github.com/UE4SS-RE/RE-UE4SS/pull/575 String Port in UEPseudo https://github.com/Re-UE4SS/UEPseudo/pull/96