citizenfx / fivem

The source code for the Cfx.re modification frameworks, such as FiveM, RedM and LibertyM, as well as FXServer.
https://cfx.re/

SplitString/StringToArray doesn't work properly with non-ASCII characters #2143

Open Sasino97 opened 1 year ago

Sasino97 commented 1 year ago

The following method works correctly when the input string consists only of 7-bit ASCII characters:

        /// <summary>
        /// Splits the <paramref name="inputString"/> into an array with each string having 99 characters or less<br />
        /// This is needed as characters beyond 99 aren't rendered by GTA V, e.g.: with <see cref="Text"/>
        /// </summary>
        /// <remarks>arrays of 5 and higher (396+ characters) are known to <i>not</i> being rendered by GTA V</remarks>
        /// <param name="inputString">The string to convert.</param>
        /// <returns>array containing strings each 99 characters or less.</returns>
        public static String[] SplitString(String inputString)
        {
            int stringsNeeded = (inputString.Length - 1) / 99 + 1; // division with round up

            String[] outputString = new String[stringsNeeded];
            for (int i = 0; i < stringsNeeded; i++)
            {
#if MONO_V2
                outputString[i] = inputString.Substring(i * 99, 99);
#else
                outputString[i] = inputString.Substring(i * 99, MathUtil.Clamp(inputString.Substring(i * 99).Length, 0, 99));
#endif
            }

            return outputString;
        }

However, as soon as the string contains characters outside the ASCII range, the count is off and some trailing characters of each chunk are cut. I'm not talking about emojis or Chinese characters: even a simple accented letter, which takes two bytes in UTF-8 instead of one, is enough to reproduce the issue.
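To make the mismatch concrete, here is a minimal sketch that compares the character count `SplitString` relies on with the UTF-8 byte count of the same string. The class and method names (`ByteCountDemo`, `Utf8Length`) are hypothetical and exist only for illustration; `Encoding.UTF8.GetByteCount` is the standard .NET call.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the FiveM API.
public static class ByteCountDemo
{
    // Number of bytes the string occupies once encoded as UTF-8.
    public static int Utf8Length(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // True when the string fits in the given byte budget.
    // The default of 99 is an assumption taken from the 99-character limit above.
    public static bool FitsInBuffer(string s, int maxBytes = 99)
    {
        return Utf8Length(s) <= maxBytes;
    }
}
```

A chunk of 99 `a` characters has `Length` 99 and a UTF-8 length of 99 bytes, so it fits. A chunk of 99 `é` characters also has `Length` 99, but its UTF-8 length is 198 bytes, which would explain why the tail of the chunk goes missing.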

Proposed solution: split every 50 characters instead of 99, or add an optional parameter that defaults to 99 but can be changed to whatever value we want.
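The second option could look roughly like the sketch below. It is not the actual FiveM implementation: the class name `TextSplitter` and the parameter name `maxLength` are assumptions made for illustration. Note that it still counts UTF-16 code units, so the caller has to pick a conservative value for non-ASCII text.

```csharp
using System;

// Hypothetical sketch of SplitString with a configurable chunk size.
public static class TextSplitter
{
    public static string[] SplitString(string inputString, int maxLength = 99)
    {
        if (inputString == null) throw new ArgumentNullException(nameof(inputString));
        if (maxLength <= 0) throw new ArgumentOutOfRangeException(nameof(maxLength));

        // Division with round up; an empty input still yields one empty chunk,
        // matching the behavior of the original method.
        int stringsNeeded = Math.Max(1, (inputString.Length + maxLength - 1) / maxLength);

        string[] output = new string[stringsNeeded];
        for (int i = 0; i < stringsNeeded; i++)
        {
            int start = i * maxLength;
            // The last chunk may be shorter than maxLength.
            int length = Math.Min(maxLength, inputString.Length - start);
            output[i] = inputString.Substring(start, length);
        }

        return output;
    }
}
```

Calling `TextSplitter.SplitString(text, 49)` would keep every chunk of two-byte accented letters within 98 bytes, at the cost of more chunks than necessary for plain ASCII text.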

blattersturm commented 1 year ago

Right. Technically this should split every 100 bytes, not every 100 characters, to match the internal script string buffer stuff.
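A byte-based split along those lines might be sketched as follows. This is an illustration under stated assumptions, not the fix that was or will be merged: the names `Utf8Splitter` and `SplitByUtf8Bytes` are hypothetical, the strings are assumed to reach the game as UTF-8, and the default budget of 99 bytes assumes one byte of a 100-byte buffer is reserved for a terminator.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: split on a UTF-8 byte budget instead of a character count.
public static class Utf8Splitter
{
    public static string[] SplitByUtf8Bytes(string input, int maxBytes = 99)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        // A single code point can need up to 4 bytes in UTF-8.
        if (maxBytes < 4) throw new ArgumentOutOfRangeException(nameof(maxBytes));

        var chunks = new List<string>();
        int chunkStart = 0; // index where the current chunk begins
        int chunkBytes = 0; // UTF-8 bytes accumulated in the current chunk
        int i = 0;

        while (i < input.Length)
        {
            // Treat a surrogate pair as one unit so it is never split in half.
            bool isPair = char.IsHighSurrogate(input[i])
                          && i + 1 < input.Length
                          && char.IsLowSurrogate(input[i + 1]);
            int units = isPair ? 2 : 1;
            int bytes = Encoding.UTF8.GetByteCount(input.Substring(i, units));

            // Close the current chunk if this code point would exceed the budget.
            if (chunkBytes + bytes > maxBytes)
            {
                chunks.Add(input.Substring(chunkStart, i - chunkStart));
                chunkStart = i;
                chunkBytes = 0;
            }

            chunkBytes += bytes;
            i += units;
        }

        // Remaining tail; for an empty input this yields one empty string.
        chunks.Add(input.Substring(chunkStart));
        return chunks.ToArray();
    }
}
```

With this approach 99 ASCII characters still fit in one chunk, while 99 `é` characters are split into chunks of 49, 49 and 1 characters, each at most 98 bytes. The sketch keeps surrogate pairs together but does not try to keep combining marks with their base character.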