NightOwl888 / ICU4N

International Components for Unicode for .NET
Apache License 2.0
27 stars 7 forks source link

Task: Update inline StringBuilder calls to use ValueStringBuilder, when supported #56

Closed NightOwl888 closed 1 month ago

NightOwl888 commented 1 year ago

Several methods that are intended to be called by other methods (often in a tight loop) are set up to new up a StringBuilder at the beginning of the method and then return the string at the end. A StringBuilder instance allocates on the heap and so are the underlying array chunks that it manages, so doing this inline when processing strings causes excessive garbage collection. A better alternative is to use ValueStringBuilder with a stack allocated initial buffer (usually 32 chars will suffice). ValueStringBuilder is a ref struct that allocates on the stack. So replacing it means the entire operation occurs on the stack unless it runs out of buffer, in which case it will automatically move the chars over to the heap (from the array pool, so it can be reused across method calls).

Example

        public static string Escape(string s)
        {
            StringBuilder buf = new StringBuilder();
            for (int i = 0; i < s.Length;)
            {
                int c = Character.CodePointAt(s, i);
                i += UTF16.GetCharCount(c);
                if (c >= ' ' && c <= 0x007F)
                {
                    if (c == '\\')
                    {
                        buf.Append("\\\\"); // That is, "\\"
                    }
                    else
                    {
                        buf.Append((char)c);
                    }
                }
                else
                {
                    bool four = c <= 0xFFFF;
                    buf.Append(four ? "\\u" : "\\U");
                    buf.Append(Hex(c, four ? 4 : 8));
                }
            }
            return buf.ToString();
        }

Can be changed to:

        public static string Escape(string s)
        {
#if FEATURE_SPAN
            ValueStringBuilder buf = new ValueStringBuilder(stackalloc char[CharStackBufferSize]);
#else
            StringBuilder buf = new StringBuilder(s.Length);
#endif
            for (int i = 0; i < s.Length;)
            {
                int c = Character.CodePointAt(s, i);
                i += UTF16.GetCharCount(c);
                if (c >= ' ' && c <= 0x007F)
                {
                    if (c == '\\')
                    {
                        buf.Append("\\\\"); // That is, "\\"
                    }
                    else
                    {
                        buf.Append((char)c);
                    }
                }
                else
                {
                    bool four = c <= 0xFFFF;
                    buf.Append(four ? "\\u" : "\\U");
                    buf.Append(Hex(c, four ? 4 : 8));
                }
            }
            return buf.ToString();
        }

Note that ValueStringBuilder is disposable, but calling ToString() disposes it automatically. However, it is better to use .AsSpan() to keep the buffer on the stack as long as possible if the method does not return it as a string immediately. If ToString() is not called, it should be disposed by either putting it in a using block or wrapping it in a try/finally block to explicitly call Dispose().

The above code still heap allocates a string at the end (which is less than ideal), but at least we save the allocation of the StringBuilder and its underlying memory, which if done project-wide will be very impactful for performance and won't impact the API at all.

Note that we still need to conditionally compile with FEATURE_SPAN and use StringBuilder because net40 doesn't support spans. However, in this case we can pass s.Length to the constructor to pre-allocate the correct amount of heap.

This task depends on #54 and may also require #55.