dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

API Proposal: Add a ValueStringBuilder #25587

Open JeremyKuhne opened 6 years ago

JeremyKuhne commented 6 years ago

We should consider making a value-based StringBuilder to allow low-allocation building of strings. While Span allows you to provide a writable buffer, in many scenarios we need to get or build strings without knowing ahead of time precisely how much space will be needed. Having an abstraction that can grow beyond a given initial buffer is particularly useful because it doesn't require looping with Try* APIs, which can be both complicated and bad for performance.
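To make the "looping with Try* APIs" point concrete, here is a minimal sketch of the retry pattern a caller ends up writing when the required size is not known up front. The `TryLoopDemo` class and `FormatWithRetry` method are hypothetical names invented for this illustration; `double.TryFormat` and `ArrayPool<char>.Shared` are the real APIs. It assumes C# 9 or later for the top-level statement.

```csharp
using System;
using System.Buffers;

Console.WriteLine(TryLoopDemo.FormatWithRetry(double.MaxValue));

// Hypothetical helper, only here to show the retry loop.
public static class TryLoopDemo
{
    public static string FormatWithRetry(double value)
    {
        int size = 4; // small starting request; the pool may hand back a larger array
        while (true)
        {
            char[] rented = ArrayPool<char>.Shared.Rent(size);
            try
            {
                // TryFormat returns false when the destination is too small.
                if (value.TryFormat(rented, out int charsWritten))
                {
                    return new string(rented, 0, charsWritten);
                }
            }
            finally
            {
                ArrayPool<char>.Shared.Return(rented);
            }

            // Guess a bigger size and go around again.
            size = rented.Length * 2;
        }
    }
}
```

Every call site that wants this behavior has to repeat the rent, try, return, and grow steps, which is the bookkeeping a growable builder would take over.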

We currently use ValueStringBuilder for this purpose internally. It starts with an optional initial buffer (which we often stackalloc) and will grow using ArrayPool if needed.
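As a rough illustration of that description, the sketch below starts from a caller-supplied buffer (stack-allocated here) and switches to an ArrayPool rental once the buffer is full. `MiniValueStringBuilder` and `BuilderDemo` are hypothetical names for this example, and the growth policy is an assumption; this is a simplified stand-in, not the internal implementation. It assumes C# 9 or later for the top-level statement.

```csharp
using System;
using System.Buffers;

Console.WriteLine(BuilderDemo.Repeat("ab", 20));

// Hypothetical, simplified stand-in for the internal type.
public ref struct MiniValueStringBuilder
{
    private char[] _rented;      // null until the initial buffer overflows
    private Span<char> _chars;   // current backing storage (stack or pooled)
    private int _pos;            // logical length

    public MiniValueStringBuilder(Span<char> initialBuffer)
    {
        _rented = null;
        _chars = initialBuffer;
        _pos = 0;
    }

    public int Length => _pos;
    public int Capacity => _chars.Length;

    public void Append(ReadOnlySpan<char> value)
    {
        if (value.Length > _chars.Length - _pos)
        {
            Grow(value.Length);
        }
        value.CopyTo(_chars.Slice(_pos));
        _pos += value.Length;
    }

    // Assumed policy: at least double, and always enough for the pending append.
    private void Grow(int additional)
    {
        int newSize = Math.Max(_pos + additional, _chars.Length * 2);
        char[] newArray = ArrayPool<char>.Shared.Rent(newSize);
        _chars.Slice(0, _pos).CopyTo(newArray);

        char[] toReturn = _rented;
        _chars = _rented = newArray;
        if (toReturn != null)
        {
            ArrayPool<char>.Shared.Return(toReturn);
        }
    }

    public override string ToString() => _chars.Slice(0, _pos).ToString();

    // Returns any rented array to the pool.
    public void Dispose()
    {
        char[] toReturn = _rented;
        this = default;
        if (toReturn != null)
        {
            ArrayPool<char>.Shared.Return(toReturn);
        }
    }
}

public static class BuilderDemo
{
    public static string Repeat(string value, int count)
    {
        Span<char> initial = stackalloc char[16]; // no heap allocation for short results
        var sb = new MiniValueStringBuilder(initial);
        try
        {
            for (int i = 0; i < count; i++)
            {
                sb.Append(value);
            }
            return sb.ToString();
        }
        finally
        {
            sb.Dispose();
        }
    }
}
```

Short results never leave the stack buffer; only results longer than 16 chars touch the pool, and the only allocation that always happens is the final string.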

Design Goals

  1. Allow safe usage of stack memory
  2. Use pooled memory when needed to reduce GC pressure
  3. Allow dynamic and explicit capacity growth
  4. Facilitate interop scenarios (i.e. passing as char* szValue)
  5. Follow API semantics of StringBuilder & string where possible
  6. Be stack allocated

API

Here is the proposed API:

namespace System.Text
{
    public ref struct ValueStringBuilder
    {
        public ValueStringBuilder(Span<char> initialBuffer);

        // The logical length of the builder (end of the "string")
        public int Length { get; set; }

        // Available space in chars
        public int Capacity { get; }

        // Ensure there is at least this amount of space
        public void EnsureCapacity(int capacity);

        // Get a pinnable reference to the builder. "terminate" ensures the builder has a null char after Length in the buffer.
        public ref char GetPinnableReference(bool terminate = false);

        // Indexer, allows setting/getting individual chars
        public ref char this[int index] { get; }

        // Returns a string based off of the current position
        public override string ToString();

        // Returns a span around the contents of the builder. "terminate" ensures the builder has a null char after Length in the buffer.
        public ReadOnlySpan<char> AsSpan(bool terminate);

        // To ensure inlining perf, we have a separate overload for terminate
        public ReadOnlySpan<char> AsSpan();

        public bool TryCopyTo(Span<char> destination, out int charsWritten);

        public void Insert(int index, char value, int count = 1);
        public void Insert(int index, ReadOnlySpan<char> value, int count = 1);

        public void Append(char c, int count = 1);
        public void Append(ReadOnlySpan<char> value);

        // This gives you an appended span that you can write to
        public Span<char> AppendSpan(int length);

        // Returns any ArrayPool buffer that may have been rented
        public void Dispose();
    }
}

This is the current shape of our internal ValueStringBuilder:

namespace System.Text
{
    internal ref struct ValueStringBuilder
    {
        public ValueStringBuilder(Span<char> initialBuffer);
        public int Length { get; set; }
        public int Capacity { get; }
        public void EnsureCapacity(int capacity);

        /// <summary>
        /// Get a pinnable reference to the builder.
        /// </summary>
        /// <param name="terminate">Ensures that the builder has a null char after <see cref="Length"/></param>
        public ref char GetPinnableReference(bool terminate = false);
        public ref char this[int index] { get; }

        // Returns a string based off of the current position
        public override string ToString();

        /// <summary>
        /// Returns a span around the contents of the builder.
        /// </summary>
        /// <param name="terminate">Ensures that the builder has a null char after <see cref="Length"/></param>
        public ReadOnlySpan<char> AsSpan(bool terminate);

        // To ensure inlining perf, we have a separate overload for terminate
        public ReadOnlySpan<char> AsSpan();

        public bool TryCopyTo(Span<char> destination, out int charsWritten);
        public void Insert(int index, char value, int count);
        public void Append(char c);
        public void Append(string s);
        public void Append(char c, int count);
        public unsafe void Append(char* value, int length);
        public void Append(ReadOnlySpan<char> value);

        // This gives you an appended span that you can write to
        public Span<char> AppendSpan(int length);

        // Returns any ArrayPool buffer that may have been rented
        public void Dispose();
    }
}

Sample Code

Here is a common pattern, taken from an API that could theoretically be made public if ValueStringBuilder were public (although we would call this one GetFullUserName or something like that):

https://github.com/dotnet/corefx/blob/050bc33738887d9d8fcc9bc5965b7d9ca65bc7f4/src/System.Runtime.Extensions/src/System/Environment.Win32.cs#L40-L56

The caller is above this method:

https://github.com/dotnet/corefx/blob/050bc33738887d9d8fcc9bc5965b7d9ca65bc7f4/src/System.Runtime.Extensions/src/System/Environment.Win32.cs#L13-L38

Usage of AppendSpan:

https://github.com/dotnet/corefx/blob/3538128fa1fb2b77a81026934d61cd370a0fd7f5/src/System.Runtime.Numerics/src/System/Numerics/BigNumber.cs#L550-L560
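For readers who do not want to follow the link, the idea behind AppendSpan can be sketched as follows: reserve a span at the end of the builder, let a formatting API write straight into it, then trim Length to what was actually written. `SpanAppendingBuilder` and `AppendSpanDemo` are hypothetical, trimmed-down types made up for this example (they are not the linked BigNumber code), and the sketch assumes C# 9 or later for the top-level statement.

```csharp
using System;
using System.Buffers;
using System.Globalization;

Console.WriteLine(AppendSpanDemo.FormatPair(12, -345)); // prints "12,-345"

// Hypothetical builder containing only what AppendSpan needs.
public ref struct SpanAppendingBuilder
{
    private char[] _rented;
    private Span<char> _chars;
    private int _pos;

    public SpanAppendingBuilder(Span<char> initialBuffer)
    {
        _rented = null;
        _chars = initialBuffer;
        _pos = 0;
    }

    public int Length
    {
        get => _pos;
        set => _pos = value; // the sketch trusts the caller; real code should validate
    }

    // Reserves `length` chars at the end and hands them to the caller to fill in.
    public Span<char> AppendSpan(int length)
    {
        if (length > _chars.Length - _pos)
        {
            Grow(length);
        }
        Span<char> reserved = _chars.Slice(_pos, length);
        _pos += length;
        return reserved;
    }

    private void Grow(int additional)
    {
        char[] newArray = ArrayPool<char>.Shared.Rent(Math.Max(_pos + additional, _chars.Length * 2));
        _chars.Slice(0, _pos).CopyTo(newArray);
        char[] toReturn = _rented;
        _chars = _rented = newArray;
        if (toReturn != null) ArrayPool<char>.Shared.Return(toReturn);
    }

    public override string ToString() => _chars.Slice(0, _pos).ToString();

    public void Dispose()
    {
        char[] toReturn = _rented;
        this = default;
        if (toReturn != null) ArrayPool<char>.Shared.Return(toReturn);
    }
}

public static class AppendSpanDemo
{
    private const int MaxInt32Chars = 11; // "-2147483648" in the invariant culture

    private static void AppendInt32(ref SpanAppendingBuilder sb, int value)
    {
        int start = sb.Length;
        Span<char> dest = sb.AppendSpan(MaxInt32Chars); // reserve the worst case
        if (!value.TryFormat(dest, out int written, default, CultureInfo.InvariantCulture))
        {
            throw new InvalidOperationException("Reserved span was too small.");
        }
        sb.Length = start + written; // give back the unused tail
    }

    public static string FormatPair(int a, int b)
    {
        Span<char> initial = stackalloc char[8];
        var sb = new SpanAppendingBuilder(initial);
        try
        {
            AppendInt32(ref sb, a);
            sb.AppendSpan(1)[0] = ',';
            AppendInt32(ref sb, b);
            return sb.ToString();
        }
        finally
        {
            sb.Dispose();
        }
    }
}
```

The formatted digits are written once, directly into the builder's storage, with no intermediate string or temporary buffer.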

I'll add more usage details and possible API surface area.

Notes

kamronbatman commented 3 years ago

Anything slated for language changes to allow this? I think forcing ref structs makes sense as a new language construct.

GazziFX commented 3 years ago

Will it come out in 6.0.0?

kamronbatman commented 3 years ago

For now I exposed my own by essentially shamelessly copying it: https://github.com/modernuo/ModernUO/blob/main/Projects/Server/Buffers/ValueStringBuilder.cs

Also you can use https://github.com/Cysharp/ZString which has a great StringBuilder-like API.

danmoseley commented 2 years ago

This is essentially blocked on resolving discussion in https://github.com/dotnet/runtime/issues/50389.

SupinePandora43 commented 2 years ago

Since Roslyn is adding support for UTF-8 strings handled as ReadOnlySpan<byte>, I think it would be better to have something like ValueSpanBuilder<T> instead.

CodingMadness commented 1 year ago

Has anything further been done in this regard? I ask out of curiosity.

lsoft commented 1 year ago

ValueStringBuilder is in runtime now, as I can see: https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/System/Text/ValueStringBuilder.cs

but not public for some reason. what is that reason? public ValueStringBuilder is the key to provide, for example, global::System.Net.WebUtility.HtmlDecode(ReadOnlySpan<char>) into the public space.

stephentoub commented 1 year ago

but not public for some reason. what is that reason?

https://github.com/dotnet/runtime/issues/25587#issuecomment-525424732

ericstj commented 6 months ago

Just saw another copy of this type pop up, and see that we currently have 20+ copies in the product. https://source.dot.net/#q=ValueStringBuilder

Is there any progress on the language feature that would make us comfortable exposing this type?

stephentoub commented 6 months ago

and see that we currently have 20+ copies in the product

To clarify (I'm not sure if this is what you meant or not), we have that many in binaries, not that many source copies, with the same source built as internal into many places.