dotnet / runtime


[API Proposal]: Add access to StringBuilder content by using ReadOnlySequence<char> #87362

Open AlexRadch opened 1 year ago

AlexRadch commented 1 year ago

Background and motivation

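There are many methods for working with content through the ReadOnlySequence<char> structure:

* https://learn.microsoft.com/en-us/dotnet/standard/io/buffers
* https://learn.microsoft.com/en-us/dotnet/standard/io/pipelines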

There are no such methods for the StringBuilder class, even though it is often used as a buffer for building string content quickly and conveniently. It would therefore be useful to work with the contents of a StringBuilder as quickly and conveniently as with the contents of a ReadOnlySequence<char> structure.

There are three ways to work with the contents of a StringBuilder in the same way as with a ReadOnlySequence<char> structure:

  1. Create a string and use it. This is not optimal, because it allocates a new string and copies the entire content.

  2. Use the StringBuilder.GetChunks() method and implement the required methods yourself, repeating the methods already written for the ReadOnlySequence<char> structure. This is probably the fastest in terms of code performance, but very costly to implement and test.

  3. Convert the StringBuilder.ChunkEnumerator structure to a ReadOnlySequence<char> structure and pass it to the existing methods for working with buffers and memory. A third-party library cannot implement this conversion as efficiently as the runtime could, because it has no access to the internal fields of the StringBuilder class.

I propose adding methods (optionally as extension methods) that expose the contents of a StringBuilder as a ReadOnlySequence<char> structure.
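For illustration, option 3 can be sketched today on top of the public GetChunks() method (available since .NET Core 3.0). This is only a minimal sketch, not the proposed implementation: ChunkSegment and StringBuilderSequenceExtensions are hypothetical names, and the method name AsSequence is borrowed from the proposal. It allocates one segment object per chunk, which is the kind of overhead an implementation with access to the internal fields might avoid.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper: one linked segment per StringBuilder chunk.
sealed class ChunkSegment : ReadOnlySequenceSegment<char>
{
    public ChunkSegment(ReadOnlyMemory<char> memory) => Memory = memory;

    // Links a new segment after this one and keeps RunningIndex consistent.
    public ChunkSegment Append(ReadOnlyMemory<char> memory)
    {
        var next = new ChunkSegment(memory)
        {
            RunningIndex = RunningIndex + Memory.Length
        };
        Next = next;
        return next;
    }
}

// Hypothetical extension class; not part of the BCL.
static class StringBuilderSequenceExtensions
{
    public static ReadOnlySequence<char> AsSequence(this StringBuilder builder)
    {
        ChunkSegment first = null;
        ChunkSegment last = null;

        // GetChunks() yields the chunks in content order as ReadOnlyMemory<char>.
        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            if (chunk.IsEmpty)
            {
                continue; // skip empty chunks to keep the segment chain short
            }

            if (first == null)
            {
                first = last = new ChunkSegment(chunk);
            }
            else
            {
                last = last.Append(chunk);
            }
        }

        return first == null
            ? ReadOnlySequence<char>.Empty
            : new ReadOnlySequence<char>(first, 0, last, last.Memory.Length);
    }
}
```

With this in place, `builder.AsSequence()` can be passed to any API that accepts a ReadOnlySequence<char>, as long as the builder is not modified while the sequence is in use.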

API Proposal

namespace System.Text;

public class StringBuilder
{
    public ChunkReverseEnumerator GetReverseChunks(); // allows fast, memory-efficient conversion code to be written outside CoreLib
}

// Assembly: System.Memory.dll
namespace System.Buffers;

public static class BuffersExtensions
{
    public static ReadOnlySequence<char> AsSequence(this StringBuilder builder); // OR GetSequence(); OR ToSequence();
    public static ReadOnlySequence<char> AsSequence(this StringBuilder builder, int startIndex, int length);
}

API Usage

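Wherever the ReadOnlySequence<T> structure is used:

* https://learn.microsoft.com/en-us/dotnet/standard/io/buffers
* https://learn.microsoft.com/en-us/dotnet/standard/io/pipelines
* https://learn.microsoft.com/en-us/dotnet/api/system.text.encodingextensions
* https://learn.microsoft.com/en-us/dotnet/api/system.buffers.buffersextensions
* https://learn.microsoft.com/en-us/dotnet/api/system.buffers.sequencereader-1
* https://learn.microsoft.com/en-us/dotnet/api/system.io.pipelines.pipereader.create
* https://learn.microsoft.com/en-us/dotnet/api/system.io.pipelines.readresult.-ctor
* https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsondocument.parse
* https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.sequencemarshal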
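As one example of such a consumer, the sketch below splits a ReadOnlySequence<char> on a delimiter with SequenceReader<char>. The SequenceFieldSplitter class and SplitFields method are hypothetical names used only for illustration. A sequence obtained from a StringBuilder could be passed in the same way as the single-segment one used here.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical helper showing an existing ReadOnlySequence<char> consumer.
static class SequenceFieldSplitter
{
    // Splits a character sequence on a delimiter without first flattening it into one string.
    public static List<string> SplitFields(ReadOnlySequence<char> sequence, char delimiter)
    {
        var fields = new List<string>();
        var reader = new SequenceReader<char>(sequence);

        // TryReadTo returns each field as a slice and advances past the delimiter.
        while (reader.TryReadTo(out ReadOnlySequence<char> field, delimiter))
        {
            fields.Add(field.ToString());
        }

        // Whatever follows the last delimiter is the final field.
        fields.Add(sequence.Slice(reader.Position).ToString());
        return fields;
    }
}
```

For instance, `SequenceFieldSplitter.SplitFields(new ReadOnlySequence<char>("alpha,beta,gamma".AsMemory()), ',')` yields the three fields alpha, beta, and gamma.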

Alternative Designs

Reimplementing for the StringBuilder class the methods already written for the ReadOnlySequence<char> structure.

Risks

The returned ReadOnlySequence<char> structure is a view over the builder's buffers rather than a copy, so changing the contents of the original StringBuilder instance can change the contents seen through the sequence.
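The same risk already exists for the memory returned by GetChunks(), which makes it easy to demonstrate. The sketch below is illustrative only: SharedBufferDemo and ObserveMutation are hypothetical names, and the observed behavior is an implementation detail, since chunks are not meant to be used after the builder has been modified.

```csharp
using System;
using System.Text;

// Hypothetical demo of a view that shares storage with its StringBuilder.
static class SharedBufferDemo
{
    public static (char Before, char After) ObserveMutation()
    {
        var builder = new StringBuilder("hello");

        // Capture the first chunk; it references the builder's storage, not a copy.
        ReadOnlyMemory<char> view = default;
        foreach (ReadOnlyMemory<char> chunk in builder.GetChunks())
        {
            view = chunk;
            break;
        }

        char before = view.Span[0];

        // Mutate the builder after the view was taken.
        builder[0] = 'J';

        // On current runtimes the view is expected to observe the change.
        char after = view.Span[0];
        return (before, after);
    }
}
```

Any sequence built over such chunks would inherit this behavior, so callers would need to treat the sequence as valid only until the builder is next modified.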

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation

There are many methods for working with the contents by using the `ReadOnlySequence<char>` structure:

* https://learn.microsoft.com/en-us/dotnet/standard/io/buffers
* https://learn.microsoft.com/en-us/dotnet/standard/io/pipelines

There are no such methods for the `StringBuilder` class, and it is often used as a buffer to create `string` content quickly and conveniently. Therefore, there is a desire to work with the contents of the `StringBuilder` class as quickly and conveniently as with the contents of the `ReadOnlySequence<char>` structure.

There are 3 ways to deal with the contents of a `StringBuilder` as ways as with the `ReadOnlySequence<char>` structure:

1. Create a string and use it. This is not optimal.
2. Use the `StringBuilder.GetChunks()` method and implement the required methods yourself, repeating the methods already written for the `ReadOnlySequence<char>` structure. This is probably the fastest in terms of code performance, but very costly to implement and test.
3. Convert the `StringBuilder.ChunkEnumerator` structure to a `ReadOnlySequence<char>` structure and use it further in the already written methods for working with buffers and memory. Implementing such a conversion in third-party libraries will not be the most efficient without access to the internal fields of the `StringBuilder` class. This conversion can be done more efficiently by accessing the internal fields of the `StringBuilder` class.

I propose to add to the `StringBuilder` class (as an option in the extension methods) methods to get its contents as a `ReadOnlySequence<char>` structure.

### API Proposal

```csharp
namespace System.Text;

public class StringBuilder: IEnumerable
{
    public ReadOnlySequence<char> AsSequence(); // OR GetSequence();
    public ReadOnlySequence<char> AsSequence(int startIndex, int length);
}

// OR

public static class StringBuilderExtensions
{
    public static ReadOnlySequence<char> AsSequence(this StringBuilder builder);
    public static ReadOnlySequence<char> AsSequence(this StringBuilder builder, int startIndex, int length);
}
```

### API Usage

Wherever the `ReadOnlySequence<T>` class is used:

* https://learn.microsoft.com/en-us/dotnet/standard/io/buffers
* https://learn.microsoft.com/en-us/dotnet/standard/io/pipelines
* https://learn.microsoft.com/en-us/dotnet/api/system.text.encodingextensions
* https://learn.microsoft.com/en-us/dotnet/api/system.buffers.buffersextensions
* https://learn.microsoft.com/en-us/dotnet/api/system.buffers.sequencereader-1
* https://learn.microsoft.com/en-us/dotnet/api/system.io.pipelines.pipereader.create
* https://learn.microsoft.com/en-us/dotnet/api/system.io.pipelines.readresult.-ctor
* https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsondocument.parse
* https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.sequencemarshal

### Alternative Designs

Repeating the methods already written for the `ReadOnlySequence<char>` structure for the `StringBuilder` class.

### Risks

The contents of the `ReadOnlySequence<char>` structure can be changed by changing the contents of the original `StringBuilder` instance.

Author: AlexRadch
Assignees: -
Labels: `api-suggestion`, `area-System.Memory`, `untriaged`
Milestone: -

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-runtime See info in area-owners.md if you want to be subscribed.

Author: AlexRadch
Assignees: -
Labels: `api-suggestion`, `area-System.Runtime`, `untriaged`
Milestone: -

huoyaoyuan commented 1 year ago

If we do this on StringBuilder itself, it would require folding ReadOnlySequence into CoreLib. Is this achievable with the public API surface of StringBuilder, using GetChunks?

AlexRadch commented 1 year ago

> If we do this on StringBuilder itself, it would require folding ReadOnlySequence into CoreLib.

Is System.Memory.dll not included in CoreLib?

> Is this achievable with the public API surface of StringBuilder, using GetChunks?

GetChunks can be used, but such code will be less performant and consume more memory than code that has access to the private fields of StringBuilder.

huoyaoyuan commented 1 year ago

> Is System.Memory.dll not included in CoreLib?

CoreLib is referenced by everything and can't reference anything else. It is a single assembly file and doesn't "include" anything. Don't confuse it with the shared framework.

AlexRadch commented 1 year ago

> Is this achievable with the public API surface of StringBuilder, using GetChunks?

If we add a new method like GetChunks that returns the chunks in reverse order, then such code can be written anywhere and will be both fast and memory-efficient.

AlexRadch commented 1 year ago

> If we do this on StringBuilder itself, it would require folding ReadOnlySequence into CoreLib. Is this achievable with the public API surface of StringBuilder, using GetChunks?

I rewrote [API Proposal] based on your comments. Thank you!