dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

MemoryStream / TryGetBuffer and writing analyzers/fix providers #38093

Open nschuessler opened 4 years ago

nschuessler commented 4 years ago

Background and Motivation

Mentioning @stephentoub as he seems to be on many of the MemoryStream topics.

I see that MemoryStream has a lot of discussion traffic (e.g. here), so add mine to the pile.

We often encounter egregious use of memory and track it down to use of MemoryStream.ToArray. It often piles things up unnecessarily on our LOH. In fact, double because of the copying.

We have written a diagnostic to flag all occurrences of it and try to review the usage to reduce the double allocations. Recently I learned about the TryGetBuffer api, but using it as a replacement is troublesome.

Our rule has two main criticisms/concerns: 1) It flags instances that cannot be improved. 2) If people try to improve it, they often don't understand the complexities of getting the underlying buffer in the MemoryStream, fail, and will be less likely to make performance improvements from our analyzers in the future.

My API proposal is we need (at the minimum):

ReadOnlySpan<byte> GetBufferSpan();

which always succeeds. If the MemoryStream.TryGetBuffer would return false, then it should do memoryStream.ToArray() and return it as an ArraySegment.

This is then a drop in replacement for memoryStream.ToArray()

The complexity of MemoryStream / API makes it hard to write analyzers and fix providers for it.

The logic to determine if MemoryStream.ToArray can use TryGetBuffer requires tracing the MemoryStream instance back through function calls to the source to find if it was created with one of the constructors that will work. This is quite complicated.

TryGetBuffer is not a drop in replacement for ToArray because it can fail, and when it does, it always will. Autogenerating error condition paths and changing variable scopes is complicated. See this "simple" example:

using System.IO;
using System.Text;
class Foo
{
    public string Bar()
    {
        using (MemoryStream memoryStream = new MemoryStream())
        {
            // Write to stream ...
            string jsonString = Encoding.UTF8.GetString(memoryStream.ToArray());
            memoryStream.Close();
            return jsonString;
        }
    }
}

This requires promoting the jsonString declaration above the memoryStream.ToArray and generating an else condition on error.

The other possibility is generating our own extension method:

ReadOnlySpan<byte>   GetBufferSpan(this MemoryStream)

That has this logic in it and force insert a new code file into every solution / project where it doesn't seem to be defined already.

Proposed API

public class MemoryStream
{
       public ArraySegment<byte> GetBufferSegment();
       public ReadOnlySpan<byte> GetBufferSpan();
}

Usage Examples

It would nice to be able to the following:

using (var memoryStream = new MemoryStream())
{
     // Write to memory Stream.

    string jsonString = System.Text.Encoding.UTF8.GetString(memoryStream.GetBufferSpan());
}

Side note: I see System.Text.Encoding takes a ReadOnlySpan as an argument to GetString and not ArraySegment. There is also a new class like Memory<T> as well.

So we need to do conversions now if we have an ArraySegment ?

Alternative Designs

It's unclear to me the need for the 'publicly' visible complication of the MemoryStream class. i.e. If you supply a buffer and publiclyVisible is false, how do you get the used part of the buffer you passed in?

But to make this simple perhaps separate out the publiclyVisible: false functionality into a subclass or different stream implementation so analyzers can easily find MemoryStreams that can be "fixed" and ones that cannot.

public class FixedBufferMemoryStream : Stream
{
     public FixedBufferMemoryStream(byte[] buffer);
     public FixedBufferMemoryStream(ArraySegment<byte> buffer);

}

Then analyzers can quickly tell if the the stream.ToArray() can be fixed or not:


InvocationExpressionSyntax invocation = ...; // Points to memoryStream.ToArray()
INamedTypeSymbol streamType = semanticModel.Compilation.GetTypeByMetadataName("System.IO.MemoryStream");   
TypeInfo invokedType = semanticModel.GetTypeInfo(invocation.Expression);

if (invokedType.Inherits(streamType))
{
    // I can fix it
}
else
{
    // I cannot fix it
}

Risks

Splitting class functionality will of course break some portion of the API audience.

GSPP commented 4 years ago

Naming suggestion:

GetBufferSegment => GetBufferAsSegment GetBufferSpan => GetBufferAsSpan

There is no such thing as a "BufferSegment". As indicated that these methods are differentiated by the format of the data.