dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.66k stars 4.58k forks source link

IBufferWriter<T> is good but not the best #32391

Open artelk opened 4 years ago

artelk commented 4 years ago

IBufferWriter has two methods: GetMemory and GetSpan. The implementer class has to implement them both. The GetMemory assumes our buffers are managed arrays. I believe the GetSpan returns just the same memory as GetMemory so there is no reason to have both methods in the interface because you can get the Span from the Memory. The IBufferWriter cannot expose unmanaged memory chunks, e.g. you cannot use memory mapped files under the hood. I would propose something like this:

public interface ISpanWriter<T>
{
    void Advance(int count);
    Span<T> GetSpan(int sizeHint = 0);
}

public interface IBufferWriter<T>
{
    void Advance(int count);
    Memory<T> GetMemory(int sizeHint = 0);
    Span<T> GetSpan(int sizeHint = 0) => GetMemory(sizeHint).Span; // or even extension method
}

public readonly struct MemorySpanWriter<T> : ISpanWriter<T>
{
    private readonly IBufferWriter<T> bufferWriter;
    public MemorySpanWriter(IBufferWriter<T> bufferWriter) => this.bufferWriter = bufferWriter;
    public void Advance(int count) => bufferWriter.Advance(count);
    public Span<T> GetSpan(int sizeHint = 0) => bufferWriter.GetSpan(sizeHint);
}

public static class BufferWriterExtensions
{
    public static MemorySpanWriter<T> ToMemorySpanWriter<T>(this IBufferWriter<T> bufferWriter) => new MemorySpanWriter<T>(bufferWriter);
}

The classes implementing the IBufferWriter could either implement the ISpanWriter as well (preferred) or rely on the ToMemorySpanWriter().

It would be good if the Utf8JsonWriter to be implemented on the base of the ISpanWriter. In that case it wouldn't be limited to managed memory buffers only.

AlgorithmsAreCool commented 4 years ago

The GetMemory assumes our buffers are managed arrays. I believe the GetSpan returns just the same memory as GetMemory so there is no reason to have both methods in the interface because you can get the Span from the Memory.

I don't think this is true. You could use unmanaged memory with a custom MemoryManager implementation which changes the implementation of Memory<T>.

Edit: Also, see this example PR of a perf benefit to having an explicit GetSpan

Drawaes commented 4 years ago

That is correct @AlgorithmsAreCool I have written something exactly like that internally (a memory mapped file that hands out memory instances over the chunks of it) there is no requirement that memory is backed by an array just that it will survive not being on the stack (eg no stackalloc).

artelk commented 4 years ago

Ok, agree, formally you are right. By my concern is still valid... 😄

So I need to make effort and implement the GetMemory using this MemoryManager trick. And this is just to be able to inject it to the Utf8JsonWriter. This is because the Utf8JsonWriter calls the GetMemory instead of the GetSpan. It uses the GetMemory because it has its own buffering logic (the logic that I would expect to be in the IBufferWriter implementer class itself). The Utf8JsonWriter requests more memory than it needs, then it stores the returned Memory in a field and then calls Span on the Memory field every time it needs to write a few more bytes (sample). If the Memory is full it calls Advance and requests a new Memory.

I believe all these things are needed to not call the GetSpan & Advance on the IBufferWriter each time we need to write a few bytes. And this is done under assumption that the GetSpan+Advance on the IBufferWriter are more expensive than Span on the Memory object & this.BytesPending+=n.

But take a look at how Memory.Span property is implemented. It involves several type-checks, several jumps and for the MemoryManager-based Memory it anyways calls the MemoryManager.GetSpan (which is even virtual for the MemoryManager). And they will be executed every time we write every token.

So I suspect (but cannot proof yet) if Utf8JsonWriter would be simplified and it wouldn't contain the bufferring-like logic the performance could be even improved. And GetSpan+Advance pair implementation would be enough to use the Utf8JsonWriter. In this case it would be reasonable to have a separate interface for this pair (more radical solution: get rid of the GetMemory in the IBufferWriter itself).

Edit: To avoid calling the methods via interface I would propose shape-like solution:

class Utf8JsonWriter<TBufferWriter>
  where TBufferWriter: IBufferWriter<byte>
{
  private TBufferWriter _output;
  // all implementation is here
}

class Utf8JsonWriter: Utf8JsonWriter<ArrayBufferWriter<byte>>
{
  private Stream? _stream;
  // override Flush methods
  public static Utf8JsonWriter<TBufferWriter> FromBufferWriter(TBufferWriter bufferWriter)
    where TBufferWriter: IBufferWriter<byte>
  {
    return new Utf8JsonWriter<TBufferWriter>(bufferWriter);
  }
}

If the TBufferWriter is a struct the calls could be inlined.

AlgorithmsAreCool commented 4 years ago

IBufferWriter is in a tough spot. We want the performance of GetSpan if we can use it, but we need to flexibility of GetMemory if we can't (for example in async methods or iterators).

Better docs would definitely help some. I was confused by the design too.

artelk commented 4 years ago

IBufferWriter is in a tough spot. We want the performance of GetSpan if we can use it, but we need to flexibility of GetMemory if we can't (for example in async methods or iterators).

Good point. Yes, the Memory survives after await and yield return. Btw, the Utf8JsonWriter doesn't reuse the memory after the FlushAsync.

To be honest I couldn't invent any situation when this code:

async Task SampleAsync()
{
    _memory = bw.GetMemory();
    WriteBefore();
    await SomethingAsync();
    WriteAfter();
    bw.Advance(_bytesWrittenTotal);
}

void WriteBefore()
{
    var span = _memory.Slice(_bytesWrittenTotal);
    // <write to the span>
    _bytesWrittenTotal += bytesWritten;
}

void WriteAfter()
{
    var span = _memory.Slice(_bytesWrittenTotal);
    // <write to the span>
    _bytesWrittenTotal += bytesWritten;
}

has any advantage over this code:

async Task SampleAsync()
{
    WriteBefore();
    await SomethingAsync();
    WriteAfter();
}

void WriteBefore()
{
    var span = bw.GetSpan();
    // <write to the span>
    bw.Advance(bytesWritten);
}

void WriteAfter()
{
    var span = bw.GetSpan();
    // <write to the span>
    bw.Advance(bytesWritten);
}

Probably the Memory could be needed only if you want to be able to override the previously written data in the memory instance after the await.

Anyways I think if some method or class only uses GetSpan&Advance then that method or class shouldn't require you to implement the GetMemory. To express this we need an interface segregation. The desired design is:

public interface ISpanWriter<T>
{
    void Advance(int count);
    Span<T> GetSpan(int sizeHint = 0);
}

public interface IBufferWriter<T>: ISpanWriter<T>
{
    Memory<T> GetMemory(int sizeHint = 0);
}

So the ISpanWriter is a span-only version of the buffer writer. And it would be very nice if the Utf8JsonWriter to be simplified to consume the ISpanWriter. I could prepare a pool request for the change if everyone is agree.

@ahsonkhan @stephentoub @pakrym

adamsitnik commented 4 years ago

Hi @artelk

Thank you for your proposal.

I could prepare a pool request for the change if everyone is agree.

Please wait till @JeremyKuhne and @layomia respond (they are the new owners of System.Buffers area).

BTW. I am sure that they are going to ask for the exact use case and scenario that is impossible as of today and would be possible with the proposed API.

artelk commented 4 years ago

Imagine you use memory from a memory mapped file this way: https://github.com/Lokad/CashDB/blob/master/src/CashDB.Lib/MemoryMappedFileSlim.cs. And you want to use the Utf8JsonWriter to write to the file. You can expose the file via MemoryManager and implement the IBufferWriter on top of that. But that is more complex than it could be and that will work visibly slowly than it could.

Actually there are two proposals:

  1. Split the IBufferWriter interface extracting the span-only part. So if some class or method only use the GetSpan that will be visible from their signatures. Without that the class/method client would need to implement the GetMemory even if that is not needed. (ISP)
  2. Change the current Utf8JsonWriter or introduce the second one (sharing common parts) that will consume the ISpanWriter. Ideally that should be a generic type parametrized by the TSpanWriter so the clients could pass the writer wrapped into a struct for better performance (see #32815)