Open artelk opened 4 years ago
Thanks for the proposal! Are there other benefits to this beyond improving the performance of the Memory<T>.Span
property getter? It's likely that this proposal will significantly increase the number of receivers which require generic specialization. This would negatively impact size-on-disk and would increase the JIT's workload.
We do care about the performance of the Memory<T>.Span
property getter. We continue to make perf improvements to this method. But the recommended way of using Memory<T>
is to call the Span
property getter as infrequently as possible. Generally, call it before entering a loop or other hot path, then within that hot code operate only on the Span<T>
that was just pulled out. This means that it's unlikely that we would complicate the API shape of Memory<T>
just to shave one or two nanoseconds off the Span
property getter.
Your concerns about Utf8String
complicating the property getter are understandable. But our benchmarks show that there's not a noticeable perf impact from enabling this support. (Reminder: our benchmarks largely come from the master branch, where this ifdef has been enabled for ages.) The optimizations we've been able to perform since the initial 2.1 release of Memory<T>
dominate any overhead introduced by adding support for that type.
Thank you for your reply!
Thanks for the proposal! Are there other benefits to this beyond improving the performance of the
Memory<T>.Span
property getter? It's likely that this proposal will significantly increase the number of receivers which require generic specialization. This would negatively impact size-on-disk and would increase the JIT's workload.
Yes, that is mainly about improving the performance.
I believe there will be only two RW memory types provided by the BCL itself: the existing Memory<T>
and the new ArrayMemory<T>
.
And there will be only two RO memory types: the existing ReadOnlyMemory<T>
and the new ReadOnlyMemory<T, TMemory>
.
Will the impact be too big?
Developers who are going to work with some unmanaged memory will have two options: either implement the MemoryManager
and expose the unmanaged memory chunks as Memory<T>
or create a new Memory type. I believe the last option is simpler in implementation (this is the 2nd benefit) and that would work faster at the same time.
The existing Memory<T>
could be simply made compatible with the methods taking the TMemory where TMemory : struct, IMemory<T, TMemory>
just by adding : IMemory<T, Memory<T>>
to the Memory<T>
struct declaration.
I would also add operators for implicit casts ArrayMemory<T>
-->Memory<T>
, ArrayMemory<T>
-->ReadOnlyMemory<T>
and ReadOnlyMemory<T, ArrayMemory<T>>
-->ReadOnlyMemory<T>
. So they could be passed to (not generalized yet) methods taking Memory<T>
and ReadOnlyMemory<T>
without any additional coding.
We do care about the performance of the
Memory<T>.Span
property getter. We continue to make perf improvements to this method. But the recommended way of usingMemory<T>
is to call theSpan
property getter as infrequently as possible. Generally, call it before entering a loop or other hot path, then within that hot code operate only on theSpan<T>
that was just pulled out. This means that it's unlikely that we would complicate the API shape ofMemory<T>
just to shave one or two nanoseconds off theSpan
property getter.
I believe with the proposed approach this recommendation won't be needed anymore.
Sometimes it is hard to avoid calling the Span property too often and if the operations between the subsequent calls also take a few nanoseconds the negative impact on the performance could be significant.
Example: imagine your method takes a Stream
or Socket
and produces an IAsyncEnumerator<SomeSmallStruct>
; you use the new System.IO.Pipelines for performance reason; the SomeSmallStruct
is quite small so in most cases the First
segment of the ReadOnlySequence<T>
returns a ReadOnlyMemory<T>
that is bigger than the struct size so the overhead produced by the ReadOnlySequence<T>
itself is minimal; you yield return
every SomeSmallStruct
produced so you call the Span
property too often (the operations between the subsequent calls can also take a few nanoseconds).
(offtopic: btw, that would be really cool to have such method in BCL: the generic method producing the IAsyncEnumerator<T>
from the Stream
)
Another example: with current Utf8JsonWriter
implementation the Span
property is called on every WriteStartObject/WriteStartArray/WriteEndObject/WriteEndArray/WriteNullValue/WriteBooleanValue/...
Motivation
Currently the Memory.Span property is quite complex and it is slower than in could be (see benchmarks below). It will probably be even slower with Utf8String support: https://github.com/dotnet/runtime/blob/fff8f5ab763f3c1eea95ab201bfde9bd4b6e62fb/src/libraries/System.Private.CoreLib/src/System/Memory.cs#L313 And for the unmanaged memory exposed via MemoryManager it is already significantly slower than it is for the array-based memory.
Proposal
The proposal is to introduce IMemory and IReadOnlyMemory interfaces so you can have multiple memory types. Each type could be implemented and optimized for its own underlying memory provider separately. The interfaces could look like:
The second generic parameter TMemory is needed for the Slice methods and it is assumed to be the same as the memory type implementing the interface. These interfaces are supposed to be implemented only by structs (not classes) and passed to generic methods parametrized by the memory type with constraints (in this case the JIT compiler compiles them for each memory struct type separately avoiding boxing and it also can inline the memory type methods invoked):
Here the customMemory is passed to the WriteAsync as is (not being converted to some readonly type) while the WriteAsync method can only access the Span property returning the ReadOnlySpan. The existing System.Memory could implement IMemory<T, Memory> and the System.ReadOnlyMemory could implement IReadOnlyMemory<T, ReadOnlyMemory> for unification purposes so you will also be able to pass them to the methods like WriteAsync.
A separate ReadOnlyMemory struct could also be useful e.g. to be returned from some methods. That can be a wrapper for the RW-memory:
In this case the layout of the ReadOnlyMemory<T, SomeMemory> would be the same as the layout of the SomeMemory. Of course you will also be able to implement the IReadOnlyMemory<T, TMemory> directly (e.g. if you don't plan to have a RW-memory for your memory provider).
There is a problem with implementing extension methods for such memory types. Example:
Unfortunately the compiler won't be able to infer the T parameter from the TMemory parameter and it will ask you to explicitly specify them both. Possible solution to help the compiler is to intruduce a helper struct with two generic parameters:
And then add a property to the IReadOnlyMemory returning that:
So the extension methods could be implemented on the Wrapper type level:
For existing memory types shortcuts could be also implemented to not require accessing the It property directly. They should reuse the implementation of the methods taking the Wrapper:
The source code and the benchmark results
IReadOnlyMemory and IMemory interfaces
```cs public interface IReadOnlyMemoryReadOnlyMemory struct
```cs public readonly struct ReadOnlyMemoryArrayMemory that is supposed to be a part of .Net API
```cs public readonly struct ArrayMemoryCustomMemoryProvider class that can expose some unmanaged memory as System.Memory via MemoryManager as well as its own CustomMemory
```cs public sealed partial class CustomMemoryProviderThe CustomMemory struct
```cs public sealed partial class CustomMemoryProviderBenchmark tests code
```cs class Program { static void Main(string[] args) { BenchmarkRunner.Run(typeof(Program).Assembly); } } public class Tests { private CustomMemoryProviderThe benchmark results
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362 Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores .NET Core SDK=3.1.101 [Host] : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT DefaultJob : .NET Core 3.1.1 (CoreCLR 4.700.19.60701, CoreFX 4.700.19.60801), X64 RyuJIT
Open questions
Open the open questions
There are multiple Marshal methods that require Memory and ReadOnlyMemory to have the same fields and the same layout. Probably it won't be possible to reproduce them all for all the types implementing the IMemory and IReadOnlyMemory. Only the ReadOnlyMemory