Aaronontheweb opened this issue 5 years ago
There are a few notes:

- `Memory<byte>` itself is a general concept around allocated byte buffers. The actual structure we should probably be interested in is `ReadOnlySequence<byte>`: conceptually, it's a linked list of `Memory<byte>` segments. This way we can grow the potential size of the payload while serializing, without paying the cost of copying the memory into a bigger buffer. It's pretty much the core concept behind System.IO.Pipelines.
- `Akka.IO.ByteString`: I have an idea how to potentially make it cast-able to `ReadOnlySequence<byte>` at low cost (direct inheritance is not possible).
- I was thinking about abstracting serializers into something like `void Serialize<TWriter>(object value, TWriter writer, ISerializerSession session) where TWriter : IBufferWriter<byte>`, then just introducing writers (they could even be structs!) over the actual data types which we want to serialize to.
Ah ok, that's similar to how DotNetty's framing + encoding system works. The frame headers get appended to the outbound stream as a separate set of 4 bytes, rather than modifying the payload they describe. That model seems correct here.
> I was thinking about abstracting them into something like: `void Serialize<TWriter>(object value, TWriter writer, ISerializerSession session) where TWriter : IBufferWriter<byte>`, then just introduce writers (they could even be structs!) over the actual data types which we want to serialize to.

I'd like to see how things play out with the third-party dependencies in the ecosystem. It'd be unfortunate if it's necessary for us to come up with our own abstraction, but it wouldn't be the first time we've had to go down that road. Maybe it won't be, though - who knows!
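As a rough sketch of that shape (the `SpanSerializer` name and the string-only signature here are hypothetical, for illustration - not an actual Akka.NET API), a serializer generic over any `IBufferWriter<byte>` could look like:

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical sketch: a serializer facade generic over any IBufferWriter<byte>.
// Because TWriter is a generic type parameter, struct-based writers are
// invoked without boxing, while class-based writers (e.g. PipeWriter) also work.
public static class SpanSerializer
{
    public static void Serialize<TWriter>(string value, TWriter writer)
        where TWriter : IBufferWriter<byte>
    {
        // Ask the writer for a span large enough for the worst-case UTF-8 payload...
        var span = writer.GetSpan(Encoding.UTF8.GetMaxByteCount(value.Length));
        var written = Encoding.UTF8.GetBytes(value.AsSpan(), span);
        // ...then commit only the bytes actually written.
        writer.Advance(written);
    }
}
```

With this shape the caller supplies the destination buffer, e.g. `var buffer = new ArrayBufferWriter<byte>(); SpanSerializer.Serialize("hello", buffer);`, and no intermediate `byte[]` is allocated by the serializer itself.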
I imagine we will see serializers form up around the general style of this API:
```csharp
public interface IFieldCodec<T>
{
    void WriteField<TBufferWriter>(ref Writer<TBufferWriter> writer, uint fieldIdDelta, Type expectedType, T value) where TBufferWriter : IBufferWriter<byte>;
    T ReadValue(ref Reader reader, Field field);
}
```
I use a custom `Writer` type there. In most real-world cases, `TBufferWriter` will not be a struct (e.g., `PipeWriter` is an abstract class), but it's a possibility. For message serialization I currently have this:
```csharp
internal interface IMessageSerializer
{
    void Write<TBufferWriter>(ref TBufferWriter writer, Message message) where TBufferWriter : IBufferWriter<byte>;

    /// <returns>
    /// The minimum number of bytes in <paramref name="input"/> before trying again, or 0 if a message was successfully read.
    /// </returns>
    int TryRead(ref ReadOnlySequence<byte> input, out Message message);
}
```
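A minimal sketch of what that `TryRead` contract can look like in practice, using a simple 4-byte little-endian length prefix (the framing scheme and `Framing` type are assumptions for illustration, not taken from the thread):

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

public static class Framing
{
    // Returns 0 when a full frame was consumed into 'payload'; otherwise
    // returns the minimum number of bytes needed before trying again,
    // mirroring the IMessageSerializer.TryRead contract above.
    public static int TryRead(ref ReadOnlySequence<byte> input, out ReadOnlySequence<byte> payload)
    {
        payload = default;
        if (input.Length < 4)
            return 4; // need at least the length prefix

        Span<byte> header = stackalloc byte[4];
        input.Slice(0, 4).CopyTo(header); // prefix may span multiple segments
        var length = BinaryPrimitives.ReadInt32LittleEndian(header);

        if (input.Length < 4 + length)
            return 4 + length; // wait until the whole frame has arrived

        payload = input.Slice(4, length);     // zero-copy view over the frame body
        input = input.Slice(4 + length);      // advance past the consumed frame
        return 0;
    }
}
```

Note that `payload` is just a view over the original buffers - nothing is copied until the caller materializes it.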
Ideally we can come up with a standard interface, but it's probably not terrible if we all land on separate interfaces with the same shape so that adaptors can be made. The most critical aspect to this is the core write/read types, `TBufferWriter : IBufferWriter<byte>` & `ReadOnlySequence<byte>`. By landing on them we can reduce impedance mismatch.
> The most critical aspect to this is the core write/read types, `TBufferWriter : IBufferWriter<byte>` & `ReadOnlySequence<byte>`. By landing on them we can reduce impedance mismatch.
Agree - if JSON.NET, Google.Protobuf, etc all end up using totally different concepts to express that idea then we'll be back at square 1.
Some general-ish notes:

- Returning `IMemoryOwner<byte>` rather than `ReadOnlyMemory<byte>` from `ToBinary()` would let consumers of the API control final disposal and/or reuse of the segment, while long-term possibly allowing us to transition to a 'pooled' byte allocator for the lazy implementation.
- A `ToBinary` overload that took an `IBufferWriter<byte>` would also be useful for low-alloc scenarios.
- `ArrayPoolBufferWriter` could be used to provide writes where `ArrayPool`s are used for buffers automagically.
- The `MemoryOwner` stuff is nice as it gives you `IMemoryOwner<T>` instances backed by a shared array pool. `MemoryOwner` could wrap a non-pooled array for naive usage; I suppose a simple wrapper for that case is easy enough.

Tl;dr - even if we don't take it as an upstream dependency, it is a good example of patterns that will likely be useful in providing 'useful defaults' for serialization implementations.
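To make the ownership idea concrete (the `PooledSerializer.ToBinary` shape below is a hypothetical sketch over `MemoryPool<byte>`, not the actual Akka.NET serializer surface), a pool-backed result can be handed to the caller as an `IMemoryOwner<byte>` so the caller decides when the buffer goes back to the pool:

```csharp
using System;
using System.Buffers;
using System.Text;

public static class PooledSerializer
{
    // Hypothetical: serialize a string into pool-backed memory.
    // The caller must Dispose() the owner to return the buffer to the pool;
    // until then it is free to slice, reuse, or hand off the memory.
    public static (IMemoryOwner<byte> Owner, int Length) ToBinary(string value)
    {
        var maxBytes = Encoding.UTF8.GetMaxByteCount(value.Length);
        var owner = MemoryPool<byte>.Shared.Rent(maxBytes); // may return a larger buffer
        var written = Encoding.UTF8.GetBytes(value.AsSpan(), owner.Memory.Span);
        return (owner, written);
    }
}
```

The `Length` component matters because a pool is free to hand back a buffer larger than requested; consumers should only read `owner.Memory.Slice(0, length)` and then dispose.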
What I do know:
Akka.NET Version: 1.4.0 and beyond.
In an effort to radically speed up Akka.Persistence, Akka.IO, and Akka.Remote, we should take advantage of the new APIs made available in `System.Memory`, namely `Span<T>` and `Memory<T>`, to help reduce the duplicative copying of buffers used in all of our I/O operations.

Today, here's what the situation looks like for Akka.Remote on just the write-side, for instance:

1. A message is sent to a `RemoteActorRef` via the `IActorRef.Tell` method.
2. The message is handed off to `Akka.Remote.EndpointWriter`.
3. The message is serialized into a `byte[]`.
4. Those `byte[]`s are copied again on their way out to the transport.

We have a similar process that works in reverse for deserialization and, again, it copies the buffers 2-3 times. The fundamental issue with the original serialization architecture is that each library has its own idea as to how to most efficiently manage memory, and none of that can be easily exposed or shared with other parts of the I/O pipeline.
The introduction of the `System.Memory` APIs in .NET Core 2.1 changes all of this - they offer a model where a shared pool of memory can be used without any duplicative copying / buffering between the different stages of the pipeline. Akka.NET should take advantage of this in order to reduce garbage collection pressure on the system and thus increase our total throughput in the areas of the system that use serialization heavily.

Before someone "weekend project!!!!"-s this issue, the sad news: the rest of the .NET ecosystem isn't quite ready to support this yet.
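As an illustration of that model (this is a generic System.IO.Pipelines sketch, not Akka.NET code - the `PipeDemo` type is invented for the example), a "serialization" stage can write directly into a `PipeWriter`'s buffers and a "deserialization" stage can consume the very same buffers as a `ReadOnlySequence<byte>`, with no intermediate `byte[]` hand-offs between stages:

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

public static class PipeDemo
{
    public static async Task<string> RoundTripAsync(string message)
    {
        var pipe = new Pipe();

        // Writer stage: rent a span from the pipe's shared buffers and fill it in place.
        var span = pipe.Writer.GetSpan(Encoding.UTF8.GetMaxByteCount(message.Length));
        var written = Encoding.UTF8.GetBytes(message.AsSpan(), span);
        pipe.Writer.Advance(written);
        await pipe.Writer.FlushAsync();
        await pipe.Writer.CompleteAsync();

        // Reader stage: observe the same buffers as a ReadOnlySequence<byte>.
        var result = await pipe.Reader.ReadAsync();
        var text = Encoding.UTF8.GetString(result.Buffer.ToArray()); // copy only at final materialization
        await pipe.Reader.CompleteAsync();
        return text;
    }
}
```

The point is that only the final `GetString` materializes anything; every stage in between operates over the pipe's pooled memory.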
The three serialization libraries we depend on today:

- One merged `System.Memory` support 15 days ago and is planning on including it in a future release.
- Another is waiting on the `System.Memory` base APIs to become available.
- And lastly, DotNetty: https://github.com/Azure/DotNetty/issues/411#issuecomment-410289089 - looks like they're waiting for .NET Standard 2.1 / "more adoption" too.
I'd like to keep this thread open for now to track any new developments on these issues, so that when the time comes for `System.Memory` to take on the world, we can get our work started.