akkadotnet / akka.net

Canonical actor model implementation for .NET with local + distributed actors in C# and F#.
http://getakka.net

Introduce System.Memory APIs into all Serializer base types #3740

Open Aaronontheweb opened 5 years ago

Aaronontheweb commented 5 years ago

Akka.NET Version: 1.4.0 and beyond.

In an effort to radically speed up Akka.Persistence, Akka.IO, and Akka.Remote, we should take advantage of the new APIs made available in System.Memory, namely Span<T> and Memory<T> to help reduce the duplicative copying of buffers used in all of our I/O operations.

Today, here's what the situation looks like for Akka.Remote on just the write-side, for instance:

  1. Message is sent to a RemoteActorRef via the IActorRef.Tell method
  2. Message is queued up inside the Akka.Remote.EndpointWriter
  3. Message is serialized using whatever its configured serializer is: Protobuf, JSON.NET, etc - this allocates the first set of byte[]
  4. Message is then copied into its container format, i.e. wrapped inside the control messages Akka.Remote uses for routing, which are based on Google.Protobuf - allocates another set of byte[]s for copying again.
  5. The fully-serialized message is then copied into the Akka.Remote transport (DotNetty, in this case), which uses its own buffer pools and copies the buffer a third time, at least.
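The copy-per-stage pattern in the steps above can be sketched roughly as follows. `ToBinary`, `WrapInEnvelope`, and `CopyToTransportBuffer` are hypothetical stubs standing in for the serializer, the Protobuf envelope, and the DotNetty buffer respectively - the point is just that each stage allocates and fills a fresh byte[]:

```csharp
using System;
using System.Text;

class CopyPipelineExample
{
    public static byte[] ToBinary(string message) =>
        Encoding.UTF8.GetBytes(message);               // copy #1: serializer output

    public static byte[] WrapInEnvelope(byte[] payload)
    {
        var envelope = new byte[payload.Length + 1];   // copy #2: container format
        envelope[0] = 0x01;                            // fake envelope tag
        payload.CopyTo(envelope, 1);
        return envelope;
    }

    public static byte[] CopyToTransportBuffer(byte[] envelope)
    {
        var buffer = new byte[envelope.Length];        // copy #3: transport buffer
        envelope.CopyTo(buffer, 0);
        return buffer;
    }

    static void Main()
    {
        var wire = CopyToTransportBuffer(WrapInEnvelope(ToBinary("hello")));
        Console.WriteLine(wire.Length); // 6: 1-byte tag + 5-byte payload
    }
}
```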

We have a similar process that works in reverse for deserialization, which again copies the buffers 2-3 times. The fundamental issue with the original serialization architecture is that each library has its own idea of how to most efficiently manage memory, and none of that can easily be exposed or shared with other parts of the I/O pipeline.

The introduction of the System.Memory APIs in .NET Core 2.1 changes all of this - they offer a model where a shared pool of memory can be used without any duplicative copying / buffering between the different stages of the pipeline. Akka.NET should take advantage of this in order to reduce garbage collection pressure on the system and thus increase our total throughput in the areas of the system that use serialization heavily.
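As a rough illustration of the model, a serializer that targets `IBufferWriter<byte>` writes directly into memory the downstream stage owns, instead of handing back its own byte[]. Here `ArrayBufferWriter<byte>` (from System.Buffers) stands in for whatever pooled writer a transport would expose - none of this is current Akka.NET API:

```csharp
using System;
using System.Buffers;

class PooledWriteExample
{
    public static void SerializeInto(IBufferWriter<byte> writer, ReadOnlySpan<byte> payload)
    {
        // The serializer asks the writer for a span big enough for the payload,
        // writes into it in place, and commits - no intermediate byte[] allocation.
        Span<byte> destination = writer.GetSpan(payload.Length);
        payload.CopyTo(destination);
        writer.Advance(payload.Length);
    }

    static void Main()
    {
        var writer = new ArrayBufferWriter<byte>();
        SerializeInto(writer, new byte[] { 1, 2, 3 });
        Console.WriteLine(writer.WrittenCount); // 3
    }
}
```

Because the writer hands out its own (potentially pooled) memory, the same buffer can flow from serializer to framing to socket without a copy at each boundary.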

Before someone "weekend project!!!!"-s this issue, the sad news: the rest of the .NET ecosystem isn't quite ready to support this yet.

The three serialization libraries we depend on today:

  1. Google.Protobuf: https://github.com/protocolbuffers/protobuf/pull/5835 - just merged in System.Memory support 15 days ago and are planning on including it in a future release.
  2. Newtonsoft.Json: https://github.com/JamesNK/Newtonsoft.Json/issues/1761#issuecomment-408372008 - waiting on .NET Standard 2.1 / .NET Standard 3.0 to come out, which will make the System.Memory base APIs available.
  3. Hyperion: need to implement the wire format standard first, which is another hairy project. But it'd probably also make sense to wait until .NET Standard 2.1.

And lastly, DotNetty: https://github.com/Azure/DotNetty/issues/411#issuecomment-410289089 - looks like they're waiting for .NET Standard 2.1 / "more adoption" too.

I'd like to keep this thread open now to track any new developments on these issues so when the time comes for System.Memory to take on the world, we can get our work started.

Horusiath commented 5 years ago

A few notes:

Aaronontheweb commented 5 years ago

This way we can grow the potential size of the payload while serializing, without paying the cost of copying the memory to a bigger buffer.

Ah ok, that's similar to how DotNetty's framing + encoding system works. The frame headers get appended to the outbound stream as a separate set of 4 bytes, rather than modifying the payload they describe. That model seems correct here.
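A sketch of that framing model, assuming a simple 4-byte big-endian length prefix (the actual DotNetty frame encoder details may differ): the header is written to the output as its own small span ahead of the payload, and the payload buffer itself is never modified or re-copied into a combined "header + body" buffer.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

class FramingExample
{
    public static void WriteFrame(IBufferWriter<byte> output, ReadOnlySpan<byte> payload)
    {
        // Header: payload length, big-endian, written as a separate 4-byte span.
        Span<byte> header = output.GetSpan(4);
        BinaryPrimitives.WriteInt32BigEndian(header, payload.Length);
        output.Advance(4);

        // Body: payload bytes appended as-is, untouched by the framing stage.
        Span<byte> body = output.GetSpan(payload.Length);
        payload.CopyTo(body);
        output.Advance(payload.Length);
    }

    static void Main()
    {
        var output = new ArrayBufferWriter<byte>();
        WriteFrame(output, new byte[] { 0xCA, 0xFE });
        Console.WriteLine(output.WrittenCount); // 6: 4-byte header + 2-byte payload
    }
}
```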

I was thinking about abstracting them into something like: void Serialize<TWriter>(object value, TWriter writer, ISerializerSession session) where TWriter : IBufferWriter, then just introduce writers (they could even be structs!) over the actual data types which we want to serialize to.

I'd like to see how things play out with the third party dependencies in the ecosystem. It'd be unfortunate if it's necessary for us to come up with our own abstraction, but it wouldn't be the first time we've had to go down that road. Maybe it won't be though - who knows!
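For concreteness, the abstraction quoted above could take roughly this shape. `ISerializerSession`, `NullSession`, and `Utf8StringSerializer` are made-up names for illustration, not existing Akka.NET types; the key point is that `TWriter` being a generic type parameter means a struct writer incurs no boxing:

```csharp
using System;
using System.Buffers;
using System.Text;

interface ISerializerSession { }
sealed class NullSession : ISerializerSession { }

interface ISpanSerializer
{
    // Generic over the writer so struct writers are used without boxing.
    void Serialize<TWriter>(object value, TWriter writer, ISerializerSession session)
        where TWriter : IBufferWriter<byte>;
}

sealed class Utf8StringSerializer : ISpanSerializer
{
    public void Serialize<TWriter>(object value, TWriter writer, ISerializerSession session)
        where TWriter : IBufferWriter<byte>
    {
        var text = (string)value;
        int byteCount = Encoding.UTF8.GetByteCount(text);
        Span<byte> span = writer.GetSpan(byteCount);
        Encoding.UTF8.GetBytes(text, span);  // encode straight into the writer's memory
        writer.Advance(byteCount);
    }
}

class Program
{
    static void Main()
    {
        var writer = new ArrayBufferWriter<byte>();
        new Utf8StringSerializer().Serialize("hi", writer, new NullSession());
        Console.WriteLine(writer.WrittenCount); // 2
    }
}
```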

ReubenBond commented 5 years ago

I imagine we will see serializers form up around the general style of this API:

public interface IFieldCodec<T>
{
    void WriteField<TBufferWriter>(ref Writer<TBufferWriter> writer, uint fieldIdDelta, Type expectedType, T value) where TBufferWriter : IBufferWriter<byte>;
    T ReadValue(ref Reader reader, Field field);
}

I use custom Writer & Reader types which hold the serializer session, but the idea is the same.

In most real-world cases, TBufferWriter will not be a struct (eg, PipeWriter is an abstract class), but it's a possibility. For message serialization I currently have this:

internal interface IMessageSerializer
{
    void Write<TBufferWriter>(ref TBufferWriter writer, Message message) where TBufferWriter : IBufferWriter<byte>;

    /// <returns>
    /// The minimum number of bytes in <paramref name="input"/> before trying again, or 0 if a message was successfully read.
    /// </returns>
    int TryRead(ref ReadOnlySequence<byte> input, out Message message);
}

Ideally we can come up with a standard interface, but it's probably not terrible if we all land on separate interfaces with the same shape so that adaptors can be made. The most critical aspect to this is the core write/read types, TBufferWriter : IBufferWriter<byte> & ReadOnlySequence<byte>. By landing on them we can reduce impedance mismatch.
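To make the `TryRead` contract above concrete, here is one possible implementation for a 4-byte length-prefixed frame: it returns 0 on success (consuming the frame from `input`) or the minimum number of bytes still needed before retrying. The `Message` type here is just a placeholder record, not Akka.NET's or Orleans' actual envelope type:

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

sealed class Message
{
    public byte[] Payload = Array.Empty<byte>();
}

static class FrameReader
{
    public static int TryRead(ref ReadOnlySequence<byte> input, out Message message)
    {
        message = null;
        if (input.Length < 4)
            return 4; // need at least a complete length header

        Span<byte> header = stackalloc byte[4];
        input.Slice(0, 4).CopyTo(header);
        int bodyLength = BinaryPrimitives.ReadInt32BigEndian(header);

        if (input.Length < 4 + bodyLength)
            return 4 + bodyLength; // need the whole frame before retrying

        message = new Message { Payload = input.Slice(4, bodyLength).ToArray() };
        input = input.Slice(4 + bodyLength); // consume the frame from the sequence
        return 0;
    }
}

class Program
{
    static void Main()
    {
        // One frame on the wire: header says 2 bytes, body is { 7, 8 }.
        var input = new ReadOnlySequence<byte>(new byte[] { 0, 0, 0, 2, 7, 8 });
        int needed = FrameReader.TryRead(ref input, out var msg);
        Console.WriteLine(needed);             // 0: a message was read
        Console.WriteLine(msg.Payload.Length); // 2
    }
}
```

Because `ReadOnlySequence<byte>` can span multiple pooled segments, the same loop works whether the transport delivered the frame in one chunk or across several.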

Aaronontheweb commented 5 years ago

The most critical aspect to this is the core write/read types, TBufferWriter : IBufferWriter<byte> & ReadOnlySequence<byte>. By landing on them we can reduce impedance mismatch.

Agreed - if JSON.NET, Google.Protobuf, etc. all end up using totally different concepts to express that idea, then we'll be back at square one.

Aaronontheweb commented 2 years ago

Related: https://github.com/akkadotnet/akka.net/pull/6026

to11mtm commented 2 years ago

Some general-ish notes:

TL;DR - even if we don't take it as an upstream, it is a good example of patterns that will likely be useful in providing 'useful defaults' for serialization implementations.

What I do know: