Alois-xx / SerializerTests

.NET Serializer testing framework
76 stars, 22 forks

FlatBuffers serialization timing #25

Closed neuecc closed 1 year ago

neuecc commented 1 year ago

Thank you for adding my serializers; I read your article (2022, and the 2019/2018 ones from before). By the way, FlatBuffers deserialization happens at the time each property is accessed, and serialization happens at the time the object is constructed.

In other words, it serializes here: https://github.com/Alois-xx/SerializerTests/blob/master/Program.cs#L718

For example, BookFlat.EndBookFlat calls builder.EndTable();
https://github.com/Alois-xx/SerializerTests/blob/master/TypesToSerialize/BookFlat.cs#L31
https://github.com/google/flatbuffers/blob/master/net/FlatBuffers/FlatBufferBuilder.cs#L804
This is clearly the serialization process itself.

Therefore, shouldn't the benchmark's serialization step assemble BookShelfFlat here, instead of only calling ToFullArray? https://github.com/Alois-xx/SerializerTests/blob/master/Serializers/FlatBuffer.cs#L23

neuecc commented 1 year ago

Anyway, thanks for the useful benchmarks. Your benchmark is one of the reasons why MemoryPack was developed (MessagePack loses to the others!).

Alois-xx commented 1 year ago

Yes, I know that FlatBuffers is different. It all depends on your access pattern, since it is a giant struct which needs to be created somehow from "real" objects, and those do not play well with normal .NET objects. This puts it in the middle between two scenarios: I already have objects and need to create a FlatBuffer object from them, which is then "serialized"; or I already have the FlatBuffer, but the strings are created anew on every access. Since it originates from gaming, where you access large objects like textures, it definitely makes sense there, but it does not fit well into the usual .NET object world.

Initially I included everything, but that also measured the temporary string allocations needed to create the FlatBuffer object, which is wrong, since during "serialize" it involves creating two objects: one for the string and another to store the string in the FlatBuffer object. That would also be unfair.

I struggle with deserialize in a similar manner. Should I touch all strings and integers at least once before saying I have deserialized the object, or should I measure only the time until the FlatBuffer object comes into existence?

I can measure FlatBuffers as super fast if everything is byte buffers, or super slow if everything is UTF-16 strings. To avoid getting into details where everyone can say I measured it wrong, I measure only from existing "objects". People trying FlatBuffers will quickly realize what it takes, and they will switch over to something simpler like MemoryPack.

I do not have a good answer for the correct way to measure FlatBuffers, since it works completely differently.

Thanks, I'm glad you find the benchmarks useful. I love finding new insights in data no one has seen yet. E.g. just by adding your attributes to BookShelf, you always pay the cost for some statics, even though they might not always be serialized with MemoryPack. Could this be made less expensive?

[image]

neuecc commented 1 year ago

Yes, FlatBuffers is a powerful structure when you want to serialize multiple times, deserialize only some properties, or relay data without touching anything. However, for deserialization it is necessary to use a byte[] rather than the original buffer, because in network communication the socket's buffer is pooled, so a copy into a byte[] is always necessary. If generating T from a pooled buffer does not change the processing time, it is better to use a byte[] wrapper; the memory loading time is the same, you know. This is one of the sources of the MemoryPack idea.
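The copy described above can be sketched as follows (a minimal, hypothetical example; `PooledCopyDemo` and `CopyOutOfPool` are illustrative names, not part of either library). Bytes received into a pooled socket buffer must be copied into an owned byte[] before a lazy view such as FlatBuffers' ByteBuffer can safely keep referencing them, because the pool will reuse the rented array for the next receive:

```csharp
using System;
using System.Buffers;

public static class PooledCopyDemo
{
    // The unavoidable copy: the pool may hand the rented array to the next
    // receive, so a long-lived lazy view needs its own byte[].
    public static byte[] CopyOutOfPool(ReadOnlySpan<byte> received)
    {
        byte[] owned = new byte[received.Length];
        received.CopyTo(owned);
        return owned; // safe to wrap, e.g. in new ByteBuffer(owned)
    }

    public static void Main()
    {
        byte[] pooled = ArrayPool<byte>.Shared.Rent(4096);
        // Pretend a socket wrote three bytes into the rented buffer.
        pooled[0] = 1; pooled[1] = 2; pooled[2] = 3;

        byte[] owned = CopyOutOfPool(pooled.AsSpan(0, 3));
        ArrayPool<byte>.Shared.Return(pooled); // reuse is now harmless

        Console.WriteLine(string.Join(",", owned)); // prints "1,2,3"
    }
}
```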

For the benchmark, I think it would be good to measure the time for one serialization and one deserialization. How about something like this for FlatBuffers?

[SerializerType("https://google.github.io/flatbuffers/", SerializerTypes.Binary | SerializerTypes.SupportsVersioning)]
class FlatBuffer<T> : TestBase<(byte[] bytes, int nToCreate, int bookDataSize), ByteBuffer>
{
    public FlatBuffer(Func<int, (byte[] bytes, int nToCreate, int bookDataSize)> testData, Action<(byte[] bytes, int nToCreate, int bookDataSize), int, int> touchAndVerify)
        : base(testData, touchAndVerify)
    {
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    protected override void Serialize((byte[] bytes, int nToCreate, int bookDataSize) obj, Stream stream)
    {
        var (bytes, nToCreate, bookDataSize) = obj;

        var builder = new FlatBufferBuilder(1024);

        Offset<BookFlat>[] books = new Offset<BookFlat>[nToCreate];

        for (int i = 1; i <= nToCreate; i++)
        {
            var title = builder.CreateString($"Book {i}");
            builder.StartVector(1, bookDataSize, 0);
            if (bytes.Length > 0)
            {
                builder.Put(bytes);
            }
            VectorOffset bookbyteArrayOffset = builder.EndVector();
            var bookOffset = BookFlat.CreateBookFlat(builder, title, i, bookbyteArrayOffset);
            books[i - 1] = bookOffset;
        }

        var secretOffset = builder.CreateString("private member value");
        VectorOffset booksVector = builder.CreateVectorOfTables<BookFlat>(books);
        var lret = BookShelfFlat.CreateBookShelfFlat(builder, booksVector, secretOffset);
        builder.Finish(lret.Value);
        var bookshelf = BookShelfFlat.GetRootAsBookShelfFlat(builder.DataBuffer);

#if NET48_OR_GREATER
        var fullBuffer = bookshelf.ByteBuffer.ToFullArray();
        stream.Write(fullBuffer, bookshelf.ByteBuffer.Position, bookshelf.ByteBuffer.Length - bookshelf.ByteBuffer.Position);
#else
        var span = bookshelf.ByteBuffer.ToSpan(bookshelf.ByteBuffer.Position, bookshelf.ByteBuffer.Length - bookshelf.ByteBuffer.Position);
        stream.Write(span);
#endif
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    protected override (byte[] bytes, int nToCreate, int bookDataSize) Deserialize(Stream stream)
    {
        MemoryStream mem = new MemoryStream();
        // Since flatbuffers do not support memory streams we have to copy here
        stream.CopyTo(mem);
        byte[] data = mem.ToArray();
        var bookShelf = BookShelfFlat.GetRootAsBookShelfFlat(new ByteBuffer(data));

        // Touch all fields at least once, as part of deserialization.
        for (int i = 0; i < bookShelf.BooksLength; i++)
        {
            var book = bookShelf.Books(i);
            if (book != null)
            {
                _ = book.Value.Title;
                _ = book.Value.Id;
                _ = book.Value.GetBookDataArray();
            }
        }

        return (data, 0, 0);
    }
}

// Program.cs
(byte[] bytes, int nToCreate, int bookDataSize) DataFlat(int nToCreate)
{
    byte[] bytes = CreateAndFillByteBuffer();
    return (bytes, nToCreate, BookDataSize);
}

void TouchFlat((byte[] bytes, int nToCreate, int bookDataSize) data, int nExpectedBooks, int optionalPayloadDataSize)
{
    return;
}

Thanks for checking the RegisterWellknownTypesFormatters issue. If only a JIT is used, the generation could be delayed completely, but I have conservatively chosen to pre-generate more because I am concerned about AOT.

I could reduce it a bit more. For one thing, I had not set the initial capacity of the ConcurrentDictionary for non-generics; by setting that, it would be smaller than it is now. Thanks.

AloisKraus commented 1 year ago

In the serialize case you create temporary strings

$"Book {i}"

which is not compatible with the expectation that you already have data which you want to serialize. As I said, initially I had code like this, but then FlatBuffers looks bad, because you create the objects inside the serialize loop, which is not the case for the other serializers, which take an existing BookShelf object.

neuecc commented 1 year ago

FlatBufferBuilder.CreateString is serialization, so it should be included in the serialize method. https://github.com/google/flatbuffers/blob/master/net/FlatBuffers/FlatBufferBuilder.cs#L726-L738

OK, $"Book {i}" should not be included in the serialize method. Simply create the strings outside and use them:

(byte[] bytes, string[] titles, int nToCreate, int bookDataSize)

var title = builder.CreateString(titles[i]);

jamescourtney commented 1 year ago

I'm the author of FlatSharp, a C#-focused FlatBuffers implementation. FlatBuffers is an exceedingly lazy format. When doing my own internal benchmarking and comparing with other serializers, the metric I always use is "parse and traverse", since merely "parsing" a FlatBuffer is simply not fair to the other serializers. Like @neuecc says, when serializing, create the objects and buffers statically outside the method and measure only the effect of populating that buffer with the serialized data.

Alois-xx commented 1 year ago

I will come up with something. How about

protected abstract void Serialize(T obj, Stream stream, object context=null);

which would get a BookShelf as context, which is then converted inside the serialize call to the serializer's actual object. Then it should be comparable.

jamescourtney commented 1 year ago

> I will come up with something. How about
>
> protected abstract void Serialize(T obj, Stream stream, object context=null);
>
> which would get a BookShelf as context, which is then converted inside the serialize call to the serializer's actual object. Then it should be comparable.

I think you will likely struggle to implement generic code that uses all the serializers effectively. Streams, for example, won't work with the Google FlatBuffers library because it writes data back-to-front. They won't work with FlatSharp either, because it jumps around when serializing (basically, it leaves "blank spots" and fills them in later). Not that your bench here uses FlatSharp, but those are the two libraries I'm most familiar with. And I assume some of the others only work with Streams :)

To my mind, it seems like the challenge is to balance making the testing code as generic as possible while still giving room for the individual implementations to work in the way that they do best. After all, this benchmark is only useful if it is showing fair data, so you can't let any implementations be handicapped for the sake of making the testing code more uniform.

I'm not familiar enough with all the serializers you include here to make sweeping recommendations (though I have pulled down your repo). I do have a few thoughts to share though:

    // This is just a template that other serializers can translate into their own internal format.
    public class BookShelfCommon
    {
        public List<Book> Books { get; }
    }

    public interface ISerializer
    {
        // Prepare provides an opportunity to translate BookShelfCommon into
        // a data structure generated by that serializer's precompiler and to
        // allocate buffers/streams for the serialization pass.
        void Prepare(BookShelfCommon template);

        // Writes the template from the Prepare method to the storage (buffer/stream).
        void Serialize();

        // Parses (and uses!) the data. In this case, you could consider returning the
        // sum of the book IDs. Since this returns a value, it also provides the ability
        // to validate and test that the traversal logic is implemented correctly.
        // This is a great way to force lazy serializers like FlatBuffers to actually
        // go through all the data.
        int ParseAndUse();
    }


This could be made generic to support `LargeBookshelf` and others.

- This will necessarily mean more code on your end (especially for the traversal part), but since all serializers round-trip data based on the same template, you can write *tests* for that to validate the outputs and make sure that all implementations are at least superficially correct.

- Consider defining a `NoOpSerializer` that does nothing to give you a sense of the overhead of the testing infrastructure.
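A minimal sketch of such a `NoOpSerializer`, assuming the interface shape proposed above (all names are illustrative, not from the repo); timing it would reveal the harness overhead alone:

```csharp
using System;
using System.Collections.Generic;

// Assumed shapes from the proposal above; names are illustrative.
public class BookShelfCommon { public List<int> BookIds { get; } = new(); }

public interface ISerializer
{
    void Prepare(BookShelfCommon template);
    void Serialize();
    int ParseAndUse();
}

// Does nothing: benchmarking it measures only the testing infrastructure.
public class NoOpSerializer : ISerializer
{
    public void Prepare(BookShelfCommon template) { }
    public void Serialize() { }
    public int ParseAndUse() => 0;
}

public static class Demo
{
    public static void Main()
    {
        ISerializer s = new NoOpSerializer();
        s.Prepare(new BookShelfCommon());
        s.Serialize();
        Console.WriteLine(s.ParseAndUse()); // prints 0
    }
}
```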
Alois-xx commented 1 year ago

Can you check out https://github.com/Alois-xx/SerializerTests/commit/7e0176b1e20c0f87eab8ea9906119ba64ce4079e? For serialize it now uses a BookShelf object which is then converted to the FlatBuffers object. On deserialize I return the BookShelfFlat, but touch every object property once. That should now be comparable.

D:\Source\git\SerializerTests\bin\Release\net70>SerializerTests.exe  -test combined -N 1000000 -serializer FlatBuffer`2,memorypack 
Serializer      Objects "Time to serialize in s"        "Time to deserialize in s"      "Size in bytes" FileVersion     Framework       ProjectHome     DataFormat      FormatDetails   Supports Versioning     Scenario
MemoryPack      1000000 0,0804  0,187   27888901        7.0.22.51805    .NET Core 7.0.0 https://github.com/Cysharp/MemoryPack   Binary          Yes
FlatBuffer      1000000 0,2233  0,141   39999676        1.12.0.0        .NET Core 7.0.0 https://google.github.io/flatbuffers/   Binary          Yes

Building the FlatBuffers object seems to be pretty expensive, which might be related to not fully optimized code paths for converting strings to a byte array.

@jamescourtney: You are right that many serializers are more specialized and do not even support streams anymore. I could support several serialize/deserialize options (to string, byte array, ...), but the benchmark summary would become unreadable. No one would bother looking at such detailed metrics when they are just looking for a serializer comparison.

Besides the complexity, little value is gained for the public. I know you, as a library author, have a different view on this, but the truth is that too few people care about such detailed metrics for it to be (at least for me) worth the effort.

neuecc commented 1 year ago

@Alois-xx thanks, this looks good. In deserialize, the fact that BookFlat is a struct seems advantageous for performance. That is not a problem for the comparison, since it is a strength of the dedicated FlatBuffers type. Is PayloadDataSize not set? A non-zero value seems better.

If you don't mind, it would be great if you could bring MemoryPack up to date (1.8.9).

jamescourtney commented 1 year ago

> Can you check out 7e0176b? For serialize it now uses a BookShelf object which is then converted to the FlatBuffers object. On deserialize I return the BookShelfFlat, but touch every object property once. That should now be comparable.

That change looks reasonable to me.

> Besides the complexity, little value is gained for the public. I know you, as a library author, have a different view on this, but the truth is that too few people care about such detailed metrics for it to be (at least for me) worth the effort.

I think you may be misunderstanding me. I'm not suggesting tons of metrics; I agree that would be less than useful. I'm just suggesting factoring the benchmark code so that each serialization library can operate with its preferred data structures, whether that be streams, byte[]s, etc. If you're interested, I'm happy to do a PR to this effect.

jamescourtney commented 1 year ago

And to be clear, I know that this will lead to more code in this library (though not tons more, I don't think), but you'll produce more correct results.

Alois-xx commented 1 year ago

@neuecc: Yes, I will update to the latest version. And no, by default the PayloadDataSize is 0. The test suite already runs for quite a long time to arrive at reasonable results.

@jamescourtney: I have a problem with different start/end states. If one serializer starts with an object and ends with a string as serialized output, it might skew results relative to serializers which encode the data to a UTF-8 byte array, and skew them further relative to serializers that write the data into a memory stream.

The serializer which does all of this will lose, but these steps are necessary if you send the data over the wire, so it is better to include these costs here as well to stay comparable. The framework which ultimately takes your serialized data should be able to handle all of these (strings/byte arrays/streams/System.IO.Pipelines) efficiently if high throughput is expected.

Alois-xx commented 1 year ago

The blog post has been updated, and a 10-byte buffer per Book object was also used to check by how much the timings change.

Thanks for the input to improve the test suite @neuecc , @jamescourtney .

neuecc commented 1 year ago

Thanks, that makes sense.

I think GroBuf's performance benefit comes from pre-calculating the size in Write. When the API returns a byte[], the size is calculated first so the array can be allocated at its final, fixed length:

var size = writerAndCounter.Item2(obj, true, context);
var result = new byte[size];
/* write to byte[] */
return result;

Since the payload size has increased with this change, the byte[] copy time has also increased, so the pre-size calculation is probably working in your favor.

The pre-size calculation is not universally cheap, since depending on the binary specification it can be computationally expensive, such as having to calculate the UTF-8 byte count of every string. GroBuf suppresses this extra computation by specifying that strings are serialized as UTF-16.
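This cost difference can be sketched directly (illustrative helper names, not GroBuf's API): a UTF-8 pre-size pass must scan every character, while the UTF-16 size is a constant-time multiplication.

```csharp
using System;
using System.Text;

public static class PreSizeDemo
{
    // UTF-8 pre-sizing must inspect every character to find the byte count.
    public static int Utf8Size(string s) => Encoding.UTF8.GetByteCount(s);

    // UTF-16 size is a constant-time multiplication, which keeps a
    // two-pass (count, then write) scheme cheap.
    public static int Utf16Size(string s) => s.Length * 2;

    public static void Main()
    {
        string s = "naïve"; // 'ï' takes two bytes in UTF-8
        Console.WriteLine($"{Utf8Size(s)} vs {Utf16Size(s)}"); // prints "6 vs 10"
    }
}
```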

MemoryPack includes such optimizations only when they can be statically determined at compile time.

Such a 2-pass process can produce poor results depending on the data, so I had not adopted it unconditionally, but it seems the results are better than I thought. There seems to be a little more room for optimization, thanks.

jamescourtney commented 1 year ago

Looking at the updated version of the FlatBuffer test suite, my read of the code is that the only thing that's being benchmarked is .NET's array allocation and memory copy performance:

            // Allocate array 1 and a MemoryStream object
            MemoryStream mem = new MemoryStream();

            // Allocate a new array (expanding the memory stream) and do a memory copy
            stream.CopyTo(mem);

            // Allocate a new array and do another mem copy
            byte[] data = mem.ToArray();

This involves allocating at least two (likely three) arrays and doing at least two memcpy operations in each iteration, since the MemoryStream may resize. That is an enormous amount of busywork per iteration. More importantly, no one who uses FlatBuffers would do it like this. While the other serializers might be natively "stream friendly", my suggestion is simply to remove FlatBuffers altogether if you're unwilling to part with streams.

Could you perhaps explain why streams are necessary? Are you looking to compare file sizes? Generally, it's trivial to wrap a byte[] in a Stream, but difficult to do the reverse.
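The asymmetry in that last point can be sketched as follows (a minimal illustration with hypothetical names, not code from the test suite): byte[] to Stream is a cheap non-copying wrapper, while Stream to byte[] forces a copy.

```csharp
using System;
using System.IO;

public static class WrapDemo
{
    // byte[] -> Stream: a cheap, non-copying, read-only wrapper.
    public static Stream Wrap(byte[] payload) =>
        new MemoryStream(payload, writable: false);

    // Stream -> byte[]: the reverse direction forces a copy.
    public static byte[] Unwrap(Stream stream)
    {
        using var mem = new MemoryStream();
        stream.CopyTo(mem);
        return mem.ToArray();
    }

    public static void Main()
    {
        byte[] payload = { 10, 20, 30 };
        byte[] roundTripped = Unwrap(Wrap(payload));
        Console.WriteLine(roundTripped.Length); // prints 3
    }
}
```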

jamescourtney commented 1 year ago

> Looking at the updated version of the FlatBuffer test suite, my read of the code is that the only thing that's being benchmarked is .NET's array allocation and memory copy performance [...]

I took a stab at fixing this, but it's difficult with the official FlatBuffers library since it writes from right to left (basically the opposite of streams). We'd probably need to switch to arrays or something to remove those copies.