eishay / jvm-serializers

Benchmark comparing serialization libraries on the JVM
http://groups.google.com/group/java-serialization-benchmarking

Consider formats which support random value access #76

Open mzaks opened 7 years ago

mzaks commented 7 years ago

Hi there, first of all thank you for the great benchmark. It is a very nice resource. I wanted to point out that formats which support random value access, like Cap'n Proto and FlatBuffers, are measured a bit unfairly. The deserialise methods of the test make a copy of every value encoded in the buffer, even though this is not necessary. The buffers are serialised in a way that makes the values accessible on demand, without any actual unmarshalling. In this gist you can find a rewritten test for FlatBuffers: https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531 It passes and returns the following result for decoding: average:5.92301542549922ms deviation:0.6135154254992203ms

Formats which invest in supporting random value access do so at the cost of size and encoding speed. This is why I find it unfair to show the shortcomings of the encoding while going around the benefits of the decoding.

cakoose commented 7 years ago

Here are two use cases people might have:

  1. Read all the data in a serialized value.
  2. Read only some of the data in a serialized value.

I think #1 is the more common use case and that's what this benchmark attempts to measure.

A long time ago, the benchmark also tried to measure #2. The timeline, as I remember it:

The "partial" measurement was around for a while (maybe a year?), but after some discussion on the mailing list we decided to drop it. The thinking was that the benefits weren't worth the extra maintenance burden and extra real estate on the results page for a somewhat specialized use case.


cowtowncoder commented 7 years ago

I think it would be quite difficult to fairly measure even just these two variations across formats and libraries. So while a test for "partial binding" (or whatever it'd be called) can be valuable, it probably makes the most sense as a separate project focused on that usage.

mzaks commented 7 years ago

Well, I guess this is a common misconception people have about the random value access feature. It is not a lazy parsing feature, where the data is transformed only on demand. It is rather a read-access-friendly representation of the data. This line:

MediaContent.getRootAsMediaContent(bb, mc);

turns MediaContent mc = new MediaContent(); into a read-only accessor over the received data. Because the buffer has its own local references stored in a virtual table, there is no actual parsing needed. The getters of MediaContent just do some addition and subtraction of relative offsets to compute the absolute buffer offset, which is similar to what a runtime does when we access a property of an object. OK, there is also the conversion of the value at that offset to a type via the java.nio.ByteBuffer class, but that can hardly be called parsing.

Compare this to formats where the values are stored as text, which actually has to be parsed, or formats where we need a linear traversal to find a value; in those cases it really is better to transform everything in one go. It is as if we compared a balanced binary search tree to a linked list.
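The offset arithmetic described above can be illustrated with a toy vtable lookup. This is a deliberately simplified sketch of the idea, not the actual FlatBuffers wire format: the "getter" does nothing but integer arithmetic on a ByteBuffer, with no parsing and no object allocation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Toy illustration of vtable-style field access (simplified layout, all
// little-endian 32-bit ints): the table position stores a relative offset
// back to its vtable, and the vtable stores per-field offsets from the table.
public class VTableDemo {
    static int readField(ByteBuffer bb, int tablePos, int fieldIndex) {
        int vtablePos = tablePos - bb.getInt(tablePos);       // locate the vtable
        int fieldOffset = bb.getInt(vtablePos + 4 * fieldIndex);
        return bb.getInt(tablePos + fieldOffset);             // read the value in place
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0, 8);    // vtable at 0: field 0 lives 8 bytes past the table
        bb.putInt(16, 16);  // table at 16: offset back to the vtable (16 - 16 = 0)
        bb.putInt(24, 42);  // the field value itself
        System.out.println(readField(bb, 16, 0)); // prints 42
    }
}
```

No intermediate objects are created; the "deserialisation" cost is two extra reads plus the arithmetic.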

So what I find a bit unfair in the current implementation of the test is that in the deserialize method, serializers.flatbuffers.media.MediaContent is converted to data.media.MediaContent, and then JavaBuiltIn.mediaTransformer is used to check the values. In the suggestion I posted as a gist, the deserialize method returns serializers.flatbuffers.media.MediaContent, and I implemented static final class Transformer extends MediaTransformer<MediaContent>, which knows how to forward and reverse data between data.media.MediaContent and serializers.flatbuffers.media.MediaContent. As I don't have a complete understanding of the test I might be missing something, but I believe that with an appropriate Transformer we do not even change the semantics of the test. It still reads all the values from the buffer; it just does not convert them into unnecessary intermediate Java objects which have to be allocated and then garbage collected.

The only use case where using serializers.flatbuffers.media.MediaContent directly would not be an option is if we wanted to mutate the values; as I mentioned before, it is a read-only accessor. But as far as I can tell that is not a use case in this test.

So I am not pleading for a test of partial data access; I just want to avoid the intermediate Java objects.

cakoose commented 7 years ago

Oh, I think I understand your concern now.

The way the benchmark is set up, every serializer's test creates those additional Java objects. For example, even the Java built-in serialization test, which uses data.media.MediaContent as its native format, makes a copy:

https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java

The purpose of this is to measure the overhead of reading every field so serializers with expensive accessors are penalized appropriately.

Unfortunately, as you observed, this also includes the overhead of allocating, writing, and GC'ing the additional objects, which isn't ideal. However, I think it's still sort of "fair" in that all serializers pay this same price.

Like you said, we could remove the overhead by writing custom code for each set of serializer classes:

The reason we didn't do that was the additional maintenance burden. We would not only have to write those additional functions for each set of serializer classes, we'd have to be vigilant to ensure the code was correct (e.g. make sure it wasn't skipping a field).

With our current setup, each set of serializer classes only needs a "Transformer" implementation. If the transformer implementation is incorrect, we'll probably detect that because the round-trip comparison will fail.
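That round-trip safety net can be sketched generically. The interfaces below are hypothetical simplifications, not the benchmark's actual classes: the point is only that a transformer which drops or corrupts a field fails the equality check.

```java
// Sketch of the round-trip check: forward() a POJO into the serializer's
// native representation, reverse() it back, and compare with the original.
// (Hypothetical simplified interface, not the benchmark's actual classes.)
interface Transformer<P, N> {
    N forward(P pojo);     // POJO -> serializer-native representation
    P reverse(N nat);      // native representation -> POJO
}

public class RoundTripCheck {
    static <P, N> boolean roundTripOk(Transformer<P, N> t, P original) {
        return original.equals(t.reverse(t.forward(original)));
    }

    public static void main(String[] args) {
        // Trivial transformer for demonstration: String <-> char[]
        Transformer<String, char[]> t = new Transformer<String, char[]>() {
            public char[] forward(String s) { return s.toCharArray(); }
            public String reverse(char[] c) { return new String(c); }
        };
        System.out.println(roundTripOk(t, "media-content")); // prints true
    }
}
```

A transformer whose reverse() skipped a field would produce an object that no longer equals the original, so the mistake surfaces automatically.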

(Really sorry if I'm still misunderstanding something! I don't have time to look into the code in detail right now.)


mzaks commented 7 years ago

Some of the tests do have their own custom transformers, like for example the winner of the benchmark 😉 https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/colfer/Colfer.java#L64

So then we are back at the "unfair" point 🙂.

cowtowncoder commented 7 years ago

@mzaks be that as it may, I do not see the point in spending huge amounts of effort to make the test more favorable to certain kinds of decoders. The use case being tested is the common "decode the whole message" case, and for that, things work.

If you want to create a separate set of tests, or perhaps a new project, that could be valuable. But I see a low effort-to-value ratio in modifying the existing tests.

cakoose commented 7 years ago

Ah, dammit. Took a closer look at the code and I think I (finally) understand what you're saying, Maxim.

Whoever wrote the original FlatBuffers test accidentally made the serializer do the transformation work as well. None of the tests should be doing this. Sorry for taking so long to realize this; I think I narrowed in too quickly on your "random access serializer" comment and thought you were talking about a different issue.

It looks like your diff just moves the transformation code into a Transformer, where it belongs. That seems fine with me. Do you think you can clean that code up and submit a pull request?

Also, a minor suggestion: if your original message came with a diff then I might have more quickly realized what you were talking about. Because it was a full file, I was lazy about actually figuring out what the diff was. (Still my fault for jumping to the wrong conclusion.)

Tatu: would appreciate if you could double-check my evaluation to make sure I'm not misunderstanding yet again :-P


mzaks commented 7 years ago

Will do. I also saw that there is a new version of FlatBuffers (1.7.0); I will update to it in the PR as well.

zapov commented 7 years ago

I wrote the original FlatBuffers submission, and while it's possible that I did not write an optimal implementation, I have issues with @mzaks's gist. I did not write any transformer because there was no POJO class to use for that purpose. This implementation creates two cached instances: https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L40 and https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L112 which does not seem aligned with the other implementations. While it's cool that FlatBuffers can access fields by lookup instead of parsing, it likely won't be much faster once it has to create garbage like everyone else. By reusing those instances it avoids doing the work everyone else is doing (there are other serializers which can deserialize into an existing instance).

cakoose commented 7 years ago

zapov, thanks for pointing out the reuse. I agree that our benchmarks shouldn't reuse message objects.

(Though some real-world users may reuse objects for performance, I think we decided that this benchmark should measure the more common/simple use case.)

But I think the other issue Maxim brought up is legitimate -- the current FlatBuffers benchmark is doing extra work. I think we should switch FBSerializer to return FlatBuffer MediaContent objects and write an FBTransformer to convert between that and the POJO.

(I tried making the change myself, but ran into an NPE and gave up since I'm not familiar with FlatBuffers :-P)


zapov commented 7 years ago

I don't remember the actual reason why I wrote it like that instead of using a transformer. I vaguely remember something about not being able to convert it (but of course it's possible that I just didn't know how to do it). I'm fine with the transformer as long as there is nothing like this: https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L113

Since this bench tests a single object and this optimization is not part of the library, I don't think tricks like that are "fair" to the other codecs.

On an unrelated note, as far as improvements to the bench go, I think it would be worth trying to get numbers without the copy in forward/reverse: https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java#L93 In that case the current implementation would be fair to FlatBuffers in the same manner as to all other codecs with custom types. While on the surface it might seem unfair to the others, since they have to create more garbage for the conversion into the provided POJO, to me that is what the average Java dev expects: "I have my class, how fast can I send it around and get it back?"

If we made that change I would drop the current dsl-json implementations and just put the databinding one in, as that's the most common use case.

mzaks commented 7 years ago

Here is a cleaned-up gist where I removed all the "smart" techniques: https://gist.github.com/mzaks/3e432df395fa343847701b8735a706b4

I would still like to discuss whether those techniques are "fair", though.

With a binary serialisation strategy we always have two parts:

  1. The actual binary representation.
  2. An implementation of how that binary representation is read and written.

The FlatBuffers binary format allows the user to do this: https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L113 because strings are stored by reference and not in place. That means I can perform de-duplication when I serialise, and skip decoding a string from the binary when I see that multiple tables point to the same string. Why do the work twice? This behaviour is sadly not supported by the implementation, but it can easily be added on top by the user. So the binary representation supports it, but the technique is offloaded to the user and is not part of the generated code.
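The writer-side half of that de-duplication can be sketched with an offset cache. This is a hypothetical illustration of the technique, not the FlatBuffers builder API; the offsets and the length accounting are pretend values.

```java
import java.util.HashMap;
import java.util.Map;

// Writer-side string de-duplication: because strings are stored by reference,
// a repeated string can be encoded once and then referenced by the same offset.
// (Hypothetical sketch; not the FlatBuffers builder's actual implementation.)
public class DedupWriter {
    private final Map<String, Integer> offsets = new HashMap<>();
    private int nextOffset = 0;
    int stringsWritten = 0;

    // Returns the buffer offset for s, encoding it only on first sight.
    int writeString(String s) {
        Integer cached = offsets.get(s);
        if (cached != null) return cached;    // reuse the earlier encoding
        int off = nextOffset;
        nextOffset += 4 + s.length();         // pretend: length prefix + bytes
        stringsWritten++;
        offsets.put(s, off);
        return off;
    }

    public static void main(String[] args) {
        DedupWriter w = new DedupWriter();
        int a = w.writeString("http://example.com/video.mp4");
        int b = w.writeString("http://example.com/video.mp4");
        System.out.println(a == b);           // prints true
        System.out.println(w.stringsWritten); // prints 1
    }
}
```

The reader-side mirror is the same idea in reverse: cache decoded strings keyed by their buffer offset so each distinct string is decoded only once.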

Same applies to: https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L112 https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L40

As a matter of fact, the generated API encourages the user to reuse the table accessor classes: https://github.com/eishay/jvm-serializers/blob/master/tpc/pregen/media.fbs/serializers/flatbuffers/media/MediaContent.java#L13

Those accessor classes are just a convenience for storing two things:

  1. A reference to the ByteBuffer
  2. An offset into the byte buffer

It would be possible to avoid them altogether and just provide pure static methods which receive a reference to the byte buffer and an offset, and return the value or a reference to a table. This way we could avoid class instantiation and garbage collection entirely.
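Such accessor-free reading would look roughly like this. The record layout below (two ints at fixed offsets) is an assumption made for illustration, not FlatBuffers' generated code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Accessor-object-free reading: pure static methods take the ByteBuffer plus an
// offset, so no wrapper instances are allocated and nothing is garbage collected.
// (Hypothetical fixed layout: [int id][int duration] at a given record offset.)
public class StaticAccess {
    static int mediaId(ByteBuffer bb, int recordOffset) {
        return bb.getInt(recordOffset);      // field 0 at +0
    }
    static int mediaDuration(ByteBuffer bb, int recordOffset) {
        return bb.getInt(recordOffset + 4);  // field 1 at +4
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0, 7);    // id
        bb.putInt(4, 120);  // duration
        System.out.println(mediaId(bb, 0) + " " + mediaDuration(bb, 0)); // prints "7 120"
    }
}
```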

Again, this is a case of the binary representation supporting something that the implementation does not provide.

Is it fair to restrict a driver to only 3 of the 5 gears in his vehicle because other drivers have vehicles with only 3 gears? Hard to say 😉.

zapov commented 7 years ago

If you are returning an instance to the outside, I don't think you should be holding on to that instance and using it for the next request. In other words, I think it's fine to keep a cache of FlatBuffers instances which you deserialize into, but I don't think it's fine to keep a cache of the data.media objects.

mzaks commented 7 years ago

Yes, absolutely. The "caches" are meant only for sequential deserialisation in the same "layer". If an instance goes to a different layer of the application, it should not be reused; the same applies to multi-threaded scenarios. In this test, as far as I can tell, the decoding is done sequentially, so there is no need to create new instances on every iteration. I also totally agree about the data.media objects.

Let's consider a real-world scenario. Our endpoint gets requests with a payload, from which we need to read the values and spin off processes that will produce a response. In this case the payload is processed sequentially in one layer; only the values leave that layer. And this is where FlatBuffers shines: extracting the values, especially if we need only a portion of the data.

cakoose commented 7 years ago

We try to measure the performance that users will typically encounter. For example, I would guess that most users will not bother maintaining a string dedupe table, so our standard flatbuffers benchmark shouldn't include that optimization. (But maybe my guess is wrong here.)

For some serializers, we run an additional benchmark where we apply additional optimizations (e.g. "flatbuffers-opt"). We used to have more of these, but have generally tried to reduce the number of benchmarks because the results page is already overwhelming.
