eishay / jvm-serializers

Benchmark comparing serialization libraries on the JVM
http://groups.google.com/group/java-serialization-benchmarking

Latest DSL Platform library. Minified JSON version. #59

Closed zapov closed 8 years ago

zapov commented 8 years ago

Changed omit-defaults to a minified version. Instead of just omitting default values, it also uses shorter names for properties.

While DSL Platform could create serializers for the built-in POJO objects, that doesn't seem to fit very well into the build process. Switched from the dsl-client library to the standalone dsl-json library. Updated dsl-clc to the latest version.

This replaces https://github.com/eishay/jvm-serializers/pull/51. Any reason why that was not merged?

Btw, on Java 1.8.66 I get:

                            create     ser   deser   total    size    +dfl
protostuff                     151     803    1213    2016     239     150
json/dsl-platform/minified      90     903    1381    2284     353     197
kryo-flat-pre                  119    1074    1320    2395     212     132
fst-flat-pre                   117     994    1487    2481     251     165
json/dsl-platform               88    1122    1636    2758     485     261

Regards, Rikard

cowtowncoder commented 8 years ago

I don't know why it hasn't been merged, but I would suggest that shortening property names is not the right thing to do. All textual formats use long names at this point (the original test actually used short, 2-character names, but that was changed when it was felt that this is not what is typically done), and it seems appropriate that this codec should not diverge from that.

zapov commented 8 years ago

I've changed previous submission (defaults omitted) to this minified version for several reasons:

That said, I have some other issues with this benchmark. If we fix them, other codecs (such as SBE, FlatBuf and CNP) could be included as well:

Those changes would reduce the amount of garbage created by the benchmark and provide more realistic results.

cowtowncoder commented 8 years ago

@zapov If the minified version is an alternative to the use of full names, that's ok. I just assumed it would replace the default version.

As to the use of byte[], there really is no good answer there. Many codecs would perform a bit better with InputStream or OutputStream, for example, but not all support that.
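
To make the trade-off concrete, here is a minimal sketch of the two API shapes. Everything in it is hypothetical: `ByteArrayCodec`, `StreamCodec`, `adapt` and the toy `UTF8` codec are illustrative names, not types from the benchmark or from any of the libraries. It only shows that a stream-capable codec can always be wrapped into a byte[] contract, at the price of an extra buffer and copy, while the reverse is not possible for codecs that lack stream support.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these types exist in jvm-serializers.
final class CodecApiSketch {

    // byte[]-based contract: every codec can implement this one.
    interface ByteArrayCodec<T> {
        byte[] serialize(T value) throws IOException;
        T deserialize(byte[] data) throws IOException;
    }

    // Stream-based contract: only some codecs could offer this.
    interface StreamCodec<T> {
        void write(T value, OutputStream out) throws IOException;
        T read(InputStream in) throws IOException;
    }

    // Wraps a stream codec so that it satisfies the byte[] contract.
    static <T> ByteArrayCodec<T> adapt(StreamCodec<T> codec) {
        return new ByteArrayCodec<T>() {
            public byte[] serialize(T value) throws IOException {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                codec.write(value, out);
                return out.toByteArray(); // copies the internal buffer
            }

            public T deserialize(byte[] data) throws IOException {
                return codec.read(new ByteArrayInputStream(data));
            }
        };
    }

    // Toy codec for String values, only here to keep the sketch runnable.
    static final StreamCodec<String> UTF8 = new StreamCodec<String>() {
        public void write(String value, OutputStream out) throws IOException {
            out.write(value.getBytes(StandardCharsets.UTF_8));
        }

        public String read(InputStream in) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    };

    // Serializes and deserializes through the adapted byte[] contract.
    static String roundTrip(String text) {
        try {
            ByteArrayCodec<String> codec = adapt(UTF8);
            return codec.deserialize(codec.serialize(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("media"));
    }
}
```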

These discussions make more sense on the mailing list; I just thought I'd add some notes here.

zapov commented 8 years ago

I'm all for discussing improvements to this benchmark on the mailing list; I just want to resolve this PR first, so it doesn't linger here for months again ;)

cowtowncoder commented 8 years ago

@zapov Understood, point taken. Since no one else has moved on this, I'll have a look. Thank you for your patience AND persistence. :)

cakoose commented 8 years ago

Though people often minify JavaScript, I didn't think it was common to minify JSON-serialized values. I guess using a minifier would work if you only need to serialize/deserialize within a single version of your software. But how would it work if you're communicating with other services or persisting data that needs to be loaded by a future version of your software?

I'm generally not in favor of using abbreviated field names because I feel like it very rarely occurs in practice. And unless there's some kind of automatic translation between the short and long names, it would hurt readability of the code as well.

zapov commented 8 years ago

Not sure if I'm understanding you correctly. I did not minify values (eg, you can still find full enum string values in the JSON), nor property names. I only minified property aliases in JSON. This (https://github.com/eishay/jvm-serializers/blob/master/tpc/schema/media.dsl#L56) is equivalent to writing

@JsonProperty("f")
public String getFormat()

in Jackson. It's almost the same as codecs requiring explicit ordering (name aliases). And I think there are lots of use cases for that, although using it in a public API is not one of them.
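
As a rough illustration of what such aliasing does to the output, here is a minimal, self-contained sketch. It is not DSL Platform or Jackson code: the `Image` class is a cut-down stand-in with two fields, the hand-written writer does no string escaping, and the alias `t` for `title` is an assumption (only `f` for `format` comes from the example above).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not DSL Platform or Jackson code.
final class MinifiedJsonSketch {

    // Cut-down stand-in for the benchmark's Image type.
    static final class Image {
        final String title;  // optional; null is its default value
        final String format; // required

        Image(String title, String format) {
            this.title = title;
            this.format = format;
        }
    }

    // Property name -> short JSON alias ("t" is an assumed alias).
    static final Map<String, String> ALIASES = new LinkedHashMap<>();
    static {
        ALIASES.put("title", "t");
        ALIASES.put("format", "f");
    }

    // Writes the image as JSON. With minify=true the short aliases are
    // used and properties holding the default (null) are omitted.
    static String toJson(Image image, boolean minify) {
        StringBuilder sb = new StringBuilder("{");
        appendField(sb, "title", image.title, minify);
        appendField(sb, "format", image.format, minify);
        return sb.append('}').toString();
    }

    private static void appendField(StringBuilder sb, String name,
                                    String value, boolean minify) {
        if (value == null && minify) {
            return; // omit default value
        }
        if (sb.length() > 1) {
            sb.append(',');
        }
        String key = minify ? ALIASES.get(name) : name;
        sb.append('"').append(key).append("\":");
        if (value == null) {
            sb.append("null");
        } else {
            sb.append('"').append(value).append('"'); // no escaping: sketch only
        }
    }

    public static void main(String[] args) {
        Image image = new Image(null, "video/mpg4");
        System.out.println(toJson(image, false)); // {"title":null,"format":"video/mpg4"}
        System.out.println(toJson(image, true));  // {"f":"video/mpg4"}
    }
}
```

The Java property names stay the same in both cases; only the names written to JSON change, which is why the size columns differ between the two dsl-platform rows.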

cakoose commented 8 years ago

For some reason I read "minification" as "JavaScript minification", sorry. What you did makes much more sense to me now.

Two more things, though:

(Just food for thought. I'm not the authority here or anything.)

cowtowncoder commented 8 years ago

I think the naming convention also has the case of "manual", which indicates that special tuning (or piles of code) is used to achieve more optimized performance. I agree that some modifier is needed to clearly distinguish the results of a special case. The specific modifier matters less, since the results are meant to be developer-readable and readers need to interpret their meaningfulness anyway.

zapov commented 8 years ago

If it will reduce confusion and make the results easier to tell apart, I'm ok with a name change to minified-json/dsl-platform.

I don't think abbrev-json/dsl-platform is a good name since two optimizations were used in that test case: name shortening and omitting defaults.

Not sure if I would move that to "manual", since by that logic all schemas which require explicit ordering would fall under the same category. Also, this is just an example of how to use it; the exact feature is changing the name of a property to an explicit alias. I could use a different schema definition and write something along the lines of

value Image {
  String? title;
  String format;
  serialization { minify; }
}

which would produce the same or a similar result.

Also, the idea is to use this as compiled serialization on existing POJOs. I actually wanted to do that, but at first I couldn't get the build to run: it tries to compile the generated Java files standalone (plus dependencies), while the actual model files are not in an external jar but in the source files. I also didn't want to add an annotation processor to the build, since the compiler requires Mono/.NET, which doesn't really fit with the project ;)

We can address that issue later ;) I'll make a PR with the name change soon.