eishay / jvm-serializers

Benchmark comparing serialization libraries on the JVM
http://groups.google.com/group/java-serialization-benchmarking

New JSON serializer based on the dsl-client-java library #47

Closed · hperadin closed this 9 years ago

hperadin commented 9 years ago

We would like to include a new JSON serializer based on the dsl-client-java library (https://github.com/ngs-doo/dsl-client-java).

According to our benchmarks, it seems to outperform all other text-based serializers:

http://hperadin.github.io/jvm-serializers-report/
http://hperadin.github.io/jvm-serializers-report/report.html

The serializer is schema-based (https://github.com/hperadin/jvm-serializers/blob/master/tpc/schema/media.dsl). We've included the schema and the generated sources, and omitted the code generation step. If needed, we can add it in another pull request.
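
For reference, the value classes generated from media.dsl are roughly of this shape (a simplified sketch of the benchmark's media schema; the actual generated sources also carry the JSON serialization plumbing):

```java
// Simplified sketch of the generated value classes; the real generated
// code additionally contains the JSON (de)serialization plumbing.
public class MediaContent {
    public enum Player { JAVA, FLASH }
    public enum Size { SMALL, LARGE }

    public static class Image {
        public String uri;
        public String title;        // optional
        public int width;
        public int height;
        public Size size;
    }

    public static class Media {
        public String uri;
        public String title;        // optional
        public int width;
        public int height;
        public String format;
        public long duration;
        public long size;
        public int bitrate;         // optional
        public java.util.List<String> persons;
        public Player player;
        public String copyright;    // optional
    }

    public Media media;
    public java.util.List<Image> images;
}
```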

cakoose commented 9 years ago

Is your code generator written in Java? If so, we'd prefer that the generated files weren't checked in. Instead, run the code generator as part of the build.

Look at the Makefile to see some examples. I can help if you get stuck.

zapov commented 9 years ago

Sorry, our compiler is written in .NET (it runs on Mono). We have a clc client (written in Java) which interacts with it via .NET/Mono, so we could include it in the build, but you would then have a Mono dependency.

So it seemed easier to include the classes as-is.

cakoose commented 9 years ago

In that case, please follow the pattern used for Thrift and Protobuf.

Let me know if you run into any issues.

hperadin commented 9 years ago

I've updated the Makefile and tested the build.

The code generator assumes a fixed directory structure and downloads some static dependencies at build time, so the new Makefile rules handle the setup and cleanup.

The offline compiler (https://github.com/hperadin/jvm-serializers/tree/master/tpc/lib/dsl-compiler.exe) requires Mono to run; this was tested with Mono 3.10.0.

hperadin commented 9 years ago

The Mono executable is now unversioned (it will be part of the downloaded dependencies), and all downloaded dependencies are placed under a temporary folder inside build/. I've also refactored the pregen directory to reflect the package structure.

cowtowncoder commented 9 years ago

One quick question: looking at the results, the output size is a bit smaller (437 bytes) than for the other JSON serializers (468 / 485 bytes). I was wondering what could cause this: no serializer adds indentation, so that should not be the reason. I am guessing it could be one of two things: either one of the properties is unintentionally missing, or the serializer does not use the same structure as the others. It would be good to know, so we can either fix the problem (if any) or document the difference, if the structure differs. Since the difference is quite small, it could be just a single missing property; if the structure were different, or names were shortened, I'd expect the difference to be more significant.

zapov commented 9 years ago

Or it could be due to a third option :)

We've submitted it with minimal = true, which omits default values (similar to Jackson ctor defaults). In terms of speed, I don't expect minimal = false to be much different (we actually didn't test that).
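
To illustrate the idea (a simplified sketch, not the actual generated code), a minimal = true writer skips any property that still holds its default value, which is where the missing bytes go:

```java
// Simplified sketch of default-value omission under minimal = true.
// String escaping is left out for brevity; Image refers to the value-class
// sketch shown earlier in this thread.
static void writeImage(StringBuilder sb, MediaContent.Image image) {
    sb.append("{\"uri\":\"").append(image.uri).append('"');
    if (image.title != null) {            // null is the default -> omitted
        sb.append(",\"title\":\"").append(image.title).append('"');
    }
    if (image.width != 0) {               // 0 is the default -> omitted
        sb.append(",\"width\":").append(image.width);
    }
    if (image.height != 0) {
        sb.append(",\"height\":").append(image.height);
    }
    if (image.size != null) {
        sb.append(",\"size\":\"").append(image.size).append('"');
    }
    sb.append('}');
}
```

With minimal = false the writer would emit every property unconditionally, matching what the other JSON entries produce.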

cowtowncoder commented 9 years ago

Ok, so it is probably omitting just one or two properties whose values are at their defaults. So it's not accidental dropping, but the class's default settings at work.

It'd be nice to see whether the size agrees with the other JSON ones with minimal = false, just for the sake of completeness (I realize there is a discrepancy among them already; I forget which property caused it). You are probably right that it won't make much difference to the performance numbers.

zapov commented 9 years ago

Did you mean that in terms of checking whether the JSON is correct, or of adding one more serialization path so we have both minimal and full?

We didn't want to clutter the results, but I guess some might find it interesting to see both serialization paths in the bench.

cowtowncoder commented 9 years ago

There are multiple ways to go. I was first thinking of just seeing how size and speed differ with the full settings; if there is no significant difference, use that setting to be as similar to the existing entries as possible. But if there is a difference, showing both would make sense; many other codecs already have multiple modes. Naming of the tests is another open question (while this is close to databind, schema generation and other aspects sound slightly different; but it is not manual or tree either, rather something more automated), but that can be tackled later on.

So I think minimal and full sound good to me, if the results differ. If not, I would suggest using the full set, just to be as apples-to-apples as possible. Or, if you prefer, just add a brief note of which properties are actually omitted due to default values; I'd be fine with that as well.
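
Registering both modes could follow the pattern of the existing handlers, roughly like this (the class and entry names below are placeholders, not the actual code in this pull request):

```java
// Hypothetical registration sketch following the existing handler pattern;
// DslJsonSerializer and the entry names are placeholders.
public class DslJavaJson {
    public static void register(TestGroups groups) {
        // Two entries, so both serialization paths show up in the results.
        groups.media.add(JavaBuiltIn.mediaTransformer,
                new DslJsonSerializer("json/dsl-platform/minimal", /* minimal= */ true));
        groups.media.add(JavaBuiltIn.mediaTransformer,
                new DslJsonSerializer("json/dsl-platform/full", /* minimal= */ false));
    }
}
```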