Findwise / Hydra

Distributed processing framework for search solutions
http://findwise.github.io/Hydra

API: Gson -> Kryo (WAS: JSON -> BSON) #6

Open jwestberg opened 12 years ago

jwestberg commented 12 years ago

Currently, JSON is used for all serialization between stages, while BSON is used for communication with MongoDB.

When the API was created, BSON was not available separately in any reliable form, only bundled within the Mongo Java Driver. It is now distributed separately, so an upgrade from JSON to BSON can be achieved. This would also improve handling of binary data and require less hacking to provide Date objects.
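To illustrate the binary data and Date point, here is a minimal sketch using only the JDK. It is not Hydra code and does not use the BSON library; the class name, method names, and field layout are all hypothetical. It only shows the general difference: a text format has to carry a date as a string and a byte array as Base64, while a binary format can write both directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.time.Instant;
import java.util.Base64;

// Hypothetical illustration only; not part of the Hydra API.
class TextVsBinaryFields {

    // Text encoding: the date becomes an ISO-8601 string and the bytes become Base64,
    // so the reader must know by convention which strings to convert back.
    static String encodeAsJsonText(Instant date, byte[] payload) {
        return "{\"date\":\"" + date.toString() + "\",\"payload\":\""
                + Base64.getEncoder().encodeToString(payload) + "\"}";
    }

    // Binary encoding: the date is an 8-byte epoch-millisecond value and the payload
    // is a 4-byte length prefix followed by the raw bytes.
    static byte[] encodeAsBinary(Instant date, byte[] payload) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(date.toEpochMilli());
            out.writeInt(payload.length);
            out.write(payload);
        }
        return buffer.toByteArray();
    }

    // Reads the date back from the binary form without any string parsing.
    static Instant decodeDate(byte[] encoded) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return Instant.ofEpochMilli(in.readLong());
        }
    }

    public static void main(String[] args) throws IOException {
        Instant date = Instant.ofEpochMilli(1_350_000_000_000L);
        byte[] payload = new byte[300];

        String text = encodeAsJsonText(date, payload);
        byte[] binary = encodeAsBinary(date, payload);

        System.out.println("text length:   " + text.length());
        System.out.println("binary length: " + binary.length);
        System.out.println("date round trip ok: " + date.equals(decodeDate(binary)));
    }
}
```

BSON has dedicated date and binary element types, so the real library would cover both cases without the string conventions shown in the text variant.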

jwestberg commented 12 years ago

Some benchmarks from the jvm-serializers project suggest that the gain here would be good in terms of bandwidth, but that there is no real gain in space utilization (see this image).

Upgrading to BSON is problematic as well, due to the "DEV MODE" implementation in the AbstractStage class and to other simple tools that read JSON from the command line or from a file.

Let's instead explore a simple upgrade from Gson to Kryo, since Kryo appears to be the most efficient serialization library in those benchmarks. (Note that Kryo writes a binary format, not JSON.)