Junfeic / kryo

Automatically exported from code.google.com/p/kryo
BSD 3-Clause "New" or "Revised" License

Add support for serialization of very deeply nested object graphs #103

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently Kryo and all other serialization frameworks (AFAIK) have problems 
with serializing deeply nested object graphs. Because serialization is 
implemented as a recursive walk over the object graph, it fails with a 
StackOverflowError if the object graph is too deep.
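
A minimal sketch of the failure mode, not Kryo code: `Link` and `RecursiveWalk` are hypothetical names invented for this illustration. The recursive walk uses one JVM stack frame per nesting level, so a long enough chain is expected to end in a `StackOverflowError`; the exact depth at which that happens depends on the JVM's thread stack size.

```java
// Hypothetical types for illustration only; this is not Kryo code.
final class Link {
    Link next; // null marks the end of the chain
}

final class RecursiveWalk {
    // Recursive walk: one JVM stack frame per nesting level.
    static int depth(Link link) {
        return link == null ? 0 : 1 + depth(link.next);
    }

    // Builds the chain iteratively so that construction itself cannot overflow.
    static Link chain(int length) {
        Link head = null;
        for (int i = 0; i < length; i++) {
            Link link = new Link();
            link.next = head;
            head = link;
        }
        return head;
    }

    // Returns true if walking a chain of the given length overflows the stack.
    static boolean overflows(int length) {
        Link head = chain(length);
        try {
            depth(head);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
```

Calling `RecursiveWalk.overflows(n)` with a small `n` returns false; with a chain of millions of links it would typically return true under default stack settings.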

Proposal: Add support for serialization of object graphs of any depth.

Proposed solution: 

Try to replace recursion with a continuation-passing-style approach. In fact, it 
turned out that doing so is pretty straightforward.

This is a sketch of the solution:
- Every time you need to output a sub-object inside your serializer, first push 
a work item with a continuation for the current serializer, i.e. what is 
still to be done after the sub-object has been serialized.
- Then push a work item for the sub-object.
- Return from the serializer.

And in Kryo's main methods, i.e. writeXXX and readXXX, basically iterate over the 
list of work items (containing continuations) until it is empty.
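
The steps above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the attached patch: `Node`, `WorkItem` and `WorkStackWriter` are hypothetical names, and the "output" is just a `StringBuilder` rather than a Kryo `Output`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical data type for illustration; not part of Kryo.
final class Node {
    final int value;
    final Node child; // null marks the innermost node

    Node(int value, Node child) {
        this.value = value;
        this.child = child;
    }
}

// Hypothetical writer showing the work-item idea; not the patch's actual classes.
final class WorkStackWriter {
    // One step of serialization that finishes without recursing into sub-objects.
    interface WorkItem {
        void run(Deque<WorkItem> work, StringBuilder out);
    }

    // The "serializer" for a node: writes its own prefix, then schedules the rest.
    static WorkItem writeNode(Node node) {
        return (work, out) -> {
            if (node == null) {
                out.append("null");
                return;
            }
            out.append('(').append(node.value).append(' ');
            // 1. Push the continuation first: what remains after the sub-object.
            work.push((w, o) -> o.append(')'));
            // 2. Push the work item for the sub-object; it is popped before the continuation.
            work.push(writeNode(node.child));
            // 3. Return from the serializer without descending.
        };
    }

    // Main loop: keep running work items until none are left.
    static String write(Node root) {
        Deque<WorkItem> work = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        work.push(writeNode(root));
        while (!work.isEmpty()) {
            work.pop().run(work, out);
        }
        return out.toString();
    }
}
```

For example, `WorkStackWriter.write(new Node(1, new Node(2, null)))` returns `(1 (2 null))`. The pending continuations live in the `ArrayDeque` on the heap, so the JVM call stack stays at a constant depth regardless of how deep the chain is.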

I hacked the whole thing in only one day! :-) It is a rather big change, 
because it touches all serializers that may write sub-objects. It also changes 
the main logic of Kryo a bit, replacing recursive calls of 
Serializer.write() and Kryo.write() with a work-queue approach and the use of 
continuations.  

The surprising bit is that it ... actually works perfectly. All tests are still 
green. Plus I have a few new tests:
- One test outputs a data structure with a nesting level of 8000000! No 
problems. Takes just 2 seconds on my machine.
- Another test tries to output a data structure representing an endless nesting 
level, i.e. it runs forever. The aim of this test is to see if we hit any 
problems with GC, memory allocation, etc. By running this test I see that Kryo 
behaves very nicely. It can run for hours, without overflowing anything. 

Performance-wise, it looks like this new approach introduces little overhead, 
if any. At least I cannot see that things are getting visibly slower in my 
tests.  

So, with this thing we can now serialize just anything. Only the size of your 
heap and your HDD are the limits :-)

@Nate: Since the change is pretty big and to some extent changes the way 
Kryo operates, how should we proceed?

Original issue reported on code.google.com by romixlev on 9 Mar 2013 at 10:37

GoogleCodeExporter commented 9 years ago
Please find attached the patch against current trunk implementing this feature. 
It is not very polished yet, but it is functional. All tests are green. There 
is a new unit test for deep nesting. Have a look at the collections serializers 
and com.esotericsoftware.kryo.continuations subpackages to get an idea about 
how to write serializers using this style.

Any comments and feedback are welcome!

-Leo

Original comment by romixlev on 12 Mar 2013 at 9:02

Attachments:

GoogleCodeExporter commented 9 years ago
Please find attached a newer version of the patch based on the current trunk 
(r396).
This version fixes some bugs found in the original patch and contains some 
performance improvements. 

The continuation-based version passes all unit tests. It also adds a dedicated 
unit test called DeepNestingSerializationTest. It shows how very deeply nested 
structures (or even endless data structures) can be easily handled by 
continuations-based Kryo.

Serializers and Kryo now provide the methods setSupportsContinuations and 
getSupportsContinuations. If this flag is set, the continuations-based 
approach is used. Otherwise, the usual Kryo semantics are used.
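
A rough sketch of how such a flag could switch between the two strategies. Only the accessor names `setSupportsContinuations` and `getSupportsContinuations` come from the comment above; `ToyKryo`, `Box` and everything else here are hypothetical stand-ins, and the real patch may well dispatch differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical nested data type; not part of Kryo.
final class Box {
    final Box inner; // null for the innermost box

    Box(Box inner) {
        this.inner = inner;
    }
}

// Hypothetical stand-in for the patched Kryo class; the accessor names mirror the comment above.
final class ToyKryo {
    // Marker work item meaning "close the bracket of an already opened box".
    private static final Object CLOSE = new Object();

    private boolean supportsContinuations;

    void setSupportsContinuations(boolean enabled) {
        supportsContinuations = enabled;
    }

    boolean getSupportsContinuations() {
        return supportsContinuations;
    }

    // Writes each box as a pair of brackets around its inner box.
    String write(Box root) {
        StringBuilder out = new StringBuilder();
        if (supportsContinuations) {
            writeWithWorkStack(root, out);
        } else {
            writeRecursively(root, out);
        }
        return out.toString();
    }

    // Usual semantics: plain recursion, limited by the thread stack size.
    private static void writeRecursively(Box box, StringBuilder out) {
        if (box == null) {
            return;
        }
        out.append('[');
        writeRecursively(box.inner, out);
        out.append(']');
    }

    // Continuation-style semantics: pending work is kept on the heap.
    private static void writeWithWorkStack(Box root, StringBuilder out) {
        Deque<Object> work = new ArrayDeque<>();
        if (root != null) {
            work.push(root);
        }
        while (!work.isEmpty()) {
            Object item = work.pop();
            if (item == CLOSE) {
                out.append(']');
                continue;
            }
            Box box = (Box) item;
            out.append('[');
            work.push(CLOSE); // continuation: runs after the inner box is written
            if (box.inner != null) {
                work.push(box.inner);
            }
        }
    }
}
```

Both paths produce the same bytes for the same input, so in this sketch the flag only changes how deep a graph can be before the write fails.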

Changes to the Kryo class are currently implemented by modifying the class 
itself. In principle, one could implement a derived class 
ContinuationsBasedKryo which would override some of Kryo's methods.

Changes in serializers affect the read/write methods. Here one could also 
implement continuation-specific versions of those classes by deriving from 
them and overriding the read/write methods.

Such an implementation based on derived classes could be more modular and more 
orthogonal to the standard Kryo implementation.
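
The structural pattern could look roughly like this. It is only a sketch of the subclassing idea: `ToyWriter` and `ContinuationsBasedToyWriter` are hypothetical names, the operation is a simple sum over nested lists rather than real serialization, and null elements are assumed not to occur (`ArrayDeque` rejects them).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the standard class; not the real Kryo API.
class ToyWriter {
    // Standard behaviour: recursive descent over nested lists of integers.
    int sum(Object value) {
        if (value instanceof Integer) {
            return (Integer) value;
        }
        int total = 0;
        for (Object element : (List<?>) value) {
            total += sum(element);
        }
        return total;
    }
}

// The non-recursive variant lives in a subclass, leaving the base class untouched.
class ContinuationsBasedToyWriter extends ToyWriter {
    @Override
    int sum(Object value) {
        Deque<Object> work = new ArrayDeque<>();
        work.push(value);
        int total = 0;
        // Explicit work stack instead of the JVM call stack.
        while (!work.isEmpty()) {
            Object item = work.pop();
            if (item instanceof Integer) {
                total += (Integer) item;
            } else {
                for (Object element : (List<?>) item) {
                    work.push(element);
                }
            }
        }
        return total;
    }
}
```

Callers that hold a `ToyWriter` reference do not need to change; they pick the behaviour by choosing which class to instantiate.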

-Leo

Original comment by romixlev on 22 Aug 2013 at 5:59

Attachments:

GoogleCodeExporter commented 9 years ago
Sounds really cool. How does the before and after performance compare on the 
JVM serializers project?

Original comment by nathan.s...@gmail.com on 22 Aug 2013 at 6:48