Problem with deflate - wrong result?

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Run the attached program

What is the expected output? What do you see instead?
Expected: Same hash code for original object and de-serialized object
Actual: Different hash codes

What version of the Kryo are you using?
1.01

Please provide any additional information below:
The application works (although the serialized size is mucgh larger than
for Java serialization) when no compression is used.
All the deserialized HashSets seem to be empty when compression is used.

-------------------------------------------

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.compress.DeflateCompressor;
import com.esotericsoftware.kryo.serialize.FieldSerializer;

import java.io.*;
import java.nio.ByteBuffer;
import java.util.*;

public class Serializer {

    public static void close(final Closeable closeable) {
        if (closeable != null)
            try {
                closeable.close();
            } catch (IOException e) {
                // ignore
            }
    }

    public static byte[] serialize(final Serializable serializable)
            throws IOException {
        ByteArrayOutputStream os = null;
        BufferedOutputStream bos = null;
        ObjectOutputStream oos = null;
        try {
            os = new ByteArrayOutputStream();
            bos = new BufferedOutputStream(os);
            oos = new ObjectOutputStream(bos);
            oos.writeObject(serializable);
            oos.flush();
            bos.flush();
            os.flush();
            return os.toByteArray();
        } finally {
            CloseHelper.close(oos);
            close(bos);
            close(os);
        }
    }

    public static Serializable deserialize(final byte[] serialized)
            throws IOException {
        ByteArrayInputStream bais = null;
        BufferedInputStream bis = null;
        ObjectInputStream ois = null;
        try {
            bais = new ByteArrayInputStream(serialized);
            bis = new BufferedInputStream(bais);
            ois = new ObjectInputStream(bis);
            try {
                return (Serializable) ois.readObject();
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("Failed to create object!", e);
            }
        } finally {
            close(ois);
            close(bis);
            close(bais);
        }
    }

    private static Serializable buildStructure(int size) {
        Random random = new Random(123456789);
        ArrayList<Set<Object>> objects = new ArrayList<Set<Object>>();
        for (int i = 0; i < size; i++) {
            Set<Object> set = new HashSet<Object>();
            // Add some data
            final int numberOfIntegers = random.nextInt(5);
            for (int integers = 0; integers < numberOfIntegers; integers++)
                set.add(random.nextInt(10000));
            final int numberOfDoubles = random.nextInt(5);
            for (int doubles = 0; doubles < numberOfDoubles; doubles++)
                set.add(random.nextDouble());
            // Add some references
            final int numberOfReferences =
random.nextInt(Math.min(random.nextInt(5) + 1, Math.max(objects.size(), 1)));
            for (int references = 0; references < numberOfReferences;
references++)
                if (objects.size() > 0) {
                    set.add(objects.get(random.nextInt(objects.size())));
                } else {
                    set.add(objects);
                }
            objects.add(set);
        }
        return objects;
    }

    public static void main(String[] args) {
        final Serializable object = buildStructure(10000);
        Kryo kryo = new Kryo();
        kryo.register(HashSet.class, new DeflateCompressor(new
FieldSerializer(kryo, HashSet.class)));
        kryo.register(ArrayList.class);
        ByteBuffer bb = ByteBuffer.allocate(10000000);
        long start = System.currentTimeMillis();
        kryo.writeObject(bb, object);
        bb.flip();
        System.out.println("Kryo size = " + bb.remaining());
        Object result = kryo.readObject(bb, ArrayList.class);
        long elapsedKryo = System.currentTimeMillis() - start;
        System.out.println("Original hashCode = " + object.hashCode() + ",
Kryo result hashCode = " + result.hashCode() + ", elapsed = " + elapsedKryo);
        result = null;
        try {
            start = System.currentTimeMillis();
            byte[] serialized = serialize(object);
            System.out.println("Java Size = " + serialized.length);
            result = deserialize(serialized);
            long elapsedJava = System.currentTimeMillis() - start;
            System.out.println("Original hashCode = " + object.hashCode() +
", Java result hashCode = " + result.hashCode() + ", elapsed = " +
elapsedJava);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Original issue reported on code.google.com by javafan...@gmail.com on 11 May 2010 at 6:44

GoogleCodeExporter commented 9 years ago

Thanks for the report.

HashSet is a Collection, so normally uses CollectionSerializer. When you do your
compression, you register HashSet with a FieldSerializer. HashSet stores its 
data in
transient fields, so the data is not serialized. This explains why your 
HashSets are
empty.

When HashSet is registered with CollectionSerializer (which also happens if
registered without specifying a serializer), then it works properly but the
serialized bytes are very large compared to Java's built-in serialization. The 
reason
is that your test puts many of the same objects in the graph more than once. By
default Kryo doesn't handle references. Normally this is done using
ReferenceFieldSerializer, but this mechanism can only handle types that would
otherwise use FieldSerializer. It doesn't handle primitive wrapper references,
Collections, etc. If you take the references out of your test, you will see an
efficient output size.

Kryo needs better references support. I'll update this issue when this is 
implemented.

Further optimizations could be done for this test case by more efficient 
handling of
Doubles. Right now they are always 8 bytes.

Original comment by nathan.s...@gmail.com on 11 May 2010 at 6:06

GoogleCodeExporter commented 9 years ago

Original comment by nathan.s...@gmail.com on 11 May 2010 at 6:07

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Thanks for the quick reply Nathan!

Your answer explains the problem with using compression. I am however still 
seeing 
another problem when enabling compression in the attached smaller program I 
get: 

Exception in thread "main" java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at com.esotericsoftware.kryo.Compressor.readObjectData(Compressor.java:92)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:474)
    at 
com.esotericsoftware.kryo.serialize.CollectionSerializer.readObjectData(Collecti
onSer
ializer.java:113)
    at com.esotericsoftware.kryo.Compressor.readObjectData(Compressor.java:102)
    at com.esotericsoftware.kryo.Serializer.readObject(Serializer.java:58)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:493)
    at Test1.main(Test1.java:48)

But when no compression it used it runs ok.

------------------

In general I think Kryo looks like a very promising component. The things that 
would 
need to be fixed to make it useful for me (with my current use-case) would be:

1. None recursive reference handling (the fact that Java Serialization results 
in 
stack overflow on deep structures was my original reason to look for an 
alternative)
2. Detection and handling of self referencing structures
3. The optimizations and missing features you mentioned in the reply.

Are you by the way interested in bringing more developers on-board to help out 
with 
improving Kryo? I will have a good lock at the code base and if I manage to 
understand it I may be able to help out...

/JavaFanboy

Original comment by javafan...@gmail.com on 12 May 2010 at 6:16

Attachments:

Test1.java

GoogleCodeExporter commented 9 years ago

Your exception occurs because DeflateCompressor uses a temporary buffer which 
is not
large enough. Please see the two argument DeflateCompressor constructor. The 
error
message is very poor. I will provide a better message.

The deflate algorithm can process in chunks, so does not require a large 
buffer. The
DeflateCompressor does not process in chunks (for easy implementation) and 
should be
rewritten to be more efficient. There is already a comment at the top of
DeflateCompressor noting this, but I haven't gotten around to it. :)

One issue you may not be aware of with your test case. Kryo supports compressing
objects that are not the root of the graph. You have an ArrayList of 
ArrayLists. You
register DeflateCompressor for all ArrayLists, which means that the bytes for 
each
ArrayList in the graph will be compressed individually, and then the ArrayList 
that
is the root object will cause all the bytes to get compressed. It would be more
efficient to only compress the root object. If you really have the need to 
compress
the root object and not objects of the same type in the middle of an object 
graph,
maybe the Compressor class needs a setting to support this.

Sorry to respond to each of your issues by saying, "yeah, Kryo needs to be 
improved
there"! Your ability to find where Kryo could be augmented is uncanny. ;)

---------

1. Kryo serializers currently use stack based recursion because it is easy and
efficient to implement. It is rarely a problem and generally increasing the 
stack
size with -Xss is sufficient. If this workaround is unacceptable, eg in a
multithreaded environment, Kryo serializers are pluggable. A new one could be 
written
that uses heap based recursion. If you take a crack at this, you may want to 
model it
after FieldSerializer.

2. You may be able to get by using ReferenceFieldSerializer until reference 
support
for all objects is implemented.

I am all for community contributions! Just post a patch and I will merge it 
into the
core if it is acceptable. I tend to work on Kryo in spurts. Unfortunately right 
now I
have many other pressing projects, so it may be a week or two before I can 
provide it
my full attention.

Original comment by n4ted...@yahoo.com on 12 May 2010 at 7:02

GoogleCodeExporter commented 9 years ago

Thanks again for the VERY quick answer Nathan - are you a night owl or like me 
located in Europe perhaps?

I will play around with my tests a bit more to see if I can make them work.

As for finding all the areas that needs improvement I suppose the explanation 
is that 
I have done a lot of "special purpose" serialization stuff over the years and 
knows 
what is hard/complicated to do so those are the things I am looking for 
solutions to 
in a serialization component that I evaluate - then it is no miracle that some 
of the 
things do not work (yet!) - and I only talk about the tests that DO NOT work :-)

If I find enough time and inspiration I will give it a go to understand how the 
Kryo 
code works and then I will get back to you to hint on what I will try to 
improve (so 
we can avoid duplicate efforts).

/JavaFanboy

Original comment by javafan...@gmail.com on 12 May 2010 at 7:17

GoogleCodeExporter commented 9 years ago

I'm a night owl. Possibly more of a vampire.

I hope you'll find Kryo's source pretty straightforward. The Serializer 
interface
defines some basic methods to read/write objects. Classes implementing this can 
be
used on their own to do serialization. The Kryo class acts as a repository, so 
the
various serializers can be used for graphs. There isn't much more to it.

One architectural choice that may be an issue, since you mentioned extremely 
deep
graphs, is the use of ByteBuffer rather than streams. There is currently a 
thread on
the discussion group about this.

Original comment by n4ted...@yahoo.com on 12 May 2010 at 7:22

GoogleCodeExporter commented 9 years ago

Well buffers may not necessary be the wrong way to go. Deep structures do not 
NEED to 
produce huge results (if the issues we have talked about with references are 
solved).

/JavaFanboy

Original comment by javafan...@gmail.com on 12 May 2010 at 7:28

GoogleCodeExporter commented 9 years ago

I made some additions / changes that allow collections to be serialized only 
once and
additional references to be saved as a reference. I followed the same line of
implementation as is already used for FieldSerializer/ReferenceFieldSerializer 
and
created a ReferenceCollectionSerializer. I refactored out the References class 
to a
separate file and added a new protected method (as in FieldSerializer) that is 
used
by ReferenceCollectionSerializer.

Feel free to make whatever changes you feel apropriate to the code to make it 
fit in
better!

/JavaFanBoy

Original comment by javafan...@gmail.com on 13 May 2010 at 1:01

Attachments:

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Original comment by nathan.s...@gmail.com on 10 Oct 2010 at 2:51

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

v2 has proper support for references.

Original comment by nathan.s...@gmail.com on 17 Apr 2012 at 10:21

Changed state: Fixed

duchien85 / kryo

Problem with deflate - wrong result? #19