addthis / stream-lib

Stream summarizer and cardinality estimator.
Apache License 2.0

how to serialize HLL using kryo #124

Closed mycFelix closed 7 years ago

mycFelix commented 7 years ago

Hi, we are using HLL in a Storm project. We want to serialize an HLL object with Kryo and send it to another bolt for further computation. However, serialization failed when we tested it with the following code:

        Kryo kryo = new Kryo();
        kryo.setInstantiatorStrategy(new SerializingInstantiatorStrategy());
        kryo.register(HyperLogLogPlus.class);
        try {
            Output output = new Output(new FileOutputStream("/Users/Felix/Desktop/file.bin"));
            kryo.writeObject(output, card);
            output.close();

            Input input = new Input(new FileInputStream("/Users/Felix/Desktop/file.bin"));
            HyperLogLogPlus someObject = kryo.readObject(input, HyperLogLogPlus.class);
            System.out.println(someObject.cardinality());
            input.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

We got the following error:

Exception in thread "main" com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): com.clearspring.analytics.stream.cardinality.HyperLogLog
    at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1050)
    at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1062)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
    at hyperloglog.TT.main(TT.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Does this mean that HLL cannot be serialized by Kryo? Or is there a way to do it correctly?

Thank you very much.

aubdiy commented 7 years ago

I have the same problem.(v_v)

abramsm commented 7 years ago

First, I would use HyperLogLogPlus over HyperLogLog if possible.

Both HyperLogLog and HyperLogLogPlus implement Serializable. Looking at Kryo's README, you can do something like:

kryo.setInstantiatorStrategy(new SerializingInstantiatorStrategy());

Is that an option for you? My concern about adding a no-arg constructor is that, with the current implementation, the values of p and sp cannot be changed once the HLL is instantiated.

Let me know if the alternate approach will work for you. If not I'm open to ideas on how we can reasonably support it.

matt

mycFelix commented 7 years ago

@abramsm - Thank you so much for taking the time and for your constructive suggestions. I will recommend that my team use HyperLogLogPlus, as you mentioned. I noticed that both HyperLogLog and HyperLogLogPlus implement Serializable, so we decided to send the object directly in a Tuple in our Storm project. As we know, Kryo is much faster than Java serialization, which is where my issue comes from. After setting SerializingInstantiatorStrategy, I got the following new error:

Exception in thread "main" com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Can not set final com.clearspring.analytics.stream.cardinality.RegisterSet field com.clearspring.analytics.stream.cardinality.HyperLogLog.registerSet to java.lang.Integer
Serialization trace:
registerSet (com.clearspring.analytics.stream.cardinality.HyperLogLog)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
    at hyperloglog.TT.main(TT.java:59)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.IllegalArgumentException: Can not set final com.clearspring.analytics.stream.cardinality.RegisterSet field com.clearspring.analytics.stream.cardinality.HyperLogLog.registerSet to java.lang.Integer
    at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:164)
    at sun.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:168)
    at sun.reflect.UnsafeQualifiedObjectFieldAccessorImpl.set(UnsafeQualifiedObjectFieldAccessorImpl.java:83)
    at java.lang.reflect.Field.set(Field.java:741)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:619)
    ... 8 more

I'll check Kryo's README again and see whether I can find an alternate approach as well.

mycFelix commented 7 years ago

@abramsm - Hi, some more information. The good news is that after I upgraded Kryo from 2.2.1 to the latest release (4.0.0), the errors above went away. Unfortunately, I got the following new error:

Exception in thread "main" com.esotericsoftware.kryo.KryoException: org.objenesis.ObjenesisException: java.io.NotSerializableException: class com.clearspring.analytics.stream.cardinality.RegisterSet not serializable
Serialization trace:
registerSet (com.clearspring.analytics.stream.cardinality.HyperLogLogPlus)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:540)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:709)
    at hyperloglog.TT.main(TT.java:57)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: org.objenesis.ObjenesisException: java.io.NotSerializableException: class com.clearspring.analytics.stream.cardinality.RegisterSet not serializable
    at org.objenesis.strategy.SerializingInstantiatorStrategy.newInstantiatorOf(SerializingInstantiatorStrategy.java:54)
    at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1127)
    at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1136)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:559)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:535)
    at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
    at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    ... 8 more
Caused by: java.io.NotSerializableException: class com.clearspring.analytics.stream.cardinality.RegisterSet not serializable
    ... 15 more

abramsm commented 7 years ago

Ah, it looks like Serializable isn't implemented all the way through.

I agree with your earlier comment re Java Serializable being pretty slow as compared to Kryo. However, Java Serializable is not the serialization approach recommended for HLL and HLL+. They both have an inner Builder class that is designed to encode/decode the objects very quickly. Can you look into Kryo to see if you can configure it to use that mechanism? I think that will be your best bet for speed and correctness.
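One way this suggestion might be wired up is a custom Kryo serializer that delegates to the library's own byte encoding instead of Kryo's field-by-field FieldSerializer. The following is a minimal sketch, not a tested recipe from this thread: it assumes Kryo 4.x (where `Serializer.read` takes `Class<T>`; later major versions change that signature) and stream-lib's `HyperLogLogPlus.getBytes()` and `HyperLogLogPlus.Builder.build(byte[])`. The class name `HyperLogLogPlusSerializer` is hypothetical.

```java
import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus;
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.KryoException;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import java.io.IOException;

/**
 * Hypothetical serializer (the name is an assumption) that writes a
 * HyperLogLogPlus as a length-prefixed byte array, using the library's
 * own encoding rather than reflection over its fields.
 */
public class HyperLogLogPlusSerializer extends Serializer<HyperLogLogPlus> {

    @Override
    public void write(Kryo kryo, Output output, HyperLogLogPlus hll) {
        try {
            byte[] bytes = hll.getBytes();   // stream-lib's compact encoding
            output.writeInt(bytes.length);   // length prefix so read() knows how much to consume
            output.writeBytes(bytes);
        } catch (IOException e) {
            throw new KryoException("Could not encode HyperLogLogPlus", e);
        }
    }

    @Override
    public HyperLogLogPlus read(Kryo kryo, Input input, Class<HyperLogLogPlus> type) {
        try {
            int length = input.readInt();
            byte[] bytes = input.readBytes(length);
            // Rebuild through the Builder, so no no-arg constructor is needed
            return HyperLogLogPlus.Builder.build(bytes);
        } catch (IOException e) {
            throw new KryoException("Could not decode HyperLogLogPlus", e);
        }
    }
}
```

With this in place, registration would look like `kryo.register(HyperLogLogPlus.class, new HyperLogLogPlusSerializer());`. Because the object is rebuilt through the Builder, neither a no-arg constructor nor a serializable RegisterSet should be required.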

Matt

mycFelix commented 7 years ago

@abramsm - I believe we've found an alternate approach for our case: sending the HLL bytes in tuples rather than the instance itself. I'll close the issue. Thank you so much for your patient help.
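The code for this approach is not shown in the thread. As a rough sketch of what "sending the bytes" could look like, assuming stream-lib's `getBytes()` and `HyperLogLogPlus.Builder.build(byte[])`, the sending side encodes the sketch to a `byte[]` and the receiving side rebuilds it. The class and method names here (`HllBytesExample`, `toPayload`, `fromPayload`) are hypothetical, and the Storm emit/receive calls are left out.

```java
import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus;

import java.io.IOException;

public class HllBytesExample {

    /** Encode the sketch into a byte[] payload (what the sending bolt would put in the tuple). */
    static byte[] toPayload(HyperLogLogPlus hll) throws IOException {
        return hll.getBytes();
    }

    /** Rebuild the sketch from the payload (what the receiving bolt would do). */
    static HyperLogLogPlus fromPayload(byte[] payload) throws IOException {
        return HyperLogLogPlus.Builder.build(payload);
    }

    public static void main(String[] args) throws IOException {
        // p = 14 and sp = 25 are illustrative precision values, not taken from the thread
        HyperLogLogPlus hll = new HyperLogLogPlus(14, 25);
        for (int i = 0; i < 10_000; i++) {
            hll.offer("user-" + i);
        }

        byte[] payload = toPayload(hll);                  // plain byte[] that any serializer can carry
        HyperLogLogPlus restored = fromPayload(payload);

        System.out.println("payload size in bytes: " + payload.length);
        System.out.println("estimated cardinality: " + restored.cardinality());
    }
}
```

A `byte[]` field needs no custom Kryo registration, which is presumably why this sidesteps the constructor and RegisterSet errors above.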

xiaoyue26 commented 5 years ago

Hello, we're facing exactly the same problem as you. Could you share the solution? Many thanks!