EsotericSoftware / kryo

Java binary serialization and cloning: fast, efficient, automatic
BSD 3-Clause "New" or "Revised" License
6.19k stars 823 forks source link

Buffer underflow while deserializing with Collection type field removed (with reproducible test case) #834

Closed BrotherJing closed 3 years ago

BrotherJing commented 3 years ago

Describe the bug Got KryoException: Buffer underflow when I try to deserialize data after remove a field with Collection type from the class.

To Reproduce

First, serialize the original object.

public static class A {
    public List<String> aaa = new ArrayList<>();
    public String bbb = "bbb";
    public String ddd = bbb;
}

@Test
public void test_kryo5() {
    A a = new A();
    a.aaa.add("a");
    byte[] bytes = new KryoSerializeUtil().serialize(a);
    String s = Base64.getEncoder().encodeToString(bytes);
    System.out.println(s);
}

Then, comment field aaa, and run

@Test
public void test_kryo5Deserialize() {
    // the serialized data from above
    String s = "AQBjb20ucGRkLmJpZ2RhdGEucmlzay5wdWJsaXNoLmNvbnRyYWN0LnV0aWxzLkNoZWNrc3VtVXRpbFRlc3QkwQEDYWHhYmLiZGTkGgEBamF2YS51dGlsLkFycmF5TGlz9AECAYJhAAUDAWJi4gACAwUA";
    A a = deserialize(Base64.getDecoder().decode(s));
    System.out.println(a.ddd);
}

Got

00:00 DEBUG: [kryo] Unable to read unknown data, type: java.util.ArrayList (com.pdd.bigdata.risk.publish.contract.utils.ChecksumUtilTest$A#null)
com.esotericsoftware.kryo.kryo5.KryoException: Buffer underflow.
    at com.esotericsoftware.kryo.kryo5.io.Input.require(Input.java:219)
    at com.esotericsoftware.kryo.kryo5.io.Input.readVarIntFlag(Input.java:480)
    at com.esotericsoftware.kryo.kryo5.io.Input.readString(Input.java:775)

FYI, The kryo I use:

private static Kryo kryo = new Kryo();
static {
    kryo.setReferences(true);
    kryo.setRegistrationRequired(false);
    CompatibleFieldSerializer.CompatibleFieldSerializerConfig config =
            new CompatibleFieldSerializer.CompatibleFieldSerializerConfig();
    config.setChunkedEncoding(true);
    config.setReadUnknownFieldData(true);
    kryo.setDefaultSerializer(new SerializerFactory.CompatibleFieldSerializerFactory(config));
    kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
    Log.set(LEVEL_DEBUG);
}

private <T> T deserialize(byte[] bytes) {
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
    Input input = new Input(byteArrayInputStream);
    input.close();
    return (T) kryo.readClassAndObject(input);
}

private byte[] serialize(Object obj) {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    Output output = new Output(byteArrayOutputStream);
    kryo.writeClassAndObject(output, obj);
    output.close();
    return byteArrayOutputStream.toByteArray();
}

Environment:

Additional context Add any other context about the problem here.

theigl commented 3 years ago

@BrotherJing: Thank you for the detailed bug report!

I'm currently on vacation and won't be able to look into this for a couple of weeks. If you want, you can submit a PR with a fix and I'll merge it.

theigl commented 3 years ago

@BrotherJing: I was able to reproduce your problem and have debugged it for some time but was unable to come up with a solution. The problem seems to be related to references. If references are disabled, the data can be read without problems.

I currently don't have the capacity to spend a lot of time on this issue and would greatly appreciate it if someone could look into this and help with fixing the issue.

krishna81m commented 3 years ago

@theigl, we came across a similar issue, is there more documentation and examples on CompatibleFieldSerializer as it is supposed to be the simplest and most compatible.

https://github.com/EsotericSoftware/kryo#compatiblefieldserializer wiki also has to be updated to readUnknownFieldData default: true. From what I gathered, if we set readUnknownFieldData:true it handle unknown fields and if reading fails due to class removed? chunkedEncoding:true will gracefully skip.

Assuming, most people will use the simplest configuration for CompatibleFieldSerializer to support graceful deserialization with backward/forward compatibility, any idea why this is not a major bug?

CompatibleFieldSerializer.CompatibleFieldSerializerConfig config =
            new CompatibleFieldSerializer.CompatibleFieldSerializerConfig();
    config.setChunkedEncoding(true);
    config.setReadUnknownFieldData(true);
theigl commented 3 years ago

@theigl, we came across a similar issue, is there more documentation and examples on CompatibleFieldSerializer as it is supposed to be the simplest and most compatible.

Apart from the Readme, there is only the JavaDoc for the serializers.

https://github.com/EsotericSoftware/kryo#compatiblefieldserializer wiki also has to be updated to readUnknownFieldData default: true. From what I gathered, if we set readUnknownFieldData:true it handle unknown fields and if reading fails due to class removed? chunkedEncoding:true will gracefully skip.

I added the missing default values to the documentation. Your understanding is correct.

Assuming, most people will use the simplest configuration for CompatibleFieldSerializer to support graceful deserialization with backward/forward compatibility, any idea why this is not a major bug?

What do you mean by "this"? This ticket or that chunkedEncoding isn't enabled by default?

I'm not the original author of the serializers, but I am pretty sure that chunked encoding is disabled by default, because it has a significant performance overhead and users should opt-in if they really need it.

theigl commented 3 years ago

@BrotherJing: After some more hours of debugging, I managed to get to the bottom of this. Chunked input failed to correctly skip a chunk after a buffer underflow. The underflow is caused by another issue that I cannot fix at this point, but my proposed changes will solve your problem.

theigl commented 3 years ago

@BrotherJing: Please test your issue against the latest Kryo Snapshots.

krishna81m commented 3 years ago

@BrotherJing , any luck with the latest snapshot?

@theigl, as a follow up from previous discussion, why would this config throw exception when deserializing bytes with a new registered class added? Shouldn't CompatibleFieldSerializer, chunkedEncoding and warnUnregisteredClasses(true) work gracefully?

                Kryo kryo = new Kryo();
                kryo.setReferences(false);

                CompatibleFieldSerializer.CompatibleFieldSerializerConfig config =
                        new CompatibleFieldSerializer.CompatibleFieldSerializerConfig();
                config.setReadUnknownFieldData(true);
                config.setChunkedEncoding(true);
                kryo.setDefaultSerializer(new SerializerFactory.CompatibleFieldSerializerFactory(config));

                kryo.setRegistrationRequired(false);    // each class has its own ID
                // registered classes must have exact same IDs during deserialization or use "warn" for backward compatibility?
                kryo.setWarnUnregisteredClasses(true);

                kryo.register(Foo.class, 11);
                kryo.register(Bar.class, 12);