FST should check field Class when deserialize Object

tsl0922 commented 8 years ago

This is the field of class Project:

    private Integer owner_id;
    private Timestamp created_at;
    private Timestamp updated_at;
    private Short status;
    private Short recommended;
    private Integer type;
    private Integer max_member;
    private String name;
    private String description;
    private String icon;
    private Short plan;
    private Integer id;

This is the deserialized Object in debugger after fields changed(I'm running 2 different version of my application):

No exception is thrown when deserialize, but exception will be thrown when the result is used. I think FST should check if the deserialized field value matches the origin declaration and throw exception if it doesn't match.

I'm using fst 2.28, this issue also exists in the latest version(2.43).

RuedigerMoeller commented 8 years ago

This can happen if you try to deserialize different versions of a class. Checking each field would slow down too much. I think it could be achieved by turning of Unsafe (type check is at a lower level then) but still this will yield a ~20% slowdown. Inspecting the code, the following should be done:

(re-)introduce a global switch to turn off use of unsafe. Serialization is prepared to work without unsafe however currently fallback code will always try to get the unsafe singleton in case it is set to null.

Thanks for the report :)

tsl0922 commented 8 years ago

@RuedigerMoeller Does FST respect serialVersionUID? If we changed a class and it's serialVersionUID, will deserialization be success?

RuedigerMoeller commented 8 years ago

no, fst is absolutely trimmed for max performance. you might add a custom serializer for classes that require a version check

Am Dienstag, 24. November 2015 schrieb Shuanglei Tao :

@RuedigerMoeller https://github.com/RuedigerMoeller Does FST respect serialVersionUID? If we changed a class and it's serialVersionUID, will deserialization be success?

— Reply to this email directly or view it on GitHub https://github.com/RuedigerMoeller/fast-serialization/issues/103#issuecomment-159197999 .

quaff commented 8 years ago

I have same problem if new version add fields with same serialVersionUID, I hope fst provide an option to respect serialVersionUID for safe deserialize with new version class, It's very important for rpc, service provider upgraded and service consumer will fail if not upgraded which is normal in real world.

thefallentree commented 8 years ago

@quaff you should try protobuffer instead.

quaff commented 8 years ago

@thefallentree protobuf and thrift and avro need schema to generate NON-POJO code, It is not intuitional, I like the way similar with java.io.Serializable.

tsl0922 commented 8 years ago

@RuedigerMoeller It seems Unsafe can be disabled for serialization by FSTUtil.unFlaggedUnsafe = null without changing the source code of fst, is it right?

RuedigerMoeller commented 8 years ago

It should be like that, however there is the lazy init in "getUnsafe", so only direct references to "unflaggedUnsafe" are blocked, so double check this. Serialization has a fallback implementations if unsafe is not available (10-15%) penalty, some of the offheap stuff not.

RuedigerMoeller commented 8 years ago

General remark:

Versioning is always an issue which needs to be handled explicitely, there is no magical solution. "serialversionId" is a bad and slow approach as it puts a penalty for each object written (even tiny ones), in addition it enlarges message size (+ ~8 bytes per object ref).
serialversionId just throws an exception in case it does not match, so what have you gained ?.
even half baked handling of versioning requires extra tags inside the binary object representations, approaches like "extension blocks" destroy linear memory write access (=cache misses).
more advanced handling of versioning requires additional metadata such as with SBE or protobuf message files. Both libraries have significant limitations in message structure, often leading to an additional transfer step in order to create application-level objects from message objects. This overhead usually is not included in benchmarks.

You need to plan ahead regarding version issues, there are several aproaches which do not impose a performance/message size penalty:

for RPC, at connect time put in a version checking at application level.
for persistence: write a version id first before writing your objects.

strategies for backward compatibility: 1) Do not modify classes but instead subclass from version 1. E.g. MyDataV1 extends MyData. 2) I have also used package renaming to deal with different versions (my.package.v1, mypackage.v2). 3) put in a "Object extraData", which can be used as a container for additions later on (simple cases)

There is no such thing like automatical versioning, you have to deal with it explicitely at application level most of the time.

I'll still try to figure out a scheme to solve the issue without hurting use cases of fst handling versioning e.g. at connect time.

tsl0922 commented 8 years ago

Thanks for you reply, that is very useful.

However, deserialize different versions of a class is very common in the real world. If fst can not ensure data correctness for this situation, you may add a note on the readme for new users, because it is different from java serialization, fst does not throw an exception for data error caused by version at deserialize time(it is hard to debug), but java serialization does.

RuedigerMoeller commented 8 years ago

its already in the readme afaik (limits or so). Its true this is not same behaviour as java serialization, but again I won't pay a significant performance penalty to support a feature only some need or that can be implemented without runtime cost by applying some foresight ;).

thefallentree commented 8 years ago

To be frank, I think a serialization library that doesn't talk much on versioning is not a good way of doing things. It feels that this project is ignoring how real application develops in the wild.

We were attracted by fst promise of fairyland and bitten by subtle versioning problems later. It's not a very promising sign of the project and we simply don't know what other issues may lay ahead to justify.

That being said, I totally agree with you that support versioning/field changes while being fast is hard. We may end up still looking at using protobuf later, sometime hard choice is making future life easier.

Cheers.

On Wed, Nov 25, 2015 at 8:58 PM, moru0011 notifications@github.com wrote:

its already in the readme afaik (limits or so). Its true this is not same behaviour as java serialization, but again I won't pay a significant performance penalty to support a feature only some need or that can be implemented without runtime cost by applying some foresight ;).

— Reply to this email directly or view it on GitHub https://github.com/RuedigerMoeller/fast-serialization/issues/103#issuecomment-159600262 .

RuedigerMoeller commented 8 years ago

Well, I'll put a 'limits' section in the readme. BTW: Has any of you guys read the doc about "@version" - annotation ? It supports extension of existing classes within some limits (which are hard to understand I admit).

One of the issues of stock jdk serialization is, it provides a lot of possibly unneeded features resulting in very poor performance. FST does the contrary: it implements core serialization very fast, ommiting costly features. With any tradeoff, there is a price to pay. However adding features works, removing features not :)

Adding app level version awareness is trivial, in case you face this issue after having rolled out your stuff: that's plain bad engineering. Having a space and performance impact for each object written just to enable a simple exception throing version check is a bad decision imo. That's not a way to build first class high performance RPC systems. Just use stock jdk serialization then.

Protobuf: Ofc you can do that, its just much more expensive regarding development cost and its way slower if you count in the fact you need to translate protobuf message structures to real application objects. Protobuf does not even support interlinked object graphs properly. Its a good solution for cross-language rpc, however its an expensive choice to build a distributed system in java in an elegant way. I'd choose SBE over protobof btw.

tsl0922 commented 8 years ago

I agree, performance is important, but data correctness should be considered before performance, because we can't use unreliable data. I will try to add an app level version check for this, but I hope fst can handle this itself in the future.

RuedigerMoeller commented 8 years ago

i will add an "@checkversion" classlevel annotation + global option to always do this.

tsl0922 commented 8 years ago

:+1:

Another request: can you make fst compatible with serialVersionUID ? I'd like to see it be implemented.

RuedigerMoeller commented 8 years ago

I'll use serialversionId if declared in the class, else I'll use a checksum computed from the type's and names of instance fields. This way objects will become incompatible only in case instance fields change (jdk ser is more picky in this case afaik). BTW: don't confuse me with a service provider, despite >6000 maven.org downloads from > 2500 distinct IP's (=pom.xml from projects all over the world) per month, I don't make a dime out of this.

In general I prioritize fst features I need for my commercial work, and I always handle versioning at app level as having thrown an exception rarely is sufficient to deal with the issue. So pull requests are welcome ^^.

thefallentree commented 8 years ago

@RuedigerMoeller Thanks!

RuedigerMoeller / fast-serialization

FST should check field Class when deserialize Object #103