Closed cypherdare closed 7 years ago
I started an implementation of this alternate method of forward compatibility. Haven't thoroughly tested it yet.
One downside with this functionality is if unknown tags are encountered, it must assume a chunked object has been written and try to skip it. Therefore, no exception is thrown if an incompatible future class (one with new tags not marked @Late
) is read. Instead, a debug message can be shown that states that it is assuming a future tag and skipping it, as a possible hint of why some other exception might later occur during the read.
However, it's theoretically possible the entire object tree can be read without throwing an exception, resulting in a secretly corrupt object. This would be a problem if one is relying on catching KryoException to determine if future-based data is being read.
So this is either a breaking change, or forward compatibility must be off by default. It doesn't break existing persisted data, but does break apps that rely on KryoException for rejecting data. At least in that case, a config option could be added to disable forward compatibility.
What you write about the current implementation issues sounds reasonable. I haven't reproduced it by myself, can we start with a test case that shows the issue?
Regarding the solution, could we also add an attribute to @Tag
?
I'll put together a more thorough test. (TaggedFieldSerializerTest needs a testRemovedField
method like CompatibleFieldSerializerTest has.)
What is the entry point of the tests in the Kryo project? They all derive from KryoTestCase, which has a protected setUp
method that isn't called from anywhere.
Adding an attribute to Tag would be more proper, but also result in more verbose/less readable code:
@Late @Tag(3) public int myField;
vs.
@Tag(value=3, late=true) public int myField;
but maybe making an @LateTag
that takes a value and is used in place of @Tag
would be the best solution:
@LateTag(3) public int myField;
More concise than either of the above, and only one annotation needed at a time. Thoughts?
Tests are run via mvn test, no enforcement to inherit KryoTestCase, it must just be a junit test (check some other tests not extending KryoTestCase).
LateTag looks more concise, but still introduces a whole new annotation. Semantically the important information is that a Tag is set, "late" is additional information. Semantically I'd prefer to have Tag only, but see verbosity as a slight disadvantage as well. Perhaps other have thoughts on this as well?
I'm not completely happy with "late" as well, having the feeling that there might be an even better word for that, unfortunately I couldn't find one until now :-)
cypherdare notifications@github.com schrieb am So., 17. Juli 2016, 05:49:
I'll put together a more thorough test. (TaggedFieldSerializerTest needs a testRemovedField method like CompatibleFieldSerializerTest has.)
What is the entry point of the tests in the Kryo project? They all derive from KryoTestCase, which has a protected setUp method that isn't called from anywhere.
Adding an attribute to Tag would be more proper, but also result in more verbose code:
@Late @Tag(3) public static final int myField; vs. @Tag(value=3, late=true) public static final int myField;
but maybe making an @LateTag that takes a value and replaces @Tag would be the best solution:
@LateTag(3) public static final int myField;
More concise than either of the above, and only one tag needed at a time. Thoughts?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/EsotericSoftware/kryo/issues/442#issuecomment-233163596, or mute the thread https://github.com/notifications/unsubscribe-auth/AAD7HW2v556b0JN6AJnt7fSbDDGXyDvCks5qWaYugaJpZM4JIgc- .
Here's another idea...instead of a new annotation or attribute, use negative tag values to signify forward compatible fields. This would be the most concise solution of all. But the obvious downsides are that it is a breaking change (unless it's off by default) and that it relies on the user reading the documentation to use it correctly.
Since Tag is a runtime annotation, avoiding an extra attribute or annotation also reduces runtime bloat, though this might be insignificant in most cases.
I'll flip through the dictionary some more and see if I can find another short word that is more descriptive than Late in case the consensus is to keep a second attribute or annotation. On Sun, Jul 17, 2016 at 2:05 AM Martin Grotzke notifications@github.com wrote:
Tests are run via mvn test, no enforcement to inherit KryoTestCase, it must just be a junit test (check some other tests not extending KryoTestCase).
LateTag looks more concise, but still introduces a whole new annotation. Semantically the important information is that a Tag is set, "late" is additional information. Semantically I'd prefer to have Tag only, but see verbosity as a slight disadvantage as well. Perhaps other have thoughts on this as well?
I'm not completely happy with "late" as well, having the feeling that there might be an even better word for that, unfortunately I couldn't find one until now :-)
cypherdare notifications@github.com schrieb am So., 17. Juli 2016, 05:49:
I'll put together a more thorough test. (TaggedFieldSerializerTest needs a testRemovedField method like CompatibleFieldSerializerTest has.)
What is the entry point of the tests in the Kryo project? They all derive from KryoTestCase, which has a protected setUp method that isn't called from anywhere.
Adding an attribute to Tag would be more proper, but also result in more verbose code:
@Late @Tag(3) public static final int myField; vs. @Tag(value=3, late=true) public static final int myField;
but maybe making an @LateTag that takes a value and replaces @Tag would be the best solution:
@LateTag(3) public static final int myField;
More concise than either of the above, and only one tag needed at a time. Thoughts?
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub < https://github.com/EsotericSoftware/kryo/issues/442#issuecomment-233163596 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AAD7HW2v556b0JN6AJnt7fSbDDGXyDvCks5qWaYugaJpZM4JIgc-
.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/EsotericSoftware/kryo/issues/442#issuecomment-233167049, or mute the thread https://github.com/notifications/unsubscribe-auth/ABwfuw4jlnFYWO5xlru6Qen6jrwPrhp7ks5qWcY1gaJpZM4JIgc- .
@magro Could you point me in the right direction for importing the kryo project? I'm new to working with Maven and have been struggling with correctly importing the project for a couple hours. I opened the project by opening pom.xml in IntelliJ, and it can't resolve the pom-main.xml and pom-shaded.xml modules even though the files are right there. Therefore, the project can't be maked, and mvn test
fails.
So I tried importing pom.xml in Eclipse, and it can't resolve anything from commons-lang. Even though commons-lang is in the pom.xml dependencies, it doesn't appear in Referenced Libraries in the project tree (all the other dependencies do appear).
I'm used to both IDEs, but prefer IntelliJ.
Sorry for the delay, I was on vacation with bad connectivity (and right now I'm still only on my phone).
IIRC I opened the project in intellij by referring to the external maven model (pom.xml). Alternatively you could try mvn idea:idea
(not sure if this is still possible).
What's the error when mvn test fails btw?
Perhaps https://www.jetbrains.com/help/idea/2016.2/getting-started-with-maven.html#import_maven_project can help (you might have to delete intellij files to start from scratch).
Sorry, took me a while to come up with free time to look at this again.
The warning that leads to the error:
Could not transfer metadata org.apache.bcel:bcel:6.0-SNAPSHOT/maven-metadata.xml from/to codehaus-snapshots (http://nexus.codehaus.org/snapshots/): nexus.codehaus.org
Some kind of issue with it downloading this dependency I suppose.
Ah, right, this is submitted as #450, this PR should show the workaround.
OK, will get back to this hopefully in the next few days. Thanks!
@magro The next few days turned into five months, sorry! I have added a test that illustrates the problem here: https://github.com/cypherdare/kryo.git
I also started a branch off of that with my chunked encoding solution:
https://github.com/cypherdare/kryo.git. It uses an optional annexed
parameter on the Tag annotation, that causes chunked encoding to be used if set to true. It's still necessary to enable ignoreUnknownTags
to support forward compatibility, which in this case is the ability to skip unexpected data by assuming chunked encoding.
Clirr was giving me trouble with adding a method to the interface, so I added an exception. I think it might be a Clirr bug/oversight to treat new methods of an annotation as a breaking change, which I don't think they are if a default value is provided.
I used the word annexed
instead of late
, which perhaps describes better what it is without being too long. Maybe it could be chunked
but I feel like that is more like describing the solution than the field you're tagging.
TaggedFieldSerializer.isIgnoreUnkownTags()
is spelled wrong. I think this is a good time to fix it. Does this require an exception for this specific method in the pom.xml file, or is there some other way to satisfy Clirr?
Actually, in my opinion TaggedFieldSerializerConfig should be removed completely, and the parameter should simply be a field of TaggedFieldSerializer, since it is the only configuration option. It's convoluted and misleading to have both FieldSerializerConfig and TaggedFieldSerializerConfig instances in the Kyro instance. And we might as well break it since the feature is currently broken anyway--it will be good for anyone using it to see code breakage and draw attention to the bug.
@cypherdare Great that you continued here!
I've looked at https://github.com/cypherdare/kryo/commit/18b0c8de6c7314b55c7d430dbb374416f136ed29 and only left a single comment (nothing important), looks good to me! Maybe we should add a test with nested objects where both got changed (just for completeness, as it should work already)?
Clirr was giving me trouble with adding a method to the interface, so I added an exception. I think it might be a Clirr bug/oversight to treat new methods of an annotation as a breaking change, which I don't think they are if a default value is provided.
According to https://docs.oracle.com/javase/specs/jls/se7/html/jls-13.html#jls-13.5.7 I'd say that clirr is correct and the change breaks binary compatibility, but I've not verified that (run code compiled against old kryo with new kryo coming with the evolved annotation). Source compatibility however would be provided (i.e. existing code could be compiled against the new version without any change). Binary compatibility would be nice, but then we'd have to introduce a new annotation for annexed fields (I like annexed btw, good choice!). WDYT?
TaggedFieldSerializer.isIgnoreUnkownTags() is spelled wrong.
My preference would be to deprecate this and add the correct version.
Actually, in my opinion TaggedFieldSerializerConfig should be removed completely
It's there to allow global configuration of all TaggedFieldSerializer instances, e.g. if TaggedFieldSerializer is set as default serializer. Therfore we should keep this IMO.
According to https://docs.oracle.com/javase/specs/jls/se7/html/jls-13.html#jls-13.5.7 I'd say that clirr is correct and the change breaks binary compatibility
Probably I was wrong here, misinterpreting the spec (see also http://stackoverflow.com/questions/21197255/is-adding-a-method-to-a-java-annotation-safe-for-backward-compatibility, although there are minor differences). To be completely safe we should probably verify this on our own.
I'm pretty sure it breaks binary compatibility. Or at least instances of a class utilizing this tag may use more bytes when specifying the value of the new parameter. But I'm not completely knowledgeable about use of this library...Under what conditions would binary compatibility be beneficial?
Regarding the misspelled method, my thinking is that 100% of people who have used this method did so with the expectation that their application is forward compatible. And it is not, so it might be good to intentionally break source compatibility so it draws their attention and they carefully consider their plans for dealing with it.
The reason I brought up eliminating TaggedFieldSerializerConfig is that FieldSerializerConfig handles the configuration for all FieldSerializer types except TaggedFieldSerializer. This is not clearly documented or easy to document. It's unexpected behavior that you can change a setting in FieldSerializerConfig and it affects VersionFieldSerializer but not TaggedFieldSerializer.
I'll put together a more rigorous tes before starting a PR. On Sat, Apr 22, 2017 at 11:20 AM Martin Grotzke notifications@github.com wrote:
According to https://docs.oracle.com/javase/specs/jls/se7/html/jls-13.html#jls-13.5.7 I'd say that clirr is correct and the change breaks binary compatibility
Probably I was wrong here, misinterpreting the spec (see also http://stackoverflow.com/questions/21197255/is-adding-a-method-to-a-java-annotation-safe-for-backward-compatibility, although there are minor differences). To be completely safe we should probably verify this on our own.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/EsotericSoftware/kryo/issues/442#issuecomment-296380292, or mute the thread https://github.com/notifications/unsubscribe-auth/ABwfu7qmoXzPeHc5YLSRDKflRWOUaur-ks5ryhrHgaJpZM4JIgc- .
Under what conditions would binary compatibility be beneficial?
When the dependency graph gets more complex and several libs at once depend on kryo. E.g. in the hadoop / spark etc ecosystems it becomes extremely hard for a certain lib to update kryo if binary compatibility is not guaranteed, because other libs might be broken then, therefore coordination between projects would be required. Even if an incompatibility affects only rare cases, the incompatibility means that things might break and hinders bigger projects to upgrade kryo. We had several such issues in the past, and it would be great if we could have less incompatible changes in the future.
Regarding the misspelled method, my thinking is that 100% of people who have used this method did so with the expectation that their application is forward compatible. And it is not, so it might be good to intentionally break source compatibility so it draws their attention and they carefully consider their plans for dealing with it.
Apart from the historical context of kryo I agree with you. But, as said above, labeling the next kryo version as binary incompatible would block some major projects from the upgrade. I'd think that less users use the TaggedFieldSerializer with the expectation of provided forward compatibility than those that use it transitively via spark etc. Therefore I'd prefer not to label the next kryo version as incompatible and instead have a hint/warning in the release notes, in reference to this issue here.
Re TaggedFieldSerializerConfig: the TaggedFieldSerializer required an additional setting, others did not. We had introduced the FieldSerializerConfig because there were global settings on kryo where the relationship to the affected serializers was not clear. In consequence, we added the TaggedFieldSerializerConfig as a specialization of FieldSerializerConfig, because it added a special setting for TaggedFieldSerializer (learning from the past). As you say we can improve the documentation to make the relationships easier to understand.
I hope I could clarify things a bit. I really value your engagement btw, thanks a lot!
I haven't contributed to a library that's used within other libraries before, so that thorough explanation helps a lot, thanks!
Does the desire to avoid breaking binary compatibility mean we should use a new tag after all, despite it being sub-optimal from a semantics view?
I also had to learn this when I started to help maintaining kryo :-)
Does the desire to avoid breaking binary compatibility mean we should use a new tag after all, despite it being sub-optimal from a semantics view?
I just tested this: adding boolean annexed() default false;
to @Tag
is both binary and source compatible, i.e. I could run a test case that was compiled against kryo 4.0.0 with a kryo version that includes the change, and compiling against the new version of course was successful as well.
So it's fine as it is - great! :-)
Thanks for checking. I'll add a more rigorous test and report back or start a PR.
Great!
@magro I updated https://github.com/cypherdare/kryo/tree/tfsForwardChunked with a more complete test, and fixed up the docs some.
I had one more idea to run by you before making a PR. ignoreUnknownTags
no longer has the exact same meaning as before. The tags are not ignored (which causes the errors I originally described) but rather their data is skipped. Someone who hasn't been following closely might see the new docs and start using the annexed
setting, incorrectly assuming that their old released application that used ignoreUnknownTags
is forward compatible with this new data.
So what if we deprecated ignoreUnknownTags
and add a new setting skipUnknownTags
? This would force users to notice and adapt to the change appropriately. Not sure how this affects binary compatibility.
Sorry for the delay, sounds good!
PR #365 was supposed to add forward compatibility to TaggedFieldSerializer, but I think the logic is flawed. It probably works if you are deserializing a single tagged class (and maybe that's how it got through testing), but since the bytes of unrecognized tagged fields are ignored, those bytes are not skipped in the input stream, so all subsequent objects in the stream will be corrupt.
Here's a test class. I wrote a small list of objects. Then I commented out a field to simulate an old version of the app trying to read future data. Only the first object in the list is not corrupt.
I think forward compatibility could be achieved through the use of a new annotation. Put an
@Late
before an@Tag
for any fields added to new releases of an application. These fields are then written with chunked encoding so the bytes can be skipped in older releases of an application. Still an advantage over CompatibleFieldSerializer because the chunked encoding is only needed if the class evolves to use new tagged fields--otherwise buffers don't need to be allocated for chunking. And if you don't care about forward compatibility, just don't use@Late
.