Schema versioning/migrations

TheMMaciek commented 4 years ago

Regarding versioning and Kryo: currently we serialize with the default FieldSerializer. With this serializer it is impossible to add new fields or remove old ones; renaming a field is possible only if the order of the fields after sorting by name doesn't change. To be able to evolve the schema we would need to switch to a different serializer. Kryo offers three:

- VersionFieldSerializer: gives backward compatibility only. You can only add new fields, each with a @Since(Int) annotation carrying the version number in which the field first appeared. It shouldn't be much heavier than FieldSerializer, as it adds only one value to the serialized data.
- TaggedFieldSerializer: I didn't test it because, if I understood correctly, it requires marking every field with a @Tag(Int) id, which seems like a lot of work. It adds backward and optionally forward compatibility.
- CompatibleFieldSerializer: gives backward and forward compatibility. Just like the Tagged serializer, it produces bigger files because it persists data in chunks, which allows it to ignore unknown fields and not fail when new fields appear.

Both the Version and Compatible serializers put in a default value when loading old data into a schema that has new fields. Unfortunately we have no control over what that default value is; setting a default like case class foo(bar: Int = 10) doesn't work. For Int the default is 0, for Double it's 0.0, and for String and other objects it's null. Even for Option[_] it's null, not None. Handling the defaults would require writing custom serializers/deserializers.
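To make the default-value problem concrete, here is a minimal, dependency-free sketch. It does not use Kryo and is not a custom serializer; it shows a simpler swapped-in technique, a normalization step run after loading. The names Foo, note, and FooDefaults are hypothetical, and the null is constructed by hand to stand in for what a schema-evolving serializer would leave in a newly added field.

```scala
// Hypothetical record: `note` stands in for a field added after old data was written.
final case class Foo(bar: Int = 10, note: Option[String] = None)

object FooDefaults {
  // Replaces a null Option (as left behind when loading old data) with None.
  // A primitive such as `bar` cannot be repaired this way: a stored 0 and a
  // defaulted 0 are indistinguishable once loaded.
  def normalize(foo: Foo): Foo =
    if (foo.note == null) foo.copy(note = None) else foo
}
```

Calling FooDefaults.normalize(Foo(0, null)) gives Foo(0, None). The primitive case is the reason a post-load fix is only a partial answer: restoring bar = 10 would need the serializer itself to know the field was absent.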

I googled a bit about how to version and handle default values with Kryo and stumbled upon a conversation with someone who seemed to be one of the Kryo creators. He said that for such advanced usage and handling of defaults one should consider different solutions, e.g. Avro or Protobuf, as Kryo wasn't created with such usage in mind.

Anyway, if we want to make changes to our schema, we definitely need to keep the current version on the side, with the current serializer and registrar, to be able to deserialize old data. This is needed for streaming-app and, at the least, during the first rollback with the new version (none of the Kryo serializers other than the current one can read that data, as they use different formats). Then we can create another version, let's say v2, and use a different Kryo serializer or a different solution altogether for the new data. Currently (and I tested it) we can add new classes and reorganize current classes to extend a common trait. That doesn't break the current serializers, as the field serializer is only concerned with a class's fields, not its name or its parents/children.
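The idea of keeping the current format readable next to a v2 can be sketched roughly as below. This is an illustration, not the project's code: VersionedDecoding, Registry, and the two stand-in decoders are hypothetical, and the sketch assumes the schema version of each blob is known from outside the payload (for example from where it is stored). In a real setup v1 would wrap the existing Kryo serializer and registrar, and v2 the new serializer or technology.

```scala
import java.nio.charset.StandardCharsets

object VersionedDecoding {
  // A decoder turns stored bytes back into a value of type A.
  type Decoder[A] = Array[Byte] => A

  // Maps a schema version to the decoder able to read data written with it.
  final class Registry[A](decoders: Map[Int, Decoder[A]]) {
    def decode(version: Int, bytes: Array[Byte]): Either[String, A] =
      decoders
        .get(version)
        .map(decoder => decoder(bytes))
        .toRight(s"no decoder registered for schema version $version")
  }

  // Stand-in decoders (assumptions for illustration only).
  val v1: Decoder[String] = bytes => new String(bytes, StandardCharsets.UTF_8)
  val v2: Decoder[String] = bytes => new String(bytes, StandardCharsets.UTF_8).trim

  val registry = new Registry[String](Map(1 -> v1, 2 -> v2))
}
```

With this shape the old decoder stays untouched while new data goes through v2, and an unknown version surfaces as a Left instead of a failed or silently wrong read.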

This requires further investigation of other serialization technologies, after which we can decide whether to switch to a new technology or stay with Kryo and just use a more flexible serializer.

buckysballs commented 3 years ago

See https://app.zenhub.com/workspaces/protocol-5b749bea49591a10f180d950/issues/constellation-labs/constellation/1434

marcinwadon commented 3 years ago

We have investigated Protobuf and Avro as alternatives to Kryo, and neither library can be used, for security reasons. We decided to move this ticket back to the backlog and re-address this issue after state channels.

Constellation-Labs / constellation

Schema versioning/migrations #1212