Versioning and handling version conflicts

jgrewe commented 8 years ago

With the 1.1 release we extended the model by adding the Group, but we still record the format as 1.0. In this case there will be no problem: old libs will ignore the Group, and 1.1 libs will not miss it in 1.0 files...

Still, we need to come up with a scheme for handling version changes and decide whether we want to keep backward compatibility within one major version.

PR #591 will break compatibility at the moment...

gicmo commented 7 years ago

Couple of thoughts: We actually have two separate "zones" in NIX that should be versioned:

"data model"/API version (e.g. adding Group should lead to an increase of that)
representation of the "data model" on disk through the various backends; changes here should also be reflected in a version increase (e.g. saving booleans differently in the hdf5 backend doesn't touch the data model, and maybe not the filesystem backend either)

The first conclusion from this is that different backends should have some way to indicate an on-disk format change independently of other backends.

It is probably safe to assume that data model changes will lead to on-disk format changes, but not the other way around. It is probably also a good idea not to make on-disk format changes that break backwards compatibility while keeping the "data model" number the same.

We currently have three integers available to represent the version scheme, and the current version is 1.0.0. If we agree on the assumptions above we could use the following scheme:

X.Y.Z
    ^---- backwards compatible on-disk format number
  ^------ backwards compatible data-model API version number
^-------- *NOT* backwards compatible data-model (and accompanying on-disk format) changes
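
As a rough illustration of the three-integer scheme above, the version could be held in a small tuple type. This is only a sketch: `FormatVersion` and `parse` are hypothetical names chosen for the example, not part of the NIX API.

```python
from typing import NamedTuple


class FormatVersion(NamedTuple):
    """Hypothetical container for the X.Y.Z scheme sketched above."""
    x: int  # NOT backwards compatible data-model (and on-disk) changes
    y: int  # backwards compatible data-model/API version number
    z: int  # backwards compatible on-disk format number

    @classmethod
    def parse(cls, text: str) -> "FormatVersion":
        # Split "X.Y.Z" into exactly three integer components.
        parts = text.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected X.Y.Z, got {text!r}")
        return cls(*(int(p) for p in parts))

    def __str__(self) -> str:
        return f"{self.x}.{self.y}.{self.z}"


# The current version mentioned in the comment above.
current = FormatVersion.parse("1.0.0")
```

Because it is a tuple, two versions compare component by component from left to right, which is convenient for the kind of checks discussed in this thread.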

achilleas-k commented 7 years ago

So to understand this correctly: a change in the so-called second zone (e.g., a change in boolean representation) would bump Y, because it keeps the data model backwards compatible but not the on-disk format. Z is then reserved for bug fixes or method changes (e.g., more efficient searching, object creation/deletion, etc.) that affect neither the data model nor the way data is stored on disk.

gicmo commented 7 years ago

@achilleas-k By backwards compatible I mean that newer libraries can read both the old format and the new format, but not the other way around (i.e. old libraries will fail to read the new version). Therefore, for the boolean change I would bump Z in the HDF5 backend. For the Group changes I would bump Y (and here is the money question: maybe keep Z the same, and in the version checking code only look for Z changes). Same with DataFrames. If at some point we change the way Links and Tags work into something completely new, change X and reset Y and Z. Does that make sense? Should Z and Y be switched?
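
To make the proposal in this comment concrete, here is a minimal sketch using plain `(x, y, z)` tuples. The function names and the change labels are made up for illustration. The read check encodes one possible reading of the comment: only X and Z matter for readability, and Y is ignored because old libraries simply skip additions such as Group. Treating a different X as unreadable in both directions is an assumption.

```python
def bump(version, change):
    """Return the next (x, y, z) for a given kind of change (hypothetical labels)."""
    x, y, z = version
    if change == "on-disk":       # e.g. storing booleans differently in HDF5
        return (x, y, z + 1)
    if change == "data-model":    # e.g. adding Group or DataFrame
        return (x, y + 1, z)      # Z kept as-is, as floated in the comment
    if change == "incompatible":  # e.g. reworking how Links and Tags behave
        return (x + 1, 0, 0)      # reset Y and Z
    raise ValueError(f"unknown change kind: {change!r}")


def can_read(library, file):
    """Assumed check: same X, and the file's Z is not newer than the library's."""
    return file[0] == library[0] and file[2] <= library[2]
```

With this check, a 1.0.0 library would accept a 1.1.0 file (Group added, Z unchanged) but reject a 1.0.1 file (on-disk change it does not know about).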

achilleas-k commented 7 years ago

Ok, I follow now. Not sure about the part where we would increase Y and keep Z as an indicator of file version. I think it will create a confusing compatibility tree. I can understand why we would want something like that and I honestly can't think of another solution right now. One thought would be to have two versions, an API version and an independent file version, but that seems messy too.

jgrewe commented 7 years ago

I agree with the scheme. Not sure about not resetting z if y has been increased, though. My gut tells me to reset it... probably it does not really matter. It further tells me that the part most likely to change should come last. This would mean swapping y and z. Backward compatibility only for read access, right (maybe only for files opened in read-only mode)?

gicmo commented 7 years ago

After some in-person discussion, a variation of the initial theme could be used:

X.Y.Z
    ^---- non-breaking change
  ^------ breaking change
^-------- general iteration number

Here a breaking change means that old libraries (file y > library y) cannot read the file anymore, either because a) it technically wouldn't work or b) the represented data would be wrong. Conversely, additions to the model (e.g. Group or DataFrame) would not break the implementation or representation and would increase Z. The X value would be used to give a general model version, and an increase of it would lead to a reset of both Y and Z. Since Y also means a breaking change, increasing Y means resetting Z. In this scheme the traditional semantics of the version number would basically stay intact.

Alternatively, one could drop X, shift the meaning of the rest to the left, and use the last two numbers (Y, Z) to indicate non-breaking changes that are either read-only compatible (Y) or read-write compatible (Z), i.e. a z > current z means the library can still write to the file, while a y > current y means it can read, but not write anymore.
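
The two variants in this comment could be sketched roughly as follows, again with plain integer tuples. Both function names are hypothetical, and two points are assumptions the comment does not spell out: that a newer X is treated like a breaking change, and that a newer library may still write to files from an older breaking level.

```python
def can_read(library, file):
    """Revised X.Y.Z scheme: Z is ignored, a newer (X, Y) in the file is breaking.

    Assumption: an X increase counts as breaking, just like a Y increase.
    """
    return tuple(file[:2]) <= tuple(library[:2])


def access_mode(library, file):
    """Alternative scheme without X.

    Each version is (breaking, read-only compatible Y, read-write compatible Z).
    """
    b_lib, y_lib, _ = library
    b_file, y_file, _ = file
    if b_file > b_lib:
        return "none"         # breaking change: the file cannot be opened
    if b_file == b_lib and y_file > y_lib:
        return "read-only"    # newer Y: can read, but not write anymore
    # Newer Z alone still allows writing. Older files being writable by a
    # newer library is an assumption, not something stated in the thread.
    return "read-write"
```

For example, under the alternative a library at (1, 0, 0) would open a (1, 0, 2) file read-write, a (1, 1, 0) file read-only, and refuse a (2, 0, 0) file.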

gicmo commented 6 years ago

Closing this.

G-Node / nix

Versioning and handling version conflicts #596