Design document
Goal
The goal here is to describe and finish designing the mechanism to "rebase" changes.
When user A tries to commit a change, the commit currently fails if user
B has committed since A's session started. This is the best and safest
default, but it's not necessarily what A wants every time. For example,
maybe A wrote to array /array_a and B wrote to /array_b, and those
changes are unrelated. In a case like that, A may decide to go ahead with
the commit anyway, accepting the risks, if they know exactly what B's changes were.
A rebase, then, is the process of "merging" a change, potentially modifying it,
on top of other pre-existing changes.
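The current default can be pictured with a minimal sketch. The names Branch, Session, and ConflictError are hypothetical stand-ins chosen for illustration, not the project's actual types; the only behavior taken from the text is that a commit is rejected when the branch has moved since the session started.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when another commit landed after the session started."""


@dataclass
class Branch:
    # Hypothetical stand-in for a branch: snapshot ids, oldest first.
    snapshots: list = field(default_factory=lambda: ["s0"])

    @property
    def tip(self):
        return self.snapshots[-1]


@dataclass
class Session:
    branch: Branch
    base: str  # snapshot id the session started from

    def commit(self, new_id):
        # Safe default: refuse if anyone committed since `base`.
        if self.branch.tip != self.base:
            raise ConflictError(
                f"session started at {self.base}, branch is now at {self.branch.tip}"
            )
        self.branch.snapshots.append(new_id)
        self.base = new_id
        return new_id


branch = Branch()
a = Session(branch, branch.tip)
b = Session(branch, branch.tip)
b.commit("s1")  # B commits first and succeeds
try:
    a.commit("s2")  # A started from s0, so this is rejected
except ConflictError as err:
    print("commit failed:", err)
```

A rebase would be the step A runs at this point, instead of discarding the session and starting over.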
We want to provide:
A mechanism for users to execute a rebase after a failed commit.
A way for users to define which changes are OK to rebase and which are not, and
how their changes must be modified for a clean rebase. Example: if the user
wrote to an array but a previous commit deleted that array, the user may
choose either to fail their commit or to simply rebase, ignoring any
writes to the array.
An explanation of why a rebase failed, whenever it does.
Transaction logs
As part of this change we will introduce the concept of a TransactionLog.
These are files we will store on disk, in their own prefix, with the same
id as the corresponding snapshot. The transaction log contains a serialization,
somewhat expanded, of the ChangeSet.
They serve at least two purposes:
An easy way to know what the conflicting commits changed, so that rebases can
be executed without having to compare snapshots (which would be very
expensive).
In the future, an easy way to provide diff functionality.
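As a rough illustration of the first purpose, the sketch below finds overlapping paths by reading only the logs of the intervening commits. It assumes each log can be reduced to a set of touched node paths; that reduction and the function name conflicting_paths are assumptions made for the example.

```python
def conflicting_paths(our_paths, previous_logs):
    """Map snapshot id -> sorted paths touched by both that commit and us."""
    ours = set(our_paths)
    overlaps = {}
    for snapshot_id, their_paths in previous_logs:
        shared = ours & set(their_paths)
        if shared:
            overlaps[snapshot_id] = sorted(shared)
    return overlaps


# Each entry stands for one transaction log: (snapshot id, paths it touched).
logs = [("s1", {"/array_b"}), ("s2", {"/array_a", "/group_c"})]
print(conflicting_paths({"/array_a"}, logs[:1]))  # only s1 is considered
print(conflicting_paths({"/array_a"}, logs))      # s1 and s2 are considered
```

With only s1 in between, A's write to /array_a overlaps nothing, which matches the /array_a and /array_b example from the goal.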
Transaction logs will be generated from the ChangeSet (and probably a bit
of extra information, like the list of existing nodes), and they will be
written during the commit process.
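A possible shape for this, as a sketch only: the ChangeSet fields, the JSON encoding, and the "transactions" prefix name below are all assumptions made for illustration. What comes from the text is that the log is built from the ChangeSet plus some extra information, lives in its own prefix, and shares the snapshot's id.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChangeSet:
    # Hypothetical, simplified fields; the real ChangeSet is not specified here.
    new_arrays: set = field(default_factory=set)
    deleted_arrays: set = field(default_factory=set)
    written_chunks: dict = field(default_factory=dict)  # path -> list of coords


@dataclass
class TransactionLog:
    snapshot_id: str
    new_arrays: list
    deleted_arrays: list
    written_chunks: dict
    existing_nodes: list  # extra information not present in the ChangeSet

    def key(self, prefix="transactions"):
        # Own prefix, same id as the snapshot. The prefix name is made up.
        return f"{prefix}/{self.snapshot_id}"

    def to_bytes(self):
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


def build_log(snapshot_id, change_set, existing_nodes):
    """Expand a ChangeSet into a log; meant to run as part of the commit."""
    return TransactionLog(
        snapshot_id=snapshot_id,
        new_arrays=sorted(change_set.new_arrays),
        deleted_arrays=sorted(change_set.deleted_arrays),
        written_chunks={p: sorted(c) for p, c in change_set.written_chunks.items()},
        existing_nodes=sorted(existing_nodes),
    )


changes = ChangeSet(
    new_arrays={"/array_a"},
    written_chunks={"/array_a": [(0, 1), (0, 0)]},
)
log = build_log("s2", changes, existing_nodes={"/", "/array_a", "/array_b"})
print(log.key())
print(log.to_bytes().decode("utf-8"))
```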
Transaction logs can be made optional. For maximum performance, users may choose
not to use them, but in that case they give up rebase and diff
functionality.
Conflict resolution
In the most detailed case, conflict resolution could be done interactively.
Users may want to investigate their own change, together with the diffs of
the conflicting changes, and decide in full detail how to modify their
change for the rebase. This sounds like a very advanced use case, and we don't need
to support it initially. We just need to make sure it remains possible in the future.
In the simpler case, the user will run a rebase after a commit fails with a
conflict. They will call a rebase function, passing a ConflictSolver
that encodes the policy for dealing with the different types of conflicts.
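One way this call could look, as a hedged sketch: the ConflictSolver below has a single hypothetical knob, and the rebase signature, OnDeleted, and RebaseError are invented for the example. The design leaves the real interface open.

```python
from dataclasses import dataclass
from enum import Enum


class OnDeleted(Enum):
    IGNORE_WRITES = "ignore_writes"
    FAIL = "fail"


class RebaseError(Exception):
    """Carries a human-readable reason for the failed rebase."""


@dataclass
class ConflictSolver:
    # Hypothetical policy object with a single knob.
    on_write_to_deleted_array: OnDeleted = OnDeleted.FAIL


def rebase(our_writes, deleted_by_others, solver):
    """Return the writes to apply on top of the previous commits.

    our_writes: dict mapping array path -> chunk writes of our change
    deleted_by_others: dict mapping array path -> snapshot id that deleted it
    """
    rebased = {}
    for path, writes in our_writes.items():
        if path not in deleted_by_others:
            rebased[path] = writes
        elif solver.on_write_to_deleted_array is OnDeleted.FAIL:
            raise RebaseError(
                f"{path} was deleted by commit {deleted_by_others[path]}"
            )
        # IGNORE_WRITES: drop our writes to the deleted array.
    return rebased


ours = {"/array_a": ["chunk-0"], "/array_b": ["chunk-3"]}
deleted = {"/array_b": "s1"}

lenient = ConflictSolver(on_write_to_deleted_array=OnDeleted.IGNORE_WRITES)
print(rebase(ours, deleted, lenient))

try:
    rebase(ours, deleted, ConflictSolver())  # default policy is FAIL
except RebaseError as err:
    print("rebase failed:", err)
```

Raising an error that names the path and the offending commit is one simple way to explain why a rebase failed.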
Some conflict resolution examples
If two changes write to the same chunk, the user can select ours or theirs.
If writes happen to an entity deleted in a previous change, we may
support two options: ignore the write or fail the rebase.
TODO: more
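The ours/theirs choice for chunk writes could be sketched as follows. This assumes chunk writes are keyed by chunk coordinates and that picking theirs simply drops our write to the contested chunk; both are assumptions for illustration, and rebase_chunk_writes is a made-up name.

```python
def rebase_chunk_writes(ours, theirs, prefer):
    """Return the chunk writes we still apply after resolving overlaps.

    ours, theirs: dicts mapping chunk coordinates -> chunk payload
    prefer: "ours" keeps our write, "theirs" keeps the already committed one
    """
    if prefer not in ("ours", "theirs"):
        raise ValueError(f"unknown preference: {prefer!r}")
    if prefer == "ours":
        return dict(ours)
    contested = ours.keys() & theirs.keys()
    return {coord: data for coord, data in ours.items() if coord not in contested}


ours = {(0, 0): b"a", (0, 1): b"b"}
theirs = {(0, 1): b"x", (2, 2): b"y"}
print(rebase_chunk_writes(ours, theirs, "ours"))
print(rebase_chunk_writes(ours, theirs, "theirs"))
```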
Exhaustive list of conflicts and resolutions
This is WIP
When a previous change deleted an array:
if chunks were written to it: recoverable by not applying the change
if user attributes were set: recoverable by not applying the change
if metadata was changed: recoverable by not applying the change
When a previous change deleted a group:
if user attributes were set on it: recoverable by not applying the change
if a new array is created inside of it: recoverable by recreating the
implicit group
When a previous change created an array:
if a node is created on the same path: recoverable by not applying the change
if an implicit group is created on the same path: recoverable by not
applying the change
When a previous change created a group:
if a node is created on the same path, except if it's implicit
When a previous change updated user attributes
When a previous change updated zarr metadata
When a previous change wrote/deleted a chunk
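The cases decided so far could be held in a lookup table, sketched below. The string labels and the function recovery_for are hypothetical; only the pairs spelled out in the list above are included, and anything else returns None to signal that no resolution has been decided yet.

```python
# (what the previous change did, what our change does) -> recovery
RECOVERIES = {
    ("array deleted", "chunks written"): "skip our change",
    ("array deleted", "user attributes set"): "skip our change",
    ("array deleted", "metadata changed"): "skip our change",
    ("group deleted", "user attributes set"): "skip our change",
    ("group deleted", "array created inside"): "recreate implicit group",
    ("array created", "node created on same path"): "skip our change",
    ("array created", "implicit group created on same path"): "skip our change",
}


def recovery_for(previous, ours):
    """Return the listed recovery, or None when the case is still open."""
    return RECOVERIES.get((previous, ours))


print(recovery_for("group deleted", "array created inside"))
print(recovery_for("chunk written", "chunk written"))  # not decided yet
```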