earth-mover / icechunk

Open-source, cloud-native transactional tensor storage engine
https://icechunk.io
Apache License 2.0

Support conflict resolution #297

Open · paraseba opened this issue 1 month ago

paraseba commented 1 month ago

When two sessions write concurrently to the tip of a branch, one of them will fail at commit time. This is intentional: we want the commit history to be serializable. One of those sessions needs to read what the other session wrote and, potentially, use that data for its own writes. In essence, one of the sessions needs to use the other's commit as its parent.
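To make this failure mode concrete, here is a minimal sketch of commit-time validation. It assumes the branch tip is a single pointer that each commit updates with a compare-and-swap: a commit is accepted only if the session started from the current tip. The `Branch` class, `ConflictError`, and integer snapshot ids are hypothetical illustrations, not Icechunk's API.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when the branch tip moved since the session started (hypothetical)."""


@dataclass
class Branch:
    # Hypothetical in-memory branch: a pointer to the tip snapshot plus its history.
    tip: int = 0
    history: list = field(default_factory=lambda: [0])
    next_id: int = 1

    def commit(self, base_snapshot: int) -> int:
        # Optimistic concurrency: accept the commit only if the session was
        # started from the current tip (a compare-and-swap on the tip pointer).
        if base_snapshot != self.tip:
            raise ConflictError(f"expected tip {base_snapshot}, found {self.tip}")
        new_id = self.next_id
        self.next_id += 1
        self.history.append(new_id)
        self.tip = new_id
        return new_id


branch = Branch()
s1_base = branch.tip  # session 1 starts from snapshot 0
s2_base = branch.tip  # session 2 also starts from snapshot 0
first = branch.commit(s1_base)  # succeeds; the tip moves to snapshot 1
try:
    branch.commit(s2_base)  # fails: the tip is now 1, not 0
    second_failed = False
except ConflictError:
    second_failed = True
```

Session 2 would then have to start again from snapshot 1 (reading session 1's data) before it can commit, which is exactly the serial history the issue asks for.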

There are exceptions to the above logic. The simplest one is when the sessions wrote to completely separate parts of the repository; for example, session 1 wrote to group group1 and session 2 wrote to group2. In that case, the user may (though not always) want to "merge" both sessions, keeping the writes from both. What they want is something like a git fast-forward, where session 2 "changes its parent" to be session 1's commit, even though that is not really what happened during the writes.
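A minimal sketch of this "disjoint writes" strategy, assuming each session's changes can be represented as a mapping from path to new value and that conflicts are detected at group granularity (the first path component). The helpers `group_of`, `touched_groups`, and `try_rebase` are hypothetical; a real implementation might compare at array or chunk granularity instead.

```python
from typing import Dict, Optional, Set


def group_of(path: str) -> str:
    # Hypothetical convention: the first path component names the group,
    # e.g. "/group1/array/c/0" -> "group1".
    return path.strip("/").split("/")[0]


def touched_groups(changes: Dict[str, bytes]) -> Set[str]:
    return {group_of(p) for p in changes}


def try_rebase(
    base_state: Dict[str, bytes],
    committed: Dict[str, bytes],
    pending: Dict[str, bytes],
) -> Optional[Dict[str, bytes]]:
    """Replay `pending` on top of `committed` if they wrote to disjoint groups.

    base_state: contents both sessions started from
    committed:  changes from the session that already committed (session 1)
    pending:    changes from the session that failed to commit (session 2)
    Returns the merged state, or None if the write sets overlap.
    """
    if touched_groups(committed) & touched_groups(pending):
        return None  # overlapping writes: no automatic resolution
    new_parent = {**base_state, **committed}  # session 1's snapshot
    return {**new_parent, **pending}  # session 2 re-parented onto it


base = {"/group1/a": b"0", "/group2/b": b"0"}
s1 = {"/group1/a": b"1"}
s2 = {"/group2/b": b"2"}
merged = try_rebase(base, s1, s2)  # both sessions' writes survive
clash = try_rebase(base, s1, {"/group1/a": b"9"})  # None: both wrote to group1
```

Note that this check only looks at writes, so it cannot tell whether session 2's data was derived from something session 1 changed.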

There are far more complex cases too, and this is not necessarily the behavior every user will want. For example, the writes to group2 may be based on reads from group1; in that case, a fast-forward would cause session 2 to commit stale data.
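One way to catch the stale-read case is to also track what each session read, in the style of backward validation in optimistic concurrency control. The sketch below assumes path-level read and write sets; `SessionLog` and `can_rebase` are hypothetical names, not part of Icechunk.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SessionLog:
    # Hypothetical per-session bookkeeping: paths read and values written.
    reads: Set[str] = field(default_factory=set)
    writes: Dict[str, bytes] = field(default_factory=dict)


def can_rebase(committed: SessionLog, pending: SessionLog) -> bool:
    """Decide whether `pending` may be re-parented onto `committed`.

    Re-parenting is only equivalent to running the sessions one after the
    other if `pending` never read anything `committed` wrote; otherwise its
    writes may be derived from stale data. Disjoint write sets are also
    required here as a conservative choice matching the "completely separate
    parts" case; strictly, an overlapping blind write could still be ordered
    after `committed`.
    """
    written = set(committed.writes)
    stale_read = bool(pending.reads & written)
    write_overlap = bool(set(pending.writes) & written)
    return not stale_read and not write_overlap


s1 = SessionLog(writes={"/group1/a": b"1"})
# Session 2 computed its group2 data from group1, which session 1 changed.
s2 = SessionLog(reads={"/group1/a"}, writes={"/group2/b": b"2"})
# Session 3 only read and wrote group2.
s3 = SessionLog(reads={"/group2/b"}, writes={"/group2/b": b"3"})
stale = can_rebase(s1, s2)  # False: the fast-forward would commit stale data
safe = can_rebase(s1, s3)  # True: no overlap with session 1
```

The cost of this strategy is that every read has to be recorded, which may be expensive for large array workloads; that trade-off is part of what needs to be analyzed.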

There is potential for many different conflict resolution strategies; we need to analyze, design, and prioritize them.